The EchoUser research team had quite a busy December. Our schedules were filled with recruiting users, drafting test plans, moderating usability sessions, writing reports, and, last but not least, arranging check-in meetings with clients throughout the project cycle.
Clients — regardless of their UX background — would raise questions and concerns about UX methodology in those meetings to make sure that their studies were on the right track and that they would get valuable and defensible data from the projects.
In the two usability projects I am on (both benchmark studies), I came across the following two interesting questions from our clients. Though the two questions seemed to have come from two different angles, they both point to one of the key issues in doing usability studies: how to interpret usability data with a small number of users. I thought I’d share the two client questions and hope to elicit some extended discussions here.
Client Question 1: How many participants is enough for a benchmark usability study? Eight, 10, or 12?
A lot of times, the question actually becomes, “Do we need a single-digit participant number or a double-digit one?” Clients want the usability study results to be defensible both from a statistical and a PR standpoint. When time and resources allow and it’s easy to recruit target participants, the question of “Should we get two more participants for the study?” has an easy solution: Let’s just do two more sessions. However, in a scenario in which qualified participants are very difficult to find or recruit (for instance, the study requires a highly specific user profile) or time and resources are limited, how many participants are needed? Is it worthwhile to spend two more weeks on the study just to make it to a total of 10 participants?
The bigger issue: What is the rationale we should use to validate the number of participants for a usability study?
If we go back to the classic model from Nielsen, five users are enough to uncover 85% of usability issues. That has been the UX industry standard for a long time, as Jakob Nielsen and his colleagues were among the first UX professionals to calculate the relationship between the number of UX issues uncovered and the number of participants involved. The mathematical model is derived from their years of experience conducting usability studies. Faulkner challenged Nielsen’s model in 2003 with a paper titled “Beyond the five-user assumption: Benefits of increased sample sizes in usability testing.” She carefully designed and conducted a few studies with different sample sizes (5, 10, 20, 30, 40, 50, and 60 participants). What she learned from the follow-up data simulation and analysis is that 10 participants are enough to identify at least 82% of the usability issues, whereas a sample size of 15 can help to identify at least 90% of the issues. I even came across a sample size calculator on Jeff Sauro’s Measuring Usability site. Based on the binomial probability formula, it allows you to calculate, for instance, how many users are needed to discover 80% of the usability issues when each issue’s probability of occurrence is above 30%.
All of the above can be used as reference rationales to validate using a certain number of participants for a study. However, as specifically mentioned in Faulkner’s paper, having a highly representative user sample is crucial in uncovering the priority usability issues. Indeed, beyond all those statistical models, getting the right users is sometimes as important as (if not more important than) getting enough users.
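For readers who want to see the arithmetic, here is a minimal Python sketch of the binomial reasoning behind these participant counts. It assumes every issue is hit independently by any one participant with the same probability p, so the expected share of issues seen at least once across n participants is 1 - (1 - p)^n. The function names are made up for illustration, and the p = 0.31 used for the five-user case is an assumed per-participant detection rate, not a figure from the text above. This is not the code behind Jeff Sauro’s calculator, and it will not reproduce Faulkner’s numbers, which are minimums observed in her data rather than averages from a formula.

```python
import math


def share_of_issues_found(p: float, n: int) -> float:
    """Expected share of issues seen at least once across n participants,
    assuming each participant hits a given issue with probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float) -> int:
    """Smallest n for which 1 - (1 - p)**n reaches the target share."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must be strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
    return max(n, 1)


if __name__ == "__main__":
    # Five participants with an assumed detection rate of 31% per issue.
    print(round(share_of_issues_found(0.31, 5), 3))
    # Participants needed to find 80% of issues that each occur 30% of the time.
    print(participants_needed(0.30, 0.80))
```

Under these assumptions, five participants cover roughly 84% of the issues, and five is also the answer to the “80% of issues at 30% occurrence” example. Rarer issues push the number up quickly, which is one way to explain to a client why a hard-to-find problem may need a double-digit sample.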
Client Question 2: Are we telling the product team that 80% of our customers will fail to use this functionality because 8 out of 10 users failed in the usability study?
Well, the primary purpose of usability studies is to discover qualitative usability issues with an interface, as opposed to predicting the probability of those issues’ occurrence. However, the task completion rate is one of the key metrics we use to evaluate the usability of different UI features, and it is our responsibility to give clients and the product team a clear idea of how to interpret the completion rate.
The confidence level of the results is, again, closely related to the number of users included in the study. From a statistical standpoint, it’s not difficult to understand that the more users in the study, the more confident we can be in the results. However, with only 10 participants, how confident can we say we are in our results?
John Sorflaten has an interesting article discussing this topic. He pointed out the limitations of using task success data to predict customer behavior on a larger scale. He recommended using the Adjusted Wald Interval calculator coded by Jeff Sauro to generate the lower and upper bounds of the task success data.
For instance, if 8 out of 10 participants succeed in a task, how could this data be used to predict the behavior of 1,000 or 10,000 users? At a confidence level of 95% (roughly, if you ran the same study many times, about 95 out of 100 of the intervals computed this way would contain the true success rate), Jeff’s calculator generates a lower bound of 48% success and an upper bound of 96% success, based on the 80% task success rate from the usability study and accounting for the small sample size. The same holds if 8 out of 10 participants fail a task: the calculator estimates that as few as 48% or as many as 96% of users will fail the task once the UI is released and on the market.
In that sense, as opposed to using the 80% task success rate to predict broader user behavior, we as usability professionals can show the range between 48% and 96% as a reference range for the product manager or marketing team to make further interpretations or decisions.
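To make the range less of a black box, here is a minimal Python sketch of the adjusted Wald interval as it is commonly described: add z²/2 to the number of successes and z² to the number of trials, then compute an ordinary Wald interval on the adjusted proportion. The function name is hypothetical and this is not Jeff Sauro’s code, so his calculator may round or handle edge cases differently.

```python
import math
from statistics import NormalDist


def adjusted_wald_interval(successes: int, trials: int,
                           confidence: float = 0.95) -> tuple[float, float]:
    """Adjusted Wald confidence interval for a completion rate."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Adjust the counts before computing the proportion.
    n_adj = trials + z ** 2
    p_adj = (successes + z ** 2 / 2.0) / n_adj
    margin = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    # Clamp to the valid range for a proportion.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


if __name__ == "__main__":
    low, high = adjusted_wald_interval(8, 10)
    print(f"8 of 10 succeed: {low:.1%} to {high:.1%}")
```

For 8 successes out of 10, this sketch gives a lower bound of about 48% and an upper bound in the mid-90s, in line with the range quoted above. The upper figure may differ by a point from the calculator’s output depending on rounding. Running it with 2 successes out of 10 gives the mirror-image range for the failure case.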
Next time, when clients are debating between 8 or 10 participants, or the product manager is asking why the task completion rate does not match large-scale user data, these basic stats will help to answer the questions.

In case you didn’t know, Facebook launched another redesign of its service a few days ago. Typical of The Book, it did it without warning, and shuffled things around just enough that everyone I know has something to say about it. If Zuck is looking for a galvanized public response, he definitely got it.
It’s interesting to see how Facebook being a monopoly is affecting how it approaches design. By all accounts, the good people on the design team are doing their best to be collaborative, with the implied hope that this will lead to collaborative — and, more importantly, effective — design. This hope is a false one, I’d wager, because collaborative design in a vacuum is still design within a vacuum. It might be fun for people on the inside, but it sure feels authoritarian for everyone else.
So anyway. I decided I’d put Facebook’s redesign approach in perspective with a very loose analogy:
Let’s take this scenario: You ride the same bus to work every day. When the bus service first came to your neighborhood, you were really excited to ride it every day, but now you appreciate it for being mostly on time and getting you where you need to go without any fuss. It also has what every bus should: functional aisle seating, windows that open, buttons to request a stop, and some nice features like ads on flatscreens to keep you entertained. Add to that the regulars with whom you like to chat every once in a while, and you’d say the overall bus experience is pretty good.
Then picture this: One day, as you’re stepping onto your usual bus, you have a strange feeling that something’s…different. For one, the seats are all shoved toward the back, making it really hard to sit down as you have to push past the throngs in the way. Second, you notice that while some windows still open, others don’t any more (and some are even missing). The bus request button has been replaced with a pull-down cable thingy, and you see that somebody decided to leave a pamphlet explaining the change. This wouldn’t ordinarily bother you, except that the pamphlet doesn’t say much, and it’s plastered every few inches along the cable (which makes it tough to actually call a stop).
Now you’re trapped. Trapped on a bus that was once familiar to you but is now different — and you’re feeling frustrated because there isn’t another bus service available that is as reliable or has as many friendly faces in it on a regular basis. You consider writing a note to the head of the bus company, but since you know he has a bad reputation when it comes to actually listening to his customers, you decide instead to resign yourself to your fate and find ways of tolerating this semi-new bus lest you go insane.
Trapping consumers is never acceptable. Facebook, are you listening?
What’s interesting is how the idea of users being “trapped” by a monopoly blends with the innovative, throw-things-out-there culture of Silicon Valley. Companies like Google and Facebook are so used to running their businesses by throwing a bunch of ideas out there and seeing what sticks. Historically this has worked really well for them. But when they start doing this to a user base that feels like it can’t leave the service, it turns into big problems, a la Google Buzz.
Konigi, a designer who always posts interesting insights and contributions to the design landscape, recently wrote an article about his thoughts on Thomas Malone’s paper about design & learning from games. Definitely worth a read.
The concept he focused on is the idea of designing a multi-layer experience, where the majority of users can be satisfied with the simple features, but power users can “unlock” more advanced features, as they do when gaming.
The section below that he quoted from Malone’s article definitely reminded me of usability tests I’ve done:
In a sense, a good game is intentionally made difficult to play, but a tool should be made as easy as possible to use. This distinction helps explain why some users of complex systems may enjoy mastering tools that are extremely difficult to use. To the extent that these users are treating the systems as toys rather than tools, the difficulty increases the challenge and therefore the pleasure of using the systems.
It brought me back to those times I’ve tested particularly technical products, like testing the set-up and configuration of routers. Sometimes the participants (usually engineers) being tested will say things like “oh yeah, this is fine, I like it this way” even though the system seems unnecessarily difficult to use. They’ll start acting like they’re in a game, saying things like “I know I can find this” or “maybe if I use the command line or <insert some other incredibly convoluted route here>, I’ll be able to finish this task.” And this type of person won’t get frustrated – they’ll get stimulated. And they’ll feel special and smart because they’re figuring it out.
Now, I’m certainly not advocating making an unnecessarily convoluted system for power users. But I think this speaks even more to the necessity of user research before design. It’s important to address and simplify the usability issues. But it’s also important to identify the key frustrations and pain points of users. For example, maybe you looked over a software system, did a quick heuristic analysis, and noticed 5 usability issues. It’s not until you talk to the user that you would learn that 2 of the issues you thought were more minor are actually super frustrating to the users because they’re annoying details they have to do 10 times a day. And maybe one of the other issues is not necessarily an issue for this user group but a challenge that contributes to the users’ sense of worth and achievement in their field.
Right, how to design a simple, intuitive interface while seamlessly integrating ways to unlock or reveal advanced features is definitely difficult and intriguing. This post is a bit more of a tangent about those users who see more convoluted interfaces as a way of proving their worth. It’s all interrelated and interesting.
Clumsy last sentence there. Really meant to say “…so there’s a tradeoff, and it’s important to know how to select the right defaults and execute on the powerful and more advanced stuff.”
I’m with you. Identifying key points of pain is in the same category to me as finding the tip of the iceberg. I feel like that’s a huge part of my job, and why I do customer service as a large part of my work every day. If a point of pain is expressed repeatedly and often enough, it’s likely to be happening because the feature is being utilized heavily by an important user group, and I can take that as an opportunity to address the issue.
I certainly wouldn’t advocate for making difficult interfaces on purpose, but I would advocate for making visible what’s necessary and important given a product’s purpose and the expectations of the primary personae. Figuring out how to progressively expose or activate the features that aren’t utilized heavily is the challenge that I’m most concerned with. Having a huge tool that meets and exposes every need comes with a cost in terms of simplicity, so the tradeoff is in knowing how to select the right defaults, and when and how to execute on the powerful and more advanced stuff.