An analysis of smaller sample groups for greater outcomes
Abstract
Performing usability testing on prototypes of an application reduces costs, finds problems early, increases user loyalty, and, in for-profit companies, increases revenue. It is more expensive to fix a problem in code than to fix it in design. Testing with large populations of users can be expensive, so deciding how many users to test matters when the goal is to reduce cost while maximizing the number of design problems found. A sample as small as five participants can find about 80% of the problems. Companies can save money by running usability studies with fewer participants and still gain substantial insight. Higher-risk applications should be tested with 15 users, which gives a roughly 99% chance of uncovering the problems a typical user would face.
Introduction
When the web was in its infancy in the ’80s and ’90s, technology was the driving factor in why someone used or purchased a device. However, as we have grown accustomed to technology in our lives, we no longer buy or use the latest thing because it can do x or y. We buy or use technology based on its appearance and ease of use. People have higher expectations for their software. They expect it to be thoroughly vetted, crowdsourced, and tested, unlike in the wild west days of early software development. Once a user visits a site with a poor user experience, they aren’t going to go back in a few months, or ever. They will have moved on to a competitor with a better user experience. User-centered design has become a buzzword for good reason. People want to use something that integrates seamlessly into their lives[1]. Take, for example, the smartphone. How did we ever live without it?
The design should be so good that you don’t notice it unless you’re looking for it. This leads us to question: how do technology providers get to that point? The answer is through user research, prototyping, and usability testing.
Background Information
User research entails finding your target audience, interviewing them, and watching how they work. Once you’ve interviewed a few people in the different groups that make up your target audience, you can find out their demographics, technology experience, motivations, and pain points.[2] This information can build your personas. Having a well-developed persona will help the designers build solutions that are empathetic and targeted to a specific user. For example, suppose you have a persona for Bob. He is in his mid-twenties, has a smartphone, and is an architect. Bob is getting married to his college sweetheart, Beth. They are planning a summer wedding and were told by their officiant to go to the nearby courthouse and purchase a marriage license at least 72 hours before their wedding. He has never been to a courthouse except for jury duty. He doesn’t know where to go once he’s there or how to convert the license to a certificate for his fiancée’s name change.
Another part of the interview process is testing your initial hypothesis through observation. Watch the interviewees and see how they work inside and outside of your planned solution. For example, if you are trying to improve the marriage certificate process at the local courthouse, you will, of course, need to know where couples get the forms to apply for the marriage license. What may get overlooked is what happens before and after. If you ask Bob and Beth where they learned about the marriage certificate process, you will find out that their officiant told them where and when to apply. Then you follow them to the courthouse, watch them apply, and see what materials they receive. Is it clear what they do next to receive the official marriage certificate? What types of payment are accepted? Does that match what is on the website, the paperwork, and what the clerk told them? Does their mood change with each step? What does Beth need to do to change her name on her government documents? Is that process clear? All of those questions can be answered by spending time with users.
The next step in the user experience process is prototyping. A prototype is a draft of a design concept. It progresses from sketches through black-and-white wireframes to high-fidelity mockups, and it shows what an application will look like and how it will behave before any code is written.[3]
Usability testing is where participants who closely resemble the persona perform tasks while thinking aloud with a facilitator, usually a user researcher, as observers watch and take notes.[4] There are two ways to run a usability test. The first is the traditional method and the second is the RITE method.[5] In the traditional method, you run all participants through the application and then prioritize changes based on the results of the whole test. In the RITE method, you test the application with one user, determine a solution to a problem found in that day’s test, then test the solution with subsequent participants.[6] This method is much quicker: the usability designer can iterate quickly, solve problems faster, and determine whether the new design is better without waiting to test many users in another cycle and delaying the development process.
A good usability expert can help you prioritize problems to fix. You may not want to fix a problem that only one participant finds. The RITE method may work better after more than one participant finds the same problem. Once fixed, test again to make sure the fix helps reduce errors.
Figure 1: RITE testing versus “standard” usability testing [7]
You might think user research with usability testing would be an easy sell to business leaders, but many project managers and developers see user experience designers and engineers as people who get in their way. “There is no time to do it” or “Let’s do it after we develop because we have to get the Application/Hardware out,” they say. Bias & Mayhew cite an important rule of thumb: “The rule of thumb in many usability-aware organizations is that the cost-benefit ratio for usability is $1: $10-$100. Once a system is in development, correcting a problem costs 10 times as much as fixing the same problem in design. If the system has been released, it costs 100 times as much relative to fixing in design. (Gilb, 1988)”[8]
What business leaders may not realize is that there are ways to do usability testing quickly and with maximum benefit, and that in the long run the investment in usability will save the business money.
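To make the rule of thumb concrete, here is a small sketch of the arithmetic. The 1x, 10x, and 100x multipliers come from the quotation above. The $500 design-phase cost and the names `COST_MULTIPLIER` and `fix_cost` are hypothetical, chosen only for illustration.

```python
# Multipliers from the quoted rule of thumb: fixing a problem in design is the
# baseline, fixing it in development costs 10x, and after release 100x.
COST_MULTIPLIER = {"design": 1, "development": 10, "released": 100}


def fix_cost(design_cost: float, phase: str) -> float:
    """Estimated cost of fixing one problem in the given phase."""
    return design_cost * COST_MULTIPLIER[phase]


# Hypothetical figure: assume a problem costs $500 to fix during design.
HYPOTHETICAL_DESIGN_COST = 500

for phase in COST_MULTIPLIER:
    print(f"{phase}: ${fix_cost(HYPOTHETICAL_DESIGN_COST, phase):,.0f}")
```

Under that assumption, the same problem costs 100 times as much after release as in design, which is the gap usability testing on prototypes is meant to close.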
Method
Once you get approval for usability testing, how many users do you need in your sample to get the maximum benefit? The researchers first identified an application’s usability problems through heuristic studies, and then tested the application with users to see whether they found those problems during testing. Per Determining Usability Test Sample Size, “Observing four or five participants allows practitioners to discover 80% of a product’s usability problems.[9]” In the table from Faulkner, when five users were tested, the minimum percentage of problems found in a study was 55%, and the mean was 85.55%[10]. When the study reached 10 users, the minimum was 82% and the mean was 94.686%. As Faulkner’s study shows, if you test between 5 and 10 users, you will find at least 55-82% of the problems.
Table 1: Abstract from Faulkner (2003) [11]
No. Users | Minimum % Found | Mean % Found |
---|---|---|
5 | 55 | 85.55 |
10 | 82 | 94.686 |
15 | 90 | 97.05 |
20 | 95 | 98.4 |
30 | 97 | 99 |
40 | 98 | 99.6 |
50 | 98 | 100 |
To find the recommended sample size for a single problem, such as one your users are calling to complain about, use the cumulative binomial probability formula below, where p is the probability that one participant encounters the problem and n is the number of participants:[12]
$1 - (1 - p)^n$
Table 2:[13]
Number of Participants | Likelihood of Link Name Confusion | The Calculation | Probability of Users Finding the Usability Problem |
---|---|---|---|
1 | 50% (.50) | $1 - (.50)^1 = .50$ | 50% (.50) |
2 | 50% (.50) | $1 - (.50)^2 = .75$ | 75% (.75) |
3 | 50% (.50) | $1 - (.50)^3 = .87$ | 87% (.87) |
5 | 50% (.50) | $1 - (.50)^5 = .97$ | 97% (.97) |
6 | 50% (.50) | $1 - (.50)^6 = .98$ | 98% (.98) |
7 | 50% (.50) | $1 - (.50)^7 = .99$ | 99% (.99) |
8 | 50% (.50) | $1 - (.50)^8 = .99$ | 99% (.99) |
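The calculation behind Table 2 can be sketched in a few lines of Python. This is a minimal illustration of the cumulative binomial formula, not code from the cited sources. The function names `discovery_probability` and `participants_needed` are hypothetical.

```python
def discovery_probability(p: float, n: int) -> float:
    """Chance that at least one of n participants encounters a problem.

    p is the probability that a single participant encounters it.
    """
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float) -> int:
    """Smallest n for which 1 - (1 - p)^n reaches the target probability."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must be strictly between 0 and 1")
    n = 1
    while discovery_probability(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Recompute the probabilities for the participant counts listed in Table 2.
    for n in (1, 2, 3, 5, 6, 7, 8):
        print(n, f"{discovery_probability(0.50, n):.3f}")

    # How many participants are needed to reach 99% with p = .50?
    print(participants_needed(0.50, 0.99))
```

The table appears to truncate rather than round some values (for example, three participants give .875, shown as .87), so the printed figures may differ from it in the last digit.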
Table 3[14] is an example of testing an application with eight participants where the tester knows beforehand, through heuristic studies, that there are four different problems. The table charts which participants found each problem and the resulting probabilities. Problems 2 and 4 will most likely not count as substantial enough to fix, since only one participant found each of them. In Determining Usability Test Sample Size, the researchers removed the two problems to calculate an adjusted p of 0.25. By the binomial formula, $1 - (1 - 0.25)^8 \approx 0.90$, so the problem discovery rate comes to 90%, which is good if that was the goal for the test.[15]
Table 3:[16]
Data from a Hypothetical Usability Test with Eight Subjects, $p_{est}$ = .375 (columns 1-4 are problem numbers)
Subject | 1 | 2 | 3 | 4 | Count | p |
---|---|---|---|---|---|---|
1 | 1 | 0 | 1 | 0 | 2 | 0.5 |
2 | 1 | 0 | 1 | 1 | 3 | 0.75 |
3 | 1 | 0 | 0 | 0 | 1 | 0.25 |
4 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 1 | 0 | 1 | 0 | 2 | 0.5 |
6 | 1 | 0 | 0 | 0 | 1 | 0.25 |
7 | 1 | 1 | 0 | 0 | 2 | 0.5 |
8 | 1 | 0 | 0 | 0 | 1 | 0.25 |
Count | 7 | 1 | 3 | 1 | | |
p | 0.875 | 0.125 | 0.375 | 0.125 | | 0.375 |
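The figures in Table 3 can be recomputed from the raw hit matrix. The sketch below transcribes the table's 0/1 entries and derives the per-problem rates, the per-subject rates, and the overall estimate of .375. The adjusted p of 0.25 is taken from the text as given. The adjustment procedure itself is not reproduced here, and the variable names are illustrative only.

```python
# Rows are subjects 1-8; columns are problems 1-4 (1 = the subject hit the problem).
observations = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

n_subjects = len(observations)
n_problems = len(observations[0])

# Share of subjects who found each problem (bottom row of Table 3).
per_problem_p = [
    sum(row[j] for row in observations) / n_subjects for j in range(n_problems)
]

# Share of problems each subject found (right-hand column of Table 3).
per_subject_p = [sum(row) / n_problems for row in observations]

# Overall estimate: total hits divided by total subject-problem cells.
p_est = sum(sum(row) for row in observations) / (n_subjects * n_problems)

# Discovery rate at eight subjects, using the adjusted p quoted in the text.
ADJUSTED_P = 0.25
discovery_at_8 = 1 - (1 - ADJUSTED_P) ** n_subjects

print(per_problem_p, per_subject_p, p_est, round(discovery_at_8, 2))
```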
Data and Calculations
Based on these simulations, I created my own Monte Carlo simulation of 15 participants finding problems 1-4. The problem discovery rate was 99%, which suggests that 15 participants is the most a team would need to test to reduce risk on a high-risk project. This is in line with Nielsen’s Why You Only Need to Test with 5 Users, where he states “test with at least 15 users to discover all the usability problems.”
Table 4: Monte Carlo Simulated Test
Problems | Frequency | Probability |
---|---|---|
1 | 10.00 | 0.67 |
2 | 2.00 | 0.13 |
3 | 3.00 | 0.20 |
4 | 0.00 | 0.00 |
Sum | 15.00 | |
Average | | 0.25 |
Problem Discovery | | 0.99 |
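The article does not describe how the simulation was built, so the following is only one possible sketch, not the author's method. It recomputes the Table 4 summary figures from the frequencies and then checks the closed-form discovery rate against a simple Monte Carlo run. It assumes each participant independently encounters a problem with the average probability. The function name, trial count, and seed are all assumptions.

```python
import random

N_PARTICIPANTS = 15
frequencies = [10, 2, 3, 0]  # Table 4: participants who found problems 1-4

# Per-problem probability and its average, as in Table 4.
probabilities = [f / N_PARTICIPANTS for f in frequencies]
average_p = sum(probabilities) / len(probabilities)

# Closed-form discovery rate from the cumulative binomial formula.
analytic = 1 - (1 - average_p) ** N_PARTICIPANTS


def simulate_discovery(p: float, n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Fraction of simulated studies in which at least one of n participants
    encounters a problem that each participant hits with probability p."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    found = 0
    for _ in range(trials):
        if any(rng.random() < p for _ in range(n)):
            found += 1
    return found / trials


simulated = simulate_discovery(average_p, N_PARTICIPANTS)
print(f"average p = {average_p:.2f}")
print(f"analytic discovery = {analytic:.3f}, simulated = {simulated:.3f}")
```

With enough trials the simulated rate should sit close to the closed-form value, which rounds to the 0.99 shown in Table 4.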
Higher-risk projects may need to test their applications with 15 participants. For example, on the Army’s DCGS project, per the NY Post[17], an “Army testing report [found DCGS] ‘not operationally effective, not operationally suitable and not survivable.’” Soldiers were dying because of untested software, and the Army blamed poor training rather than poorly designed software. The Army put soldiers’ lives at risk by not testing the software with any users. By reducing the mistakes a user encounters through usability testing with 15 participants, the Army could follow the Lean Six Sigma method, where usability testing belongs to the measure step, and drive the defect rate down toward the goal of 3.4 defects per million.[18]
Conclusion
The more users you test, the lower the risk that undiscovered problems surface later in the system development lifecycle. Testing a minimum of five users is reasonable for a low-pressure application, whereas more participants may be necessary depending on the problem and how many users are already finding it. If you are working on a project where errors can be a matter of life or death, such as a military project, then testing your application with as many users as possible is imperative. If a “likelihood of link name confusion,” as in Table 2, could cause your user to be injured or, worse, die, the testing goal should be a 99% probability of discovery. In that case, you would want to test with a minimum of seven participants and a maximum of 15, as simulated in Table 4. For low-risk projects, testing with five users is enough. Usability testing is important, and companies can test their products on a few users to gain large insights into customer behavior, which will aid in their success.
Sources
Associated Press. (2014, October 27). Army spent $5B on failed technology created by vets. Retrieved May 7, 2017, from New York Post: http://nypost.com/2014/10/27/army-spent-5b-on-failed-technology-created-by-vets/
Bailey, P. B. (2006, September 1). Determining the Correct Number of Usability Test Participants. Retrieved May 7, 2017, from usability.gov: https://www.usability.gov/get-involved/blog/2006/09/correct-number-of-test-participants.html
Bias, R. G., & Mayhew, D. J. (2005). Cost-Justifying Usability: An Update for the Internet Age (2 ed.). (M. Buehler, Ed.) San Francisco, CA: Morgan Kaufmann Publishers.
Macefield, R. (2009, November). How To Specify the Participant Group Size for Usability Studies: A Practitioner’s Guide. Journal of Usability Studies, 5(1), 34-45.
Nielsen, J. (2000, March 9). Why You Only Need to Test with 5 Users. Retrieved from Nielsen Norman Group: https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
Nielsen, J. (2003, November 24). Two Sigma: Usability and Six Sigma Quality Assurance. Retrieved from Nielsen Norman Group: https://www.nngroup.com/articles/usability-and-six-sigma/
Turner, C. W., Lewis, J. R., & Nielsen, J. (2006). Determining Usability Test Sample Size. In W. Karwowski (Ed.), International Encyclopedia of Ergonomics and Human Factors (2 ed., Vol. 3, pp. 3084-3088). Boca Raton, FL: CRC Press.
U.S. Department of Health and Human Services. (2017, May 7). Personas. Retrieved from usability.gov: https://www.usability.gov/how-to-and-tools/methods/personas.html
U.S. Department of Health and Human Services. (2017, May 7). Prototyping. Retrieved from Usability.gov: https://www.usability.gov/how-to-and-tools/methods/prototyping.html
U.S. Department of Health and Human Services. (2017, May 7). Usability Testing. Retrieved from usability.gov: https://www.usability.gov/how-to-and-tools/methods/usability-testing.html
[1] (Bias & Mayhew, 2005) p.2
[2] (U.S. Department of Health and Human Services, 2017) Personas
[3] (U.S. Department of Health and Human Services, 2017) Prototyping
[4] (U.S. Department of Health and Human Services, 2017) Usability Testing
[5] (Bias & Mayhew, 2005) p. 489-516
[6] (Bias & Mayhew, 2005) p. 489-516
[7] (Bias & Mayhew, 2005) p. 492
[8] (Bias & Mayhew, 2005) p. 19
[9] (Turner, Lewis, & Nielsen, 2006)
[10] (Macefield, 2009) p. 37
[11] (Macefield, 2009) p. 37 Table
[12] (Bailey, 2006)
[13] (Bailey, 2006) Table
[14] (Bailey, 2006)
[15] (Turner, Lewis, & Nielsen, 2006) p. 3087
[16] (Turner, Lewis, & Nielsen, 2006) p. 3087
[17] (Associated Press, 2014)
[18] (Nielsen, Two Sigma: Usability and Six Sigma Quality Assurance, 2003)
My opinions expressed are my own and not those of the Federal Reserve Bank.