Here at team Experience UX, we have been using the System Usability Scale (SUS) in a run of recent projects. This got us talking about how, why and when it is appropriate to use the SUS. Should scoring be a part of every project? What are its strengths and limitations? We sat down to discuss our collective experiences and consider the future of scoring in our research. Here’s an inside take on our discussion:
Ali: The prompt for scoring came through a project that required us to compare two design ideas to decide which version should be progressed. The research will go on to inform a business case that requires sound data to directly compare each variation.
There are three core aspects of usability: efficiency, effectiveness and satisfaction. While efficiency and effectiveness are quantifiable (time to complete a task, for example), satisfaction is only ever personal to each user. Cue the System Usability Scale (SUS), which was created to meet that specific need.
SUS is reliable, recognised, and has a 35-year history. The plethora of references and usage provided firm foundations on which to ground this research project and inform the business case. Hence we were comfortable including it in our project.
Emma: What’s interesting is that we are very much qualitative researchers! Oftentimes, people are drawn to numbers or quantitative data because they’re seen as objective and measurable over time. There is a perception that observational or qualitative findings can be subjective, despite our rigorous process to ensure our research is neutral and unbiased. If anything, we found that SUS simply elevates what we’ve seen through observational research; it provides a benchmark upon which to measure usability.
Amy: We’ve educated our clients not to worry about numbers; that the value is in observing what people do, open your eyes, and just spend some time with your users – and that still stands! It’s encouraging when we hear a client say, “I want to see our users using our products and services”. This is invaluable!
While our research will always remain about observing what people do, a number can enhance and provide a sense check.
Another project saw us testing two variants of web pages, and we found that, for comparison, SUS works well. The research involved two phases of unmoderated and moderated usability testing to tell a clear story. In analysing the first round of unmoderated testing, we (and our clients) were left eager to ask follow-up questions, an itch we were able to scratch in the second round of moderated testing.
If I’m honest, I was dubious when I first heard of SUS, so I read the background research. The thing that’s still hard for me to understand is that all the research states that it works with small sample sizes. With quantitative research, you often want enough of a sample to say that it’s representative when it’s scaled up. Yet, it has worked in these projects, and the moderated testing has correlated with the unmoderated testing. I’m interested to see how it plays out when we do progressive testing and to see how those numbers change over time.
Does it have any limitations?
Ali: When it’s reported to the client, it does need to be well articulated. It shouldn’t be used to set goals; if you go into it trying to hit a certain score, that’s not right. Yes, if you have a low score, it is an opportunity to improve. But more than anything, it’s a measure at a point in time.
Amy: I don’t think it’s required in every project. When we haven’t included it, we haven’t missed it.
I can see why it’s designed with the ten questions and the Likert scale; some are framed positively (“I would like to use this website regularly”), others negatively (“it was cumbersome”). The intention is to get the participant to share their view without overthinking, yet they must still read and consider each question to answer it accurately. SUS should tie together with qualitative observations, so you should be able to watch people struggle with or get frustrated by a particular feature.
Ali: In many ways, the qualitative value comes in asking the follow-up questions – questions based on our observations of their scoring. What we need to ensure is that the participant feels relaxed, that it’s not just another task. A reminder that we must always structure usability tests in a way that guards against participant fatigue.
Amy: In an ideal world, we look at a design or prototype early in the development cycle, and we test it regularly throughout its formation, as we are doing in a current project. As the prototype is developed, informed by our research, we’re not really testing the same thing each time, so the score isn’t strictly comparable.
It’s important to step back, look at each score at that point in time, and question what it is about the test material that resulted in that score.
Ali: An unanswered question for me, considering our international research this year, is how it works across languages, or when English isn’t a participant’s first language. Are there any translations of it? Have those translations been validated in other languages? This is something for further reading.
Emma: And it’s important to think about the accessibility of the SUS. Participants frequently commented on the word ‘cumbersome’; “what does that mean?” Or, “I haven’t heard that word in ages”.
Ali: That’s tricky: if people question what a question means, it takes them out of gut feeling and into head thinking.
Emma: Yes, and it runs the risk of the varying interpretations of what a particular word means.
Amy: The scoring is complex, and there is a potential for participants to misread the questions, as the positive/negative framing alternates from question to question. But even then, the scoring aggregates across all ten questions to provide a measured indication.
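For readers unfamiliar with the mechanics behind that aggregation, the standard SUS calculation works like this: each odd-numbered (positively framed) question contributes its response minus 1, each even-numbered (negatively framed) question contributes 5 minus its response, and the total is multiplied by 2.5 to give a score from 0 to 100. A minimal sketch in Python (the function name and validation are our own illustration, not part of any SUS tooling):

```python
def sus_score(responses):
    """Compute a SUS score from ten Likert responses (each 1-5).

    Odd-numbered questions are positively framed and score (r - 1);
    even-numbered questions are negatively framed and score (5 - r).
    The 0-40 sum is then scaled to 0-100.
    """
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


# A participant who strongly agrees with every positive statement (5)
# and strongly disagrees with every negative one (1) scores 100:
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
# Neutral answers across the board land at the midpoint:
print(sus_score([3] * 10))  # 50.0
```

The inverted scoring is why a misread question hurts twice: a participant who answers a negative item as if it were positive pulls the score in exactly the wrong direction.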
Ali: Another limitation is that some questions aren’t relevant to specific products and services. Take ‘How frequently would you use this website?’, for example: I purchase my car insurance once a year, so in that instance the question is moot.
Can we expect to see more of the SUS within our research?
Ali: Our approach should be to always ask ourselves the question, would I want to include SUS in this? Or, rather, does this project warrant SUS, the single ease question, or our emotional response tool? To ask that question as part of our briefing is an important consideration. What we do isn’t about numbers, it’s about people, and trusting our instincts and experience. In many senses, the value of the data is intangible.
Amy: And we reveal the intangible through our research. Art, music, culture, experience: these things are intangible.
Ali: The reason our whole society is focused on numbers is that they are tangible. But the beauty of life is that there’s a lot that’s intangible to it.
The analogy I come back to is the Seven Sisters star formation, which is best viewed out of the corner of your eye; as soon as you look directly at it, the view blurs around the edges. It’s the same with measuring usability: you can sense and feel it, and we can put context around it, but you can’t put a definitive number on it. In doing so, you lose the essence of it.
Fundamentally, using the SUS appropriately within these projects has proven the case for the work that we do. This project came off the back of extensive research and co-design workshops, and it’s all credit to our client and their desire to stay true to the user-centred process.
SUS provides a glimpse of something that can’t really be measured. It’s an insight and an indicator into a perception of an experience, and it’s right for the right project. We’re still curious and have more to learn as we keep integrating it and other methods of scoring into our research.