Maybe the easiest thing is to take you through an example.
Forming the right question
In a study I’m working on now, we have about 10 research questions, but the heart of the research is this one:
Do people make more errors on one version of the system than the other?
Note that this is not a hypothesis. A hypothesis would be worded more like, “We expect people to make more mistakes, and to be more likely not to complete tasks, on the B version of the system than on the A version.” (Some would argue that there are multiple hypotheses embedded in that statement.)
But in our study, we’re not out to prove or disprove anything. Rather, we just want to compare two versions to see what works well about each one and what doesn’t.
Choosing data to answer the question
There are dozens of possible measures you can look at in a usability test. Here are just a few examples:
Number and percentage of tasks completed correctly with and without prompts or assistance
Number and type of prompts given
Number and percentage of tasks completed incorrectly
Count of all incorrect selections (errors)
Count of errors of omission
Count of incorrect menu choices
Count of incorrect icons selected
Count of calls to the help desk
Count of user manual accesses
Count of visits to the index
Count of visits to the table of contents
Count of “negative comments or mannerisms”
Time required to access information in the manual
Time required to access information in online help
Time needed to recover from error(s)
Time spent reading a specific section of a manual
Time spent talking to help desk
Time to complete each task
Which data will answer the question? We’re mainly concerned with the number of errors people make, and at some point we might also want to know the types of errors. Neither requires timing, so we can eliminate all of the time measures.
So, we decided to count incorrect selections. We’re also going to count every time people leave out something they should have done. This of course means that we have to know what they should have done, which isn’t always possible in a very formative test. Here, in a summative test, we do.
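To make the point about errors of omission concrete, here is a minimal sketch, assuming each task has a known list of expected steps and that the note-taker logs the steps a participant actually performed. The function name and the step names are hypothetical and exist only for illustration; they are not from our study.

```python
def errors_of_omission(expected_steps, observed_steps):
    """Return the expected steps the participant never performed, in expected order."""
    performed = set(observed_steps)
    return [step for step in expected_steps if step not in performed]


# Made-up example task: the participant skipped one required step.
expected = ["open_form", "enter_name", "enter_date", "submit"]
observed = ["open_form", "enter_name", "submit"]

missed = errors_of_omission(expected, observed)
print(len(missed), missed)
```

This only works when the expected steps can be written down ahead of time, which is why it suits a summative test better than a very formative one.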
In our study, we’re not giving hints if people get stuck, so we’re not counting prompts or assistance.
We do want to know where in each system people have questions or problems, so in addition to tracking incorrect steps, we’re going to track where in the system people go to the online help and whether they complete the tasks correctly after going to help.
So here’s our list of data measures to answer the research question: Do people make more errors on one version of the system than the other?
Count of all incorrect selections (errors)
Count and location of incorrect menu choices
Count and location of incorrect buttons selected
Count of errors of omission
Count and location of visits to online help
Number and percentage of tasks completed incorrectly
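One way to picture how these measures could be recorded and compared is sketched below. It assumes each observation is logged with the system version, the task, the kind of event, and where in the system it happened. The field names, the event kinds, and the sample rows are all hypothetical, made up for illustration; they are not data from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    participant: str
    version: str   # "A" or "B"
    task: str
    kind: str      # "menu", "button", "omission", or "help_visit"
    location: str  # where in the system it happened


# Help visits are tracked but are not counted as errors.
ERROR_KINDS = {"menu", "button", "omission"}


def tally(observations):
    """Count observations per version, broken down by kind."""
    counts = {}
    for obs in observations:
        counts.setdefault(obs.version, Counter())[obs.kind] += 1
    return counts


def total_errors(counts, version):
    """Sum only the error kinds for one version."""
    return sum(n for kind, n in counts.get(version, {}).items() if kind in ERROR_KINDS)


# Invented sample log, for illustration only.
log = [
    Observation("P1", "A", "task1", "menu", "File menu"),
    Observation("P1", "A", "task1", "help_visit", "Search page"),
    Observation("P2", "B", "task1", "button", "Checkout page"),
    Observation("P2", "B", "task1", "omission", "Checkout page"),
    Observation("P2", "B", "task2", "help_visit", "Settings page"),
]

counts = tally(log)
for version in sorted(counts):
    print(version, total_errors(counts, version), dict(counts[version]))
```

Keeping the location on every row means the same log can answer both the "how many" question and the "where" question later.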
How do you take notes on that? I’ll talk about that in the next post.