
Monday, April 26, 2010

Making sense of the data: Collaborative data analysis


I've often said that most of the value in doing user research is in spending time with users -- observing them, listening to them. This act, especially if done by everyone on the design team, can be unexpectedly enlightening. Insights are abundant.

But it's data, right? Now that the team has done this observing, what do you know? What are you going to do with what you know? How do you figure that out?

The old way: Tally the data, write a report, make recommendations
This is the *usual* sequence of things after the sessions with users are done: finish the sessions; count incidents; tally data; summarize data; if there's enough data, do some statistical analysis; write a report listing all the issues; maybe apply severity ratings; present the report to the team; make recommendations for changes to the user interface; wait to see what happens.

There are a couple of problems with this process, though. UXers feel pressure to analyze the data really quickly. They complain that no one reads the report. And if you're an outside consultant, there's often no good way of knowing whether your recommendations will be implemented.

And, the researcher owns the data. I say this like it's a problem because it is. Although the team may have observed sessions and now have some image of the users in their heads, the researcher is responsible for representing what happened by reporting on the data and drawing conclusions. The users are re-objectified by this distance between the sessions and the design direction. And, the UXer is now in the position of *suggesting* to designers what to do. How well is that working for you? Teams I work with find it, well, difficult.

The better way: Tell stories, come to consensus on priority, discuss theories
Teams that consistently turn out great experiences do one cool thing with data from user research and usability testing. They Talk To One Another. A lot. That's the process.

Okay, there's a slightly more systematic way to approach collaborative analysis. I've happened on a combination of techniques that work really well, with a major hat tip to User Interface Engineering, which originated most of the techniques in this process. As I've tried these techniques with teams, I've monkeyed with them a bit, iterating improvements. So here's my take:

- Tell stories
- Do a KJ analysis on the observations from the sessions
- Explore the priority observations
- Brainstorm inferences
- Examine the weight of evidence to form opinions
- Develop theories about the design

Tell stories to share experiences with users
This is the simplest thing, ever. These teams can't wait to tell their teammates what happened in sessions. Some teams set up debrief scrums. Some teams send around emails. Some teams use wikis or blogs. The content? A 300-word description of the person in the session, what she did, what was surprising, and anything else interesting about what the observers heard and saw. (Ideally, the structure for the story comes from the focus questions or research questions that the study was designed to answer.) 

Do a KJ analysis to come to consensus on priority issues
KJs, as they're affectionately known by devotees, are powerful, short sessions in which teams democratically prioritize observations. There are two keys to this technique. First, there's no point in being there unless you've observed sessions with users. Second, because there's no discussion at all until the last step, every CxO in the company can be there and have no more influence on the design than anyone else in the room. This lack of discussion also means that the analysis happens super fast. (I've done this with as many as 45 people, who generated about a thousand observations, and we were done in 45 minutes.)

Explore the priority observations
What did the team see? What did the team hear? After the KJ bubbles the priorities up, either with the whole group of observers or with a subset, pull the key observations from the issues and drill in. Again, this is only what people heard and what they saw (no interpreting). The team usually will have different views on the same issue. That's good. Getting the different perspectives of business people, technologists, and designers on what they observed will get things ready for the next step.

Brainstorm inferences
These are judgments and guesses about why the things the team observed were happening. The question the team should be asking themselves about each observation is: What's the gap between the user's behavior and what the user interface design supports?

So, say you saw users click choices in a list of items that they wanted to compare. But they never found the button to activate the comparison. When the team looks at the gap between what users did or what they were looking for and how the UI is designed, what can you infer from that? Don't hold back. Put all the ideas out there. But remember, we're not going to solutions, yet. We're still guessing about *What happened*.

Examine the weight of evidence to form opinions
By now the team has pored through what they heard and what they saw. They've drawn inferences about what might be happening in the gap between behavior and UI. *Why* are these things happening?

A look at the data will tell you. Note that by now you're analyzing a relatively small subset of the data from the study because through this process you've eliminated a lot of the noise. The team should now be asking, How many incidents were there of the issue? Which participants had the issue? Are there any patterns or trends that give more weight to some of the inferences the team came up with than other inferences?
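
If the observations are already sitting in a list or spreadsheet, the incident-counting questions above take only a few lines to answer. Here's a minimal sketch in Python; the issue labels and participant IDs are made up for illustration, not from a real study.

```python
from collections import defaultdict

# Hypothetical observations: (participant, issue) pairs pulled from the
# priority issues that survived the KJ analysis.
observations = [
    ("P1", "missed the compare button"),
    ("P2", "missed the compare button"),
    ("P4", "missed the compare button"),
    ("P1", "unclear form names"),
    ("P3", "unclear form names"),
]

# Which participants hit each issue?
incidents = defaultdict(set)
for participant, issue in observations:
    incidents[issue].add(participant)

# Rank issues by how many different participants ran into them.
for issue, who in sorted(incidents.items(), key=lambda kv: -len(kv[1])):
    print(f"{issue}: {len(who)} participants ({', '.join(sorted(who))})")
```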

By collaborating on this examination of the weight of evidence, the team shares points of view, generates feasible solutions, and forms a group opinion on the diagnosis.

Develop theories about the design
By now the team should have inferences with heft. That is, the winning inferences have ample data to support them. Having examined that evidence, the team can easily form a consensus opinion about why the issue is an issue. It's time to determine a design direction. What might solve the design problem?

In essence, this decision is a theory. The team has a new hypothesis -- based on evidence from the study they've just done -- about the remedies to issues. And it's time to implement those theories and, guess what, test them with users.


Did you see the report?
Look, Ma, no report! The team has been involved in all of the data analysis. They've bought in, signed up, signed off. And, they're moving on. All without a written report. All without recommendations from you.



What's valuable is having the users in the designers' heads
Getting the team to spend time with users is a first step. Observing users, listening to users will be enlightening. Keeping the users in the heads of designers through the process of design is more difficult. How do you do that? Collaborate on analyzing observations; explore inferences together; weigh the evidence as a group. From this, consensus on design direction reveals itself. 


 + + + + + + + + + + + + + + + +
I stole all these ideas. Yep. User Interface Engineering gets all the credit for these awesome techniques. I just repackaged them. To see the original work, check out these links:

Group Activities to Demonstrate Usability and Design
The KJ-Technique: A Group Process for Establishing Priorities
The Road to Recommendation

But this isn't going to stop me from talking about these techniques. In fact, I'm going to talk about them a lot. So come learn more at these events:
 

Monday, December 7, 2009

What to do with the data: Moving from observations to design direction

What is data but observation? Observations are what was seen and what was heard. As teams work on early designs, the data is often about obvious design flaws and higher-order behaviors, and not necessarily about tallying details. In this article, let's talk about tools for working with observations made in exploratory or formative user research.

Many teams have a sort of intuitive approach to analyzing observations that relies on anecdote and aggression. Whoever is the loudest gets their version accepted by the group. Over the years, I've learned a few techniques for getting past that dynamic and on to informed inferences that lead to smart design direction and creating solution theories that can then be tested.


Collaborative techniques give better designs

The idea is to collaborate. Let's start with the assumption that the whole design team is involved in the planning and doing of whatever the user research project is.

Now, let's talk about some ways to expedite analysis and consensus. Doing this has the side benefit of minimizing reporting – if everyone involved in the design direction decisions has been involved all along, what do you need reporting for? (See more about this in the last section of this article.)

Some collaborative analysis techniques I've seen work really well with teams are:

- Between-session debriefs
- Rolling issues lists
- K-J analysis
- Cross-matching rolling issues lists with K-Js


Between-session debriefs
Do you just grind through sessions until you're through them all, only to end up having an excruciatingly long meeting with the team where you're having to re-play every session because no one was there but you?

Schedule extra time between sessions
Try this: Schedule more time than usual between sessions. If you usually schedule 15 minutes between usability test sessions, for example, then next time, schedule 30 minutes. Use the additional time to debrief with observers.

If the team sees that there will be discussion in between the sessions that will help move the design forward, they're more engaged. If team members are already observing sessions, then this gives you a chance to manage the conversations that they're already having.

Knowing that there will be a debrief between sessions, the team is more likely to come to more sessions and to pay full attention. They'll learn that if they're at the sessions, they get more say in the design outcome, and the design outcomes will make more sense. If they don't attend, they don't get as much say, simply because they've observed less and have less evidence for their inferences.

All you have to do is get the team to talk about what they saw and what they heard, and what was most surprising about that. Save the design solutions for later, unless you're doing rapid iterative testing.

Play 'guess the reason'
To get teams in the practice of sticking to discussing observations rather than jumping to design conclusions, I've tried playing a game called "Guess the Reason" with them. It's easy. Show a user interface – just one page or screen or panel – and describe the behavior observed. Then ask the team to guess why that happened. It's a brainstorming activity. The first person to go to a design solution has to put money in the team's drink fund. You can use the same system during your own debriefs, which can make it fun (and profitable).



Rolling issues lists

I've written about these before. Simply put, this technique gets the team further engaged in the collecting of observations and takes the burden off the moderator/researcher.

Gather whiteboard and markers
The idea is that those observations that come out in the debrief get written down on a white board that all the observers can see. Each observation gets tracked to the participants who had the issues. As the moderator, you start the list, but as the sessions go on, you encourage the team to add their own.
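
For teams with remote observers, the same whiteboard structure is easy to keep in a shared doc or a tiny script. A minimal sketch of that bookkeeping, with invented issue wording and participant numbers:

```python
# A rolling issues list: each issue maps to the participants who had it.
rolling_issues = {
    "Couldn't find the Compare button": {"P1", "P3"},
    "Expected forms to be under Products": {"P2"},
}

def track(issue, participant):
    """Add a participant to an issue, creating the issue if it's new."""
    rolling_issues.setdefault(issue, set()).add(participant)

# During the debrief after session 4, observers add a repeat sighting
# and a brand-new observation.
track("Couldn't find the Compare button", "P4")
track("Read the gallery page top to bottom instead of scanning", "P4")

for issue, who in sorted(rolling_issues.items()):
    print(f"{issue}: {sorted(who)}")
```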

Natural consensus through debrief discussion
As team members add, and the team talks about the observations that go onto the list, there's a natural consensus building that goes on. Does everyone agree that this is something we want to track? Does everyone agree that this way of talking about it makes sense to everyone?

Draws out what is important to the team
When I moderate user research sessions, doing this often means that I don't have to take notes at all because the team is recording what is important to them. As they're doing that, I also get to see what is important to the different roles on the team.


K-J analysis

I admit that I stole this idea from User Interface Engineering (UIE). But it's one of the most powerful tools in the collaboration toolbox. Jared Spool has an excellent article (that doubles as a script) about this technique.

When I do K-Js in workshops, everyone gets really excited. It's an amazing tool that objectively, democratically identifies what the high priority items are from subjective data.

The technique was invented by Jiro Kawakita to help his co-workers quickly come to consensus on priorities by getting them to discuss only what was really important to the whole team. There are 8 steps:


1. Create a focus question. For a usability test, it might be, "What are the most important changes to make to improve the user experience for this design?" In workshops, I often choose a more philosophical question, like, "What obstacles do teams face in implementing user experience design practices in their organizations?"


2. Get the group together. When I use this technique with teams at the end of a user research project, I invite only people who observed at least one session.


3. Put data or opinions on sticky notes. For the user research focus question, I ask for specific, one-sentence observations that are clear enough for other people to understand. (Team members often bring their computers with them to go through the notes they took during sessions.)


4. Put the sticky notes on the wall. Everyone puts their sticky notes up, in random order on one wall, while reviewing what other people are also putting on the wall. Allow no discussion.


5. Group similar items. This step is like affinity diagramming. Pick up a sticky; find something that seems related to it. Move to another wall, and put those two stickies on the wall, one above the other to form a column. All team members do this step together. Keep going until all the stickies have moved from one wall to the other and all the stickies are in a column. No discussion.


6. Name each column. Using a different color of stickies now, everyone in the room writes down a name for each group and puts it on the wall above the appropriate column. Everyone must name every column, unless someone else has already stuck up a name that is exactly what you had written down. No discussion.

7. Vote for the most important columns. Everyone writes down what they think are the 3 most important columns. Next, they vote by marking 3 Xs for their most important group, 2 Xs for the second most important, and 1 X for their third most important group. Again, no discussion.


8. Tally the votes, which ranks the columns. On a flip chart or a white board, number a list from 20 to 1. Pull all the column name stickies that have votes and stick them next to the number of votes that are on the sticky. Now the facilitator can read off to the team which groups had the highest votes and thus are the highest priority. Now is the opportunity for discussion as the team determines which stickies can be combined. The decision to combine stickies – and thus, what the most important topics are – must be unanimous.

You're done. Very cool. Now the team knows exactly what to focus on to discuss, resolve, and remedy. And, if you're doing a report, you now know what to bother to report on. (See what I meant about reporting?)
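
If you ever need to tally step 8 electronically, say for votes collected in a shared doc from remote observers, the arithmetic is just weighted counts per column. A rough sketch of that; the column names and ballots here are hypothetical.

```python
from collections import Counter

# Each observer's ranked picks: first choice worth 3 points, second 2,
# third 1, mirroring the 3-X / 2-X / 1-X marks on the stickies.
ballots = [
    ["Navigation labels", "Form gallery", "Search results"],
    ["Form gallery", "Navigation labels", "Help content"],
    ["Navigation labels", "Search results", "Form gallery"],
]

WEIGHTS = (3, 2, 1)
votes = Counter()
for ballot in ballots:
    for column, weight in zip(ballot, WEIGHTS):
        votes[column] += weight

# Highest totals are the team's priorities.
for column, total in votes.most_common():
    print(f"{total:2d}  {column}")
```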



Cross-match the rolling issues with the K-J

If your team or your management is into validation, you can now go back to your desk and compare what came out of the rolling issues with what the K-J generated. My experience so far has been that they match up. And it isn't because everyone at the K-J was primed by being at all the debriefs between sessions. People who observed remotely often contribute to the K-Js live, so you'd think that their data might change the K-J results. Your mileage may vary, but so far, mine matches up.
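
The cross-check itself is only a comparison of two issue lists; the hard part in practice is normalizing the wording so the same issue carries the same label in both places. A minimal sketch under that assumption, with hypothetical labels:

```python
# Hypothetical issue labels, normalized to match across the two sources.
rolling_issues = {"navigation labels", "form gallery", "compare button", "help content"}
kj_priorities = {"navigation labels", "form gallery", "search results"}

print("In both:        ", sorted(rolling_issues & kj_priorities))
print("Only in rolling:", sorted(rolling_issues - kj_priorities))
print("Only in the K-J:", sorted(kj_priorities - rolling_issues))
```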



Directed collaboration is fun and generates better design solutions
When you help the team review together what they saw and heard during user research sessions, there is more likely to be consensus, buy-in, and a shared vision of the design direction. In testing early designs especially, consensus, buy-in, and shared vision are crucial to ending up with great user experiences. Collaborative techniques for analyzing observations turn work into fun, and take the pressure off the researcher to generate results. Because everyone on the team was involved in generating observations and setting priorities, everyone can move on quickly to making informed decisions that lead to coordinated, smart designs.



P.S. To anyone who gets my email newsletter whose email address was in the CC field rather than the BCC field: I apologize.  The production manager (me) must still have been asleep, and the QA manager (me) didn't catch that. We'll try to get it right next time. 

Wednesday, October 21, 2009

Easier data gathering: Techniques of the pros

In an ideal world, we'd have one person moderating a user research session and at least one other person taking notes or logging data. In practice it often just doesn't work out that way. The more people I talk to who are doing user research, the more often I hear from experienced people that they're doing it all: designing the study, recruiting participants, running sessions, taking notes, analyzing the data, and reporting.

I've learned a lot from the people I've worked with on studies. Two of these lessons are key:

- Doing note taking well is really hard.
- There are ways to make it easier, more efficient, and less stressful.

Today, I'm going to talk about a couple of the techniques I've learned over the years (yes, I'll give credit to those I, um, borrowed from so you can go to the sources) for dealing with stuck participants, sticking to the data you want to report on, and making it easy to see patterns.

Graduated prompting
Say you're testing a process that has several steps and you want to see the whole thing, end-to-end. This is not realistic. In real life, if someone gets stuck in a process, they're going to quit and go elsewhere. But you have a test to do. So you have to give hints. Why not turn that into usable data? Track not only where in the user interface people get stuck, but also how much help they need to get unstuck.

This is also an excellent technique for scavenger hunt tasks – you can learn a lot about where the trigger words are not working, where there are too many distractions from the happy path, or where people are simply going to need more help from the UI.

Here's what I learned from Tec-Ed about what to do when a participant is stuck but you need them to finish:

- First, ask participants to simply try again.

- If participants are unable to move forward, give a hint about where to look: "I noticed that you seem to be focused mostly in this area (pointing). What if you look elsewhere?"

- If participants are still stuck and want to give up or say they would call someone, let them call a "help desk" or, depending on the study, give a stronger hint without being specific.

- Finally, you may have to get specific.

The idea is to note where in the UI you're giving the hints and how many for any particular hindrance. This gives you weighted evidence for any given participant and then some great input to design decisions as you look at the data across participants.
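
One way to make that note-taking countable is to log a numbered hint level each time you intervene, per participant and per place in the UI. A minimal sketch of that bookkeeping; the 1-to-4 levels mirror the escalation above, but the task data and the scoring are my own assumptions, not a Tec-Ed standard.

```python
from collections import defaultdict

# Hint levels in the order described above:
# 1 = "try again", 2 = "look elsewhere", 3 = help desk or stronger hint,
# 4 = a specific pointer to the answer.
hints = [
    # (participant, place in the UI where the hint was given, hint level)
    ("P1", "account-open gallery page", 2),
    ("P1", "account-open gallery page", 4),
    ("P2", "main navigation", 1),
    ("P3", "account-open gallery page", 2),
]

# Sum hint levels per place: higher totals mean heavier evidence
# that participants needed help there.
weight = defaultdict(int)
for _participant, place, level in hints:
    weight[place] += level

for place, total in sorted(weight.items(), key=lambda kv: -kv[1]):
    print(f"{place}: hint weight {total}")
```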


Pick lists
You may say this is cheating. But don't you feel like you have a pretty good idea of what's going to happen when a participant uses a design? This technique is about anticipating what's going to happen without projecting to participants what the possibilities are. Make a list of all the possible wrong turns you can imagine. Or at least the ones you care about fixing.

Being able to do this comes from awareness and the researcher's experience with lots of user interfaces. It is not easy if you've only done one or two studies. But as you get more observations under your belt, looking ahead gets easier. That is, most of us pay attention to the happy path as the optimum success in a task, but then have to take lots of notes about any deviation from that path. If you look at what the success and error conditions are as you design a study, you can create a list to check off, which makes data gathering quicker and less taxing while you're also moderating.
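
In its simplest form, a pick list is just a predefined set of success and error conditions you check off per participant instead of writing prose. A minimal sketch of one as data; the conditions here are placeholders, not the ones from the ballot study described next.

```python
# A hypothetical pick list for one task: conditions defined while designing
# the study, then checked off per participant during sessions.
PICK_LIST = [
    "got it right the first time",
    "made a mistake and recovered",
    "made a mistake and never recovered",
    "asked for help",
]

# What one participant did on this task.
checked = {"made a mistake and recovered", "asked for help"}

for condition in PICK_LIST:
    mark = "x" if condition in checked else " "
    print(f"[{mark}] {condition}")
```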

Here's an example from a study I did with Ginny Redish researching the language of instructions on ballots. This is on a touch screen, so "touch" is the interaction with the screen, not with an actual candidate:

[image: the original pick list of possible voter actions on the touch-screen ballot]
There are a lot of things wrong with this particular example: having the same word at the beginning of many of the possible selections does not make the list easy to skim and scan; there are too many items in the list (we ended up not using all of those error conditions as data points). As we designed the test, we were interested in what voters did. But as we analyzed the data and reported, we realized that what they did mattered less for this test than whether they got it right the first time, made mistakes and recovered, or made mistakes and never recovered. That would have been a different pick list.

But all was not lost. We still got what we wanted out of the pick lists. This is what they ended up looking like as note-taking devices:

[image: the pick lists as they ended up looking as note-taking devices]
Judged ratings
Usually in a formative or exploratory study, you can get your participants to tell you what you need to know. But sometimes you have to decide what happened from other evidence: how the participant behaved, what they did to solve a problem or move forward, where they pointed to.

As a researcher, as a moderator, you're making decisions all the time. Is this data? Is it not? Am I going to want to remember this later, or is it not needed?

After we realized that we were just going to make a judgment anyway, Christine Perfetti and I came up with a shortcut for making those kinds of decisions. Really, what we're doing is assisting judgments that experienced researchers have probably automatized. That is, after dozens or hundreds of observations, you've stored away a large library of memories of participant behaviors that act as evidence of particular types of problems.

To make these on-the-fly judgments, Christine and I borrowed a bunch of techniques from Jared Spool at UIE and used variations of them for a study we worked on together. As the moderator of the day asked, "What do you expect to happen when you click on that?" and followed up with, "How close is this to what you expected?" the note taker for the day recorded something like this:

[image: the note taker's record of judged expectation ratings]
Another way to use this trick is to ask, "The task is [X]. How close do you feel you are now to getting to where you want to be? Warmer? Cooler?"
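
If you want those judgments in a countable form, the note taker can record each one as a rating on a small scale. A sketch under that assumption; the 1-to-5 expectation-match scale and the sample rows are illustrative, not the exact form from that study.

```python
from statistics import mean

# Hypothetical expectation-match ratings: 5 = exactly what the participant
# expected, 1 = nothing like it. One row per "what do you expect?" probe.
ratings = [
    ("P1", "clicked Compare", 2),
    ("P2", "clicked Compare", 4),
    ("P3", "opened the gallery page", 5),
    ("P1", "opened the gallery page", 5),
]

# Average the ratings per interaction to see which ones defied expectations.
by_action = {}
for _participant, action, score in ratings:
    by_action.setdefault(action, []).append(score)

for action, scores in by_action.items():
    print(f"{action}: mean expectation match {mean(scores):.1f} (n={len(scores)})")
```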

I also think that most of us collect too much data. Well, okay, I often do. Then I wonder what to do with it. I've found that when I really focus on the research questions, I can boil the data collecting down significantly. So here's a minimalist note-taking device:  I created a one-sheet data collector that covered three tasks and helped me document a pass/fail decision for voting system documentation. You can quibble about some of the labeling in the example below, but I was happy to have one piece of paper that collected what happened along with how I know that, and what that combination of things happening means.

It attempts to encapsulate the observation-inference-theory process all in one place.

[image: the one-sheet data collector covering three tasks]
Again, if you haven't done much user research or usability testing, you may not be happy with this approach. And, let's not forget how valuable qualitative data like quotes is. But you more experienced people out there may find that codifying the judgments you're making this way makes it much quicker to note what happened and what it might mean, expediting analysis.


Shortcuts are not for sissies
Most user research is not run in an ideal world. Note taking in user research is one of the most difficult skills to learn. Luckily, I have had great people to work with who shared their secrets for making that part of the research less stressful and more efficient.

Graduated prompting is a way to quantify the hints you give participants when you need them to complete tasks in a process or continue on a path in a scavenger hunt.

Pick lists let you anticipate the success and error conditions you care about so you can check them off during sessions instead of writing everything out.

Judged ratings are based on observations and evidence that fall into defined success and error criteria.

Got several dozen hours under your research belt? Focus the data collecting. Try these techniques for dealing with stuck participants, sticking to the data you want to report on, and making it easy to see patterns.

:: :: :: :: :: :: :: :: NEWS FLASH :: :: :: :: :: :: :: ::

Come to UI 14 in Boston November 1-3 and get a discount on me. Type in promotional code CHISNELL when you register and you'll get $50 off each day. If you sign up for all three days, you'll also get a set of Bose headphones. Sweeeet.


Do it! You know you want to. Mastering the Art of User Research.


Saturday, March 7, 2009

What counts: Measuring the effectiveness of your design

Let’s say you’re looking at these behaviors in your usability test:
  • Where do participants start the task?
  • How easily do participants find the right form? How many wrong turns do they take on the way? Where in the navigation do they make wrong turns?
  • How easily and successfully do they recognize the form they need on the gallery page?
  • How well do participants understand where they are in the site?
How does that turn into data from which to make design decisions?

What counts?

It’s all about what counts. What did the team observe that shows that these things happened or did not happen?

Say the team does 10 individual usability test sessions. There were 5 major “scavenger hunt” tasks. Everyone has their own stack of yellow stickies that they’ve written down observations on. (Observations of behavior, only – there should be no interpreting, projecting, guessing, or inferring yet.) Or, say the team has kept a rolling issues list. All indications are that the team is in consensus about what happened.

Example 1: Entry points
Here’s an example. For the first task, "Find an account open form," the first thing the team wanted to observe was whether participants started out where we thought they should (Forms), and if not, where participants did start.

The data looked like this:

Seven of the 10 started out at Forms – great. That’s what the team expected based on the outcomes of card sorts. But 3 participants didn’t, and those 3 all started out at the same place. (First inference: Now the team knows there is strong scent in one link and some scent in another link.)
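
For what it's worth, once the starting points are written down, this first-click tally is a few lines of code. A sketch; the link names come from this example, but which participants went where is illustrative.

```python
from collections import Counter

# Where each of the 10 participants started the "find an account open form"
# task. Seven started at Forms; the other 3 all started at the same other
# link (Products, as in the navigation discussion below).
first_clicks = ["Forms"] * 7 + ["Products"] * 3

tally = Counter(first_clicks)
for link, count in tally.most_common():
    print(f"{link}: {count} of {len(first_clicks)} participants")
```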

Example 2: Tracking navigation paths – defining “wrong turn”

Now, what about the wrong turns? In part, this depends on how the team defines “wrong turn.”


What you’re finding out in exploratory tests with early designs is where users go. Is that wrong? Not necessarily. Think of it in the same way that some landscapers and urban planners think about where to put walkways in a park. Until you can see where the traffic patterns are, there’s not a lot of point in paving. The data will tell you where the paths belong, even when they fall outside where the team projected the path should be.


As each session goes on, the team tracks where participants went. The table below actually tracks the data for multiple issues to explore:


  • How many wrong turns do they take on the way?
  • Where in the navigation do they make wrong turns?
  • How easily and successfully do they recognize the form they need on the gallery page?

The data looked like this:

[image: table tracking each participant's navigation path and where they ended up]
Everyone ended up at the right place. Some participants even took the path that the team expected everyone to take: Forms / Account Open / Form #10.

But the participants who started out at Products had to go back to the main navigation to get to the right place. There’s a decision to make. The team could count those as “wrong turns” or they could look at them as a design opportunity. That is, the team could put a link to Forms on the Product page – from the point of view of the user, they’re still on the “right” path and the design has prevented the user from making a mistake.

Account Open is a gallery page. Kits is the beginning of a wizard. Either way, the right form is available in the next step and all the participants chose the right one.

Measures: Everything counts
So, how do you count what counts? The team counted errors (“wrong turns”) and task successes. How important are the counts? The team could have gone with their impressions and what they remembered. There’s probably little enough data to be able to do that. In smaller tests, your team might be comfortable with that. But in larger tests – anything over a few participants – observers typically remember the most recent sessions the best. Earlier sessions either fade in memory or the details become fuzzy. So tracking data for every session can keep the whole team honest. When there are numbers, the team can decide together what to do with them.
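
If the team logs each participant's click path, counting wrong turns and successes against the expected path is mechanical; the judgment call about what counts as a wrong turn stays with the team. A minimal sketch using the expected path from this example, with invented recorded paths:

```python
# Expected path from the example: Forms / Account Open / Form #10.
EXPECTED = ["Forms", "Account Open", "Form #10"]

# Hypothetical recorded click paths (the real per-participant table
# isn't reproduced here).
paths = {
    "P1": ["Forms", "Account Open", "Form #10"],
    "P2": ["Products", "Forms", "Account Open", "Form #10"],
    "P3": ["Forms", "Kits", "Form #10"],
}

def wrong_turns(path, expected):
    """Count steps that fall outside the expected path.

    Whether a detour through Products is a wrong turn or a design
    opportunity is still the team's call, as discussed above.
    """
    return sum(1 for step in path if step not in expected)

for participant, path in paths.items():
    success = path[-1] == EXPECTED[-1]
    print(f"{participant}: wrong turns = {wrong_turns(path, EXPECTED)}, success = {success}")
```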

What we saw
This team learned that we got the high-level information architecture pretty close to right – most participants recognized where to enter the site to find the forms. We also learned that gallery pages were pretty successful; most participants picked the right thing the first or second time. It was easy to see all of this in tracking and counting what participants did.