
Friday, May 18, 2012

Wilder than testing in the wild: usability testing by flash mob

It was a spectacularly beautiful Saturday in San Francisco. Exactly the perfect day to do some field usability testing. But this was no ordinary field usability test. Sure, there’d been plenty of planning and organizing ahead of time. And there would be data analysis afterward. What made this test different from most usability tests?
  • 16 people gathered to make 6 research teams
  • Most of the people on the teams had never met
  • Some of the research teams had people who had never taken part in usability testing before
  • The teams were going to intercept people on the street, at libraries, and in farmers’ markets

Ever heard of Improv Everywhere? This was the UX equivalent. Researchers just appeared out of the crowd to ask people to try out a couple of designs and then talk about their experiences. Most of the interactions with participants were about 20 minutes long. That’s it. But by the time the sun was over the yardarm (time for cocktails, that is), we had data on two designs from 40 participants. The day was amazingly energizing.


How the day worked
The timeline for the day looked something like this:

  • 8:00 – Coordinator checks all the packets of materials and supplies
  • 10:00 – Coordinator meets up with all the researchers for a briefing
  • 10:30 – Teams head to their assigned locations, discuss who should lead, take notes, and intercept
  • 11:00 – Most teams reach their locations, check in with contacts (if there are contacts), set up
  • 11:15-ish – Intercept the first participants and start gathering data
  • Break when needed
  • 14:00 – Finish up collecting data, head back to the meeting spot
  • 14:30 – Teams start arriving at the meeting spot with data organized in packets
  • 15:00-17:00 – Everybody debriefs about their experiences, observations
  • 17:00 – Researchers head home, energized about what they’ve learned
  • Later – Researchers upload audio and video recordings to an online storage space

On average, teams came back with data from 6 or 7 participants. Not bad for a 3-hour stretch of doing sessions.


The role of the coordinator
I was excited about the possibilities, about getting a chance to work with some old friends, and to expose a whole bunch of people to a set of design problems they had not been aware of before. If you have thought about getting everyone on your team to do usability testing and user research, but have been afraid of what might happen if you’re not with them, conducting a study by flash mob will certainly test your resolve. It will be a lesson in letting go.

There was no way I could join a team for this study. I was too busy coordinating. And I wanted to be available in case there was some kind of emergency. (In fact, one team left the briefing without copies of the thing they were testing. So I jumped in a car to deliver the copies to them.)

Though you might think that the 3 or so hours of data collection would be dull for the coordinator, there were all kinds of things for me to do: resolve issues with locations, answer questions about possible participants, and reconfigure teams when people had to leave early. Cell phones were probably the most important tool of the day.

I had to believe that the planning and organizing I had done up front would work for people who were not me. And I had to trust that all the wonderful people who showed up to be the flash mob were as keen on making this work as I was. (They were.)


Keys to making flash mob testing work
I am still astonished that a bunch of people would show up on a Saturday morning to conduct a usability study in the street without much preparation. If your team is half as excited about the designs you are working on as this team was, taking a field trip to do a flash mob usability test should be a great experience. That is the most important ingredient to making a flash mob test work: people to do research who are engaged with the project, and enthusiastic about getting feedback from users.

Contrary to what you might think, coordinating a “flash” test doesn’t happen out of thin air or because a bunch of friends declare, “Let’s put on a show!” Here are 10 things that made the day work really well and gave us quick and dirty data:

1.    Organize up front
2.    Streamline data collection
3.    Test the data collection forms
4.    Minimize scripting
5.    Brief everyone on test goals, dos and don’ts
6.    Practice intercepting
7.    Do an inventory check before spreading out
8.    Be flexible
9.    Check in
10.    Reconvene the same day


Organize up front

Starting about 3 or 4 weeks ahead of time, pick the research questions, put together what needs to be tested, create the necessary materials, choose a date and locations, and recruit researchers.

Introduce all the researchers ahead of time, by email. Make the materials available to everyone to review or at least peek at as soon as possible. Nudge everyone to look at the stuff ahead of time, just to prepare.

Put together everything you could possibly need on The Day in a kit. I used a small roll-aboard suitcase to hold everything. Here’s my list:
  • Pens (lots of them)
  • Clipboards, one for each team
  • Flip cameras (people took them but did most of the recording on their phones)
  • Scripts (half a page)
  • Data collecting forms (the other half of the page)
  • Printouts of the designs, or device-accessible prototypes to test
  • Lists of names and phone numbers for researchers and me
  • Lists of locations, including addresses, contact names, parking locations, and public transit routes
  • Signs to post at locations about the study
  • Masking tape
  • Badges for each team member – either company IDs, or nice printed pages with the first names and “Researcher” printed large
  • A large, empty envelope

About 10 days ahead, I chose a lead for each of the teams (these were all people who I knew were experienced user researchers) and talked with them. I put all the stuff listed above in a large, durable envelope with the team lead’s name on it.

Streamline data collection

The sessions were going to be short, and the note-taking awkward because of doing this research in ad hoc places, so I wanted to make data collection as easy as possible. Working from a form I borrowed from Whitney Quesenbery, I made something that I hoped would be quick and easy to fill in, and that would make it easy for me to understand later what the data meant.

Data collector for our flash mob usability test

The data collection form was the main thing I spent time on in the briefing before everyone went off to collect data. There are things I will emphasize more next time, but overall, this worked pretty well. One note: it is quite difficult to collect qualitative data in the wild by writing things down. Better to record audio.
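
If you end up tallying the packets in a spreadsheet or a script afterward, it can help to mirror the paper form in a small data structure. Here’s a minimal sketch in Python; the field names (participant, team, location, design version, whether the key concept came across, task outcome, quotes) are my guesses at what a half-page flash mob form might capture, not the actual form.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SessionRecord:
    """One half-page data collector, as a record. Field names are illustrative."""
    participant: str                 # e.g. "P07"
    team: str                        # which research team ran the session
    location: str                    # e.g. "farmers' market"
    design_version: str              # we had two designs to compare
    got_key_concept: bool            # did the one key concept come across?
    task_completed: bool
    quotes: List[str] = field(default_factory=list)  # short verbatim quotes
    notes: str = ""

def summarize(records: List[SessionRecord]) -> None:
    """Quick tally across all the teams' packets."""
    for version in sorted({r.design_version for r in records}):
        subset = [r for r in records if r.design_version == version]
        understood = sum(r.got_key_concept for r in subset)
        print(f"Design {version}: {understood}/{len(subset)} participants "
              f"got the key concept")

if __name__ == "__main__":
    # Made-up example records, just to show the shape of the data.
    records = [
        SessionRecord("P01", "Team 1", "library", "A", True, True,
                      quotes=["Oh, that's what it means."]),
        SessionRecord("P02", "Team 1", "library", "B", False, True),
    ]
    summarize(records)
```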

Test the data collection forms

While the form was reasonably successful, there were some parts of it that didn’t work that well. Though a version of the form had been used in other studies before, I didn’t ask enough questions about the success or failure of the open text (qualitative data) part of the form. I wanted that data desperately, but it came back pretty messy. Testing the data collection form with someone else would have told me what questions researchers would have about that (meta, no?), and I could have done something else. Next time.

Minimize scripting

Maximize participant time by dedicating as much of the session as possible to their interacting with the design. That means the moderator does nothing to introduce the session, relying instead on an informed consent form that one of the team members can administer to the next participant while the current one is finishing up.

The other tip here is to write out the exact wording for the session (with primary and follow up questions), and threaten the researchers with being flogged with a wet noodle if they don’t follow the script.

Brief everyone on test goals, dos and don’ts

All the researchers and I met up at 10am and had a stand-up meeting in which I thanked everyone profusely for joining me in the study. And then I talked about and took questions on:
  • The main thing I wanted to get out of each session. (There was one key concept that we wanted to know whether people understood from the design.)
  • How to use the data collection forms. (We walked through every field.)
  • How to use the script. (“You must follow the script.”)
  • How to intercept people, inviting them to participate. (More on this below.)
  • Rules about recordings. (Only hands and voices, no faces.)
  • When to check in with me. (When you arrive at your location, at the top of each hour, and when you’re on the way back.)
  • When and where to meet when they were done.

I also handed out cash that the researchers could use for transit or parking or lunch, or just keep.

Practice intercepting people

Intercepting people to participate is the hardest part. You walk up to a stranger on the street and ask them for a favor. This might not be bad in your town. But in San Francisco, there’s no shortage of competition: homeless people, political parties registering voters, hucksters, buskers, and kids working for Greenpeace, all wanting attention from passers-by. And there you are, trying to do a research study. So, how to get some attention without freaking people out? A few things that worked well:
  • Put the youngest and/or best-looking person on the task.
  • Smile and make eye contact.
  • Use cute pets to attract people. Two researchers who own golden retrievers brought their lovely dogs with them, which was a nice icebreaker.
  • Start off with what you’re not: “I’m not selling anything, and I don’t work for Greenpeace. I’m doing a research study.”
  • Start by asking for what you want: “Would you have a few minutes to help us make ballots easier to use?”
  • Take turns – it can be exhausting enduring rejection.

Do an inventory check before spreading out

Before the researchers went off to their assigned locations, I asked each team to check that they had everything they needed, which apparently was not thorough enough for one of my teams. Next time, I will ask each team to empty out the contents of the packet and check the contents. I’ll use the list of things I wanted to include in each team’s packet and my agenda items for the briefing to ask the teams to look for each item.

Be flexible

Even with lots of planning and organizing, things happen that you couldn’t have anticipated. Researchers don’t show up, or their schedules have shifted. Locations turn out to not be so perfect. Give teams permission to do whatever they think is the right thing to get the data – short of breaking the law.

Check in

Teams checked in when they found their location, between sessions, and when they were on their way back to the meeting spot. I wanted to know that they weren’t lost, that everything was okay, and that they were finding people to take part. Asking teams to check in also gave them permission to ask me questions or help them make decisions so they could get the best data, or tell me what they were doing that was different from the plan. Basically, it was one giant exercise in The Doctrine of No Surprise.

Reconvene the same day

I needed to get the data from the research teams at some point. Why not meet up again and share experiences? Turns out that the stories from each team were important to all the other teams, and extremely helpful to me. They talked about the participants they’d had and the issues participants ran into with the designs we were testing. They also talked about their experiences with testing this way, which they all seemed to love. Afterward, I got emails from at least half the group volunteering to do it again. They had all had an adventure, met a lot of new people, got some practice with skills, and helped the world become a better place through design.


Wilder than testing in the wild, but trust that it will work

On that Saturday in San Francisco the amazing happened: 16 people who were strangers to one another came together to learn from 40 users about how well a design worked for them. The researchers came out from behind their monitors and out of their labs to gather data in the wild. The planning and organizing that I did ahead of time let it feel like a flash mob event to the researchers, and it gave them room to improvise as long as they collected valid data. And it worked. (See the results.)


P.S. I did not originate this approach to usability testing. As far as I know, the first person to do it was Whitney Quesenbery in New York City in the autumn of 2010.

Tuesday, June 28, 2011

Usability testing is HOT

For many of us, usability testing is a necessary evil. For others, it’s too much work, or it’s too disruptive to the development process. As you might expect, I have issues with all that. It’s unfortunate that some teams don’t see the value in observing people use their designs. Done well, it can be an amazing event in the life of a design. Even done very informally, it can still turn up useful insights that can help a team make informed design decisions. But I probably don’t have to tell you that.

Usability testing can be enormously elevating for teams at all stages of UX maturity. In fact, there probably isn’t nearly enough of it being done. Even enlightened teams that know about and do usability tests probably aren’t doing them often enough. There seems to be a correlation between successful user experiences and how often and how much time designers and developers spend observing users. (hat tip Jared Spool)

Observing people using early designs can be energizing as designers and developers get a chance to see reactions to ideas. I’ve seen teams walk away with insights from observing people use their designs that they couldn’t have got any other way – and then make better designs than they’ve ever made. Close to launch, it is exciting – yes, exciting – to see a design perform as useful, usable, and desirable.

Lately, I’ve been negative about usability testing and our failure of imagination in bringing the method up to date. But there’s a lot of good in any basic usability test. In fact, I went looking for the worth, the value, the allure in usability testing a few weeks ago when I asked on Quora, “What’s the sexiest thing about usability testing?”

Some of the answers surprised me. Some of the answers were more about what people love about usability testing than what makes it seductive. But let’s go with seductive. People who find usability testing hot say it’s about data that can end the opinion wars, revelations and surprises, and getting perspective about real use, motivations, and context of use.  Okay. We’re nerds.

The kiss of data
We always learn from users. Of course, we could just ask. But observing is so much more interesting. People do unpredictable things; they create workarounds, hacks, and alternative paths to make tools fit for their use.

This is the best case I can think of for watching rather than asking. From this observing, we get data. Juicy, luscious data like verbal protocols, task success rates, and physical behavior. This package makes it much easier to make good design decisions because we now have evidence on which to create theories about what should work better. There’s nothing like having hard evidence for going with a design direction – or changing direction.


Voyeuristic revelations
When designers, developers, and stakeholders of all persuasions get to observe people using a design – especially the first time – there’s often an “ah ha!” moment. (That’s the clean version.) Observers exclaim, “Wow, that was amazing!” when they see something surprising, both the good and the bad. The reaction that follows a completed usability study often is, “Damn. I wish we’d done this years ago. It would have saved us a ton of rework!” After watching one over-qualified participant struggle with a design recently, I heard a client say, “If that guy can’t do it, we’re in serious trouble.” That’s powerful.

When participants are surprised, that’s when the real fun begins. Not everyone likes surprises in their user interfaces, especially if they’re not the delightful Easter egg kind. While a team hopes not to hear, “I feel lost and abandoned,” you’ve got to wonder how bad it’s been when a participant squeals, “Oh, my gosh! This is so much better! When can I have it?!” Those eureka moments can reveal what to do to improve a design or an experience.


Relationship dynamics
One of the magical things about observing users working with a design is that suddenly, disputes within the team melt away. Chances are, the disputing parties were both wrong because neither (unless they have a ton of experience already observing these kinds of users in this domain doing this task) could accurately predict how the user would behave and perform.

Now, even with observations from watching just one user, there’s data on which to base design decisions. Data trumps gut. Data outweighs feelings. Data can put to rest those endless, circular discussions where inevitably, the person with the biggest paycheck or the most important title wins. The opinion wars come to an end.

When the whole team is involved in deciding what to test and observing sessions, everyone can share in making and carrying out agreed design decisions. Whenever a question comes up where no one knows but everyone has an opinion, the answer in a team doing usability testing is, “Let’s do some user research on that,” or “Let’s find out what users do.”


For the love of users
It’s so easy to get caught up in the business goals and issues with the underlying technology of a design. It’s so easy to stay in the safe bubble of the office, cranking out code, designs, plans, and reports. It’s easy to lose touch with users.

Teams that spend a couple of hours observing their users every few weeks keep that connection. They fall in love with their users. They relish the chance to see for themselves why people do the things they do with designs.

Getting out of their own heads, a successful team uses usability testing to get perspective, learn about users’ contexts, and remember the people and their stories. For these teams, usability testing is inspiring. And that’s hot.


What’s sexy about usability testing?
Observing people use a design can be revelatory. It turns up the volume on design by helping teams make informed design decisions. What’s sexy about usability testing? Data for evidence-based design. Ending opinion wars. Knowing users from observations and surprises. Getting perspective and knowledge of context of use.

The UX equivalent of a romantic dinner or a walk on the beach? Perhaps not, even for a geek girl like me. But it can be exciting, fun, funny, encouraging, and empowering. Just what you want from a relationship. That’s pretty seductive, if you ask me.

Thursday, August 19, 2010

Researcher as director: scripts and stage direction


For most teams, the moderator of user research sessions is the main researcher. Depending on the comfort level of the team, the moderator might be a different person from session to session in the same study. (I often will moderate the first few sessions of a study and then hand the moderating over to the first person on the design team who feels ready to take over.)

To make that work, it's a good practice to create some kind of checklist for the sessions, just to make sure that the team's priorities are addressed. For a field study or a formative usability test, a checklist might be all a team needs. But if the team is working on sussing out nuanced behaviors or solving subtle problems, we might want a bit more structure.

A couple of the teams I work with ensure that everything is lined up and that *anyone* on the team could conduct the sessions by creating detailed scripts that include stage direction. Here are a couple of samples:

[Images: two sample session scripts with stage directions]

Whether the team is switching up moderators or the same person is conducting all the sessions, creating a script for the session that includes logistics is a good idea. It helps you:

  • think through all the logistics, ideally, together with the team
  • make sure the sessions are conducted consistently, from one to the next
  • back up the main researcher in case something drastic happens -- someone else could easily fill in
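
One way to think of a script with stage directions is as an ordered list of steps, each naming who acts, what physically happens, what (if anything) gets said verbatim, and roughly how long it should take. The sketch below is hypothetical -- the step wording and timings are mine, not from the sample scripts above -- but it shows the shape: dialogue and blocking in one place, so anyone who helped plan the study could run a session from it.

```python
from typing import List, NamedTuple

class Step(NamedTuple):
    actor: str      # "moderator", "note taker", or "participant"
    direction: str  # the stage direction -- what physically happens
    say: str        # scripted wording, verbatim ("" if nothing is said)
    minutes: int    # rough timing estimate

# Hypothetical script, not one of the actual client samples.
SESSION_SCRIPT: List[Step] = [
    Step("moderator", "Meet the participant in the waiting area, walk to the lab",
         "Thanks for coming in today. We're trying out an early design...", 3),
    Step("note taker", "Start the recording, note the participant number", "", 1),
    Step("moderator", "Hand the participant the printed scenario",
         "Please read this out loud, then start whenever you're ready.", 2),
    Step("participant", "Works through the first task, thinking aloud", "", 10),
    Step("moderator", "Ask only the agreed follow-up questions",
         "You said that was confusing. Can you say more about that?", 5),
]

if __name__ == "__main__":
    for number, step in enumerate(SESSION_SCRIPT, start=1):
        print(f"{number}. [{step.actor}] {step.direction} ({step.minutes} min)")
        if step.say:
            print(f'   Say: "{step.say}"')
    print(f"Estimated session time: {sum(s.minutes for s in SESSION_SCRIPT)} minutes")
```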


Logistics rehearsal
When you walk through, step by step, what's supposed to happen during a session, it helps everyone visualize the steps, pacing, and who should be doing what. My client teams use the stage direction in the script as a check to make sure everything is being covered to reach the objectives of the sessions. It's also a good way to review what tools, data, and props you might need.

Estimate timing
Teams often ask me about timing. When they get through a draft of a script that includes stage directions, they quickly get a solid feeling for what is going to take how long. From this they can assign timing estimates and decide whether they want participants to keep going on a task after the estimated time is reached or be redirected to the next task.

Mapping out location flow
It's easy to overlook the physical or geographic flow - what a director would call blocking - of a session. Where does the participant start the session? In a waiting room, at her desk, or somewhere else? Will you change locations within a room or building during the session? How do you get from one place to the next?


Consistency and rigor
Including stage directions in a script for a user research session can help reviewer-stakeholders understand what to expect. More importantly, the stage directions act as reminders to the moderator so she's doing the same things with and saying the same things to every participant in the study. This means nothing gets left out and nothing gets added that wasn't agreed on ahead of time. (For example, the team could identify some area to observe for and put a prompt in the script for the moderator to ask follow-up questions that are not specifically scripted, depending on what the participant does.)


Insurance
Any really good project manager is going to have a Plan B. With a script that includes detailed stage directions, anyone who has been involved in the planning of a study should be able to pick up the script and moderate a session. The people I worked with at Tec-Ed called this "the bus test" (as in, If you get hit by a bus we still have to do this work).

Some teams I work with want to spread out and run simultaneous sessions. The stage directions can help ensure consistency across moderators. (Rehearse and refine if you're really going to do this.)

Finally, when it comes time to write the report about the insights the team gained, the script -- with its stage directions -- can help answer the questions that inevitably come up about why things were done the way they were done or why the data says what it says.


Stage it
Each person in a session is an actor, whether participant or observer. The moderator is the director. If the script for a study includes instructions for all the actors in the session as well as the director in addition to documenting what words to say, everyone involved will give a great performance.

Tuesday, June 8, 2010

Overcoming fear of moderating UX research sessions

It always happens: Someone asks me about screwing up as an amateur facilitator/moderator for user research and usability testing sessions. This time, I had just given a pep talk to a bunch of user experience professionals about sharing responsibility with the whole team for doing research. "But what if the (amateur) designer does a bad job of moderating the session?"


What not to do
There are numerous ways in which a moderator can foul things up. Here are just a few possibilities that might render the data gathered useless:
  • Leading the participant
  • Interrupting or intervening at the wrong time
  • Teaching or training rather than observing and listening
  • Not following a script or checklist
  • Arguing with the participant

Rolf Molich and Chauncey Wilson put together an extensive list of the many wrong things moderators could do. There are dozens of behaviors on the list. I have committed many of these sins myself at some point. It's embarrassing, but it is not the end of the world. So, here, let's talk about what to do to be the best possible moderator in your first session.


Your role as a moderator 
To be the best moderator you can be, remember that there are three basic roles of the moderator in user research and usability testing. When Carolyn Snyder worked for User Interface Engineering, she codified these:

Flight attendant. Though you might think that your priority is collecting data, your number one job during the session is to see to the comfort and safety of the participant. Make sure this person is comfortable, is appreciated, and knows she can stop at any time. Set up a relaxed situation that is still focused on the goal of learning from the person.

Sportscaster. The line of sight and the acoustics of the session situation aren't always ideal for the observers. Because the observers from your team will be helping you take notes and analyze the data, you can help them by talking just enough so they can keep their places in the session. For example, if the participant is vague about a UI element in pointing out goods and bads, simply echo the last couple of words the participant said to get them to clarify or expand.

Scientist. The moderator is usually the person who designed the study and will be responsible for analyzing the data that comes out of it. This means managing any recordings to ensure the privacy of the participants, tracking notes and data gathering from observers, and pulling observations and data together so the team can come to a design direction based on the evidence gathered.

(Hat tip to Carolyn Snyder and Jared Spool for the moderator roles.)


Who should moderate UX sessions?
Who makes a good moderator? Anyone who is a quick learner, is a good listener, can build rapport with a participant, and has a good memory. Typically, there isn't a lot of time to know all the nuances of a UI before going into a usability test. Likewise, if you're in the field doing basic ethnographic research, you may learn characteristics of the participants or the environment that inform the rest of the interview direction. Handling those on-the-fly perceptions will help everyone get value out of the session.

The listening is important for asking insightful follow-up questions as participants think aloud. Getting clarification on comments made, drilling in a bit to get to specifics, and always keeping in mind "why is this behavior happening" will come to you from listening and (gently) questioning.

Rapport with the participant is key to creating trust. The participant is always trying to get a reading from you about whether what he's doing is correct and whether what he's giving you is what you want. Even a newbie to moderating can be friendly, objective, and neutral at the same time. (It may take some practice.)

Remembering what happened early in the session will help you ask useful follow up questions later in the session. Remembering the main, interesting behaviors will help you work with the observers after the session is over and you're all telling the story of what happened, especially if you have assigned someone else to take notes while you concentrate on running the session. 

If there's someone on the design team besides you who has these characteristics, that's who you want to moderate sessions, no matter what their regular job is.


How to be a great UX moderator
Keeping those roles and attributes in mind, this is what I tell clients and workshop attendees about how to be a good moderator. You can pass the list below to your team's candidate.

Be willing to let go of ownership of the design. When you're doing field research, you may enter the session with design ideas in mind. Try not to. Instead, let the heft of the data over sessions help build the ideas. If you're testing a design, as soon as you put a design in front of another person, you no longer own it; the person you're showing it to or who is using it owns it. In that act, you have specifically asked for reactions and interactions. Open yourself up to the possibilities.

Shut up. After you've explained the purpose of the session, and explained your role, and the roles of the other people in the room, stop talking. Even when there are silences, don't be too quick to fill them. Wait. Count to 20 slowly and silently before you say anything. Chances are, something interesting has happened by then and you won't have to open your mouth.

Listen. This is not the same as shutting up. Listening is about being present. Be fully attentive so you can not only hear the words, but process their meaning. Be empathetic to the participant and what she's trying to do. 

Suspend judgment. This is one of the hardest things to do, but it is also the most important. You have invited the participant to help you learn about what you're designing. If you have shut up, and listened well, and the participant is appropriate for the study, then let go of assessing what is happening in that moment. Give yourself time to process later. This will also prevent you from asking inappropriate questions during the session that may betray your feelings about a participant or what she has to say about the design.

Plan ahead. Script, create checklists, and read Beth Loring and Joe Dumas's book, Moderating Usability Tests. You may feel silly using a script, but really, there's absolutely nothing wrong with doing that. Having a script to follow means you say the same things the same way to every participant. You can also ensure that you've hit everything on the list that the team wanted to learn about. Finally, scripting out what to say and thinking through the checklist of focus questions will give everyone a better feeling for how much can reasonably be covered during a given session.

Rehearse. If you don't set up a pilot session, then your first "real" session will end up being the rehearsal. Practice in a dry run by yourself, out loud. Record it. Change the script if you need to, making sure that the words you say make sense and feel authentic. Then find someone down the hall or in the next cube to play your participant and try the script out again. Make changes if you need to.

Do enough sessions. As you moderate each session, you will get better at it. Remember, it's not about you. Though you may feel awkward doing this in front of your team, and reading from a script, they're not paying attention to you; they're paying attention to the participant. If you feel like you have made a mistake – you said the wrong thing, or asked a question the wrong way, or you led the participant somehow – keep going. And then, go do another session. You don't have to throw everything out from the session you made a mistake in. Salvage what you can and move on.



It's a chance for team members to get closer to the participants
The whole object of doing user research is for the team to learn about current experiences. More data, even data gathered sloppily, is better than a tiny bit of data gathered expertly. And the new moderator will get better at it. No one is born to the role; moderating well is a set of learned skills. And I think that anyone can learn them with time, practice, and coaching. Find your next moderator on your design team. And keep up the good work.  



Other resources you might find useful:

  • Moderating Usability Tests: Principles and Practices for Interacting (Interactive Technologies), by Joseph S. Dumas and Beth A. Loring
  • Remote Research: Real Users, Real Time, Real Research (Rosenfeld Media), by Nate Bolt and Tony Tulathimutte
  • Recorded virtual seminar from UIE ($149), by Jared Spool and Brian Christiansen


Wednesday, October 21, 2009

Easier data gathering: Techniques of the pros

In an ideal world, we'd have one person moderating a user research session and at least one other person taking notes or logging data. In practice it often just doesn't work out that way. The more people I talk to who are doing user research, the more often I hear from experienced people that they're doing it all: designing the study, recruiting participants, running sessions, taking notes, analyzing the data, and reporting.

I've learned a lot from the people I've worked with on studies. Two of these lessons are key:

- Doing note taking well is really hard.
- There are ways to make it easier, more efficient, and less stressful.

Today, I'm going to talk about a couple of the techniques I've learned over the years (yes, I'll give credit to those I, um, borrowed from so you can go to the sources) for dealing with stuck participants, sticking to the data you want to report on, and making it easy to see patterns.

Graduated prompting
Say you're testing a process that has several steps and you want to see the whole thing, end-to-end. This is not realistic. In real life, if someone gets stuck in a process, they're going to quit and go elsewhere. But you have a test to do. So you have to give hints. Why not turn that into usable data? Track not only where in the user interface people get stuck, but also how much help they need to get unstuck.

This is also an excellent technique for scavenger hunt tasks – you can learn a lot about where the trigger words are not working, where there are too many distractions from the happy path, or where people are simply going to need more help from the UI.

Here's what I learned from Tec-Ed about what to do when a participant is stuck but you need them to finish:

- First, ask participants to simply try again.

- If participants are unable to move forward, give a hint about where to look: "I noticed that you seem to be focused mostly in this area (pointing). What if you look elsewhere?"

- If participants are still stuck and want to give up or say they would call someone, let them call a "help desk" or, depending on the study, give a stronger hint without being specific.

- Finally, you may have to get specific.

The idea is to note where in the UI you're giving the hints and how many for any particular hindrance. This gives you weighted evidence for any given participant and then some great input to design decisions as you look at the data across participants.
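
If you want that weighted evidence in a form you can tally later, one option is to log each hint with a level and a weight. This is a minimal sketch of the idea; the level names loosely follow the ladder above, and the weights are arbitrary, not a standard scoring scheme.

```python
from collections import defaultdict

# Hint levels roughly following the ladder above; the weights are arbitrary,
# chosen only so that stronger hints count more heavily in the tally.
HINT_WEIGHTS = {
    "try_again": 1,       # "Please try again"
    "look_elsewhere": 2,  # point attention away from where they are stuck
    "help_desk": 3,       # simulated help call, or a stronger non-specific hint
    "specific": 4,        # told them exactly where to go
}

# Each logged hint: (participant, place in the UI, hint level). Made-up data.
hints_log = [
    ("P1", "account menu", "try_again"),
    ("P1", "account menu", "look_elsewhere"),
    ("P3", "search results", "try_again"),
    ("P3", "search results", "specific"),
]

def weighted_by_location(log):
    """Sum hint weights per place in the UI across participants --
    the places with the biggest totals needed the most help."""
    totals = defaultdict(int)
    for _participant, location, level in log:
        totals[location] += HINT_WEIGHTS[level]
    return dict(totals)

if __name__ == "__main__":
    for location, weight in sorted(weighted_by_location(hints_log).items(),
                                   key=lambda item: -item[1]):
        print(f"{location}: weighted hint score {weight}")
```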


Pick lists
You may say this is cheating. But don't you feel like you have a pretty good idea of what's going to happen when a participant uses a design? This technique is about anticipating what's going to happen without projecting to participants what the possibilities are. Make a list of all the possible wrong turns you can imagine. Or at least the ones you care about fixing.

Being able to do this comes from awareness and the researcher's experience with lots of user interfaces. This is not easy to do if you've only done one or two studies. But as you get more observations under your belt, looking ahead gets easier. That is, most of us pay attention to the happy path as the optimum success in a task, but then have to take lots of notes about any deviation from that path. If you look at what the success and error conditions are as you design a study, you can create a list to check off, which makes data gathering quicker and less taxing while you're both taking notes and moderating.
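
In a spreadsheet or a script, a pick list is just a set of anticipated outcomes that the note taker checks off per participant. A small sketch, with invented outcome codes rather than the ones from any real study:

```python
import csv
import io

# Anticipated outcomes for one task. These codes are invented for illustration.
PICK_LIST = [
    "right_first_time",
    "wrong_turn_recovered",
    "wrong_turn_not_recovered",
    "asked_for_help",
]

# During a session the note taker just marks which items apply. Made-up data.
observations = {
    "P1": {"right_first_time"},
    "P2": {"wrong_turn_recovered", "asked_for_help"},
    "P3": {"wrong_turn_not_recovered"},
}

def to_csv(pick_list, observations):
    """Write a check-off grid: one row per participant, one column per outcome."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["participant"] + pick_list)
    for participant, marked in sorted(observations.items()):
        writer.writerow([participant] +
                        ["x" if item in marked else "" for item in pick_list])
    return out.getvalue()

if __name__ == "__main__":
    print(to_csv(PICK_LIST, observations))
```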

Here's an example from a study I did with Ginny Redish researching the language of instructions on ballots. This is on a touch screen, so "touch" is the interaction with the screen, not with an actual candidate:

[Image: the pick list of possible voter actions and errors]

There are a lot of things wrong with this particular example: having the same word at the beginning of many of the possible selections does not make the list easy to skim and scan; there are too many items in the list (we ended up not using all of those error conditions as data points). As we designed the test, we were interested in what voters did. But as we analyzed the data and reported, we realized that what they did didn't matter for this test as much as whether they got it right the first time, made mistakes and recovered, or made mistakes and never recovered. That would have been a different pick list.

But all was not lost. We still got what we wanted out of the pick lists. This is what they ended up looking like as note-taking devices:

[Image: the pick lists as note-taking forms]

Judged ratings
Usually in a formative or exploratory study, you can get your participants to tell you what you need to know. But sometimes you have to decide what happened from other evidence: how the participant behaved, what they did to solve a problem or move forward, where they pointed to.

As a researcher, as a moderator, you're making decisions all the time. Is this data? Is it not? Am I going to want to remember this later, or is it not needed?

After we realized that we were just going to make a judgment anyway, Christine Perfetti and I came up with a shortcut for making those kinds of decisions. Really, what we're doing is assisting judgments that experienced researchers have probably automatized. That is, after dozens or hundreds of observations, you've stored away a large library of memories of participant behaviors that act as evidence of particular types of problems.

To make these on-the-fly judgments, Christine and I borrowed a bunch of techniques from Jared Spool at UIE and used variations of them for a study we worked on together. As the moderator of the day asked, "What do you expect to happen when you click on that?" and followed up with, "How close is this to what you expected?" the note taker for the day recorded something like this:

[Image: the note taker's record of expectation ratings]

Another way to use this trick is to ask, "The task is [X]. How close do you feel you are now to getting to where you want to be? Warmer? Cooler?"
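
If you want those judgments in countable form, a simple expectation-match scale works: the note taker records a number plus the evidence that justified it. The scale, anchors, and field names below are a sketch of the idea, not the exact device Christine and I used.

```python
from dataclasses import dataclass

# A 1-5 expectation-match scale; the anchors are illustrative.
SCALE = {
    1: "nothing like what I expected",
    3: "partly what I expected",
    5: "exactly what I expected",
}

@dataclass
class JudgedRating:
    participant: str
    task: str
    rating: int    # 1-5, judged from what the participant said and did
    evidence: str  # the observed behavior or quote backing up the judgment

# Made-up examples, just to show the shape of the record.
ratings = [
    JudgedRating("P1", "find the account form", 4,
                 "clicked the expected link, paused briefly at the gallery page"),
    JudgedRating("P2", "find the account form", 2,
                 "said 'where did that take me?' and backed up twice"),
]

if __name__ == "__main__":
    average = sum(r.rating for r in ratings) / len(ratings)
    print(f"Mean expectation-match rating: {average:.1f} "
          f"(1 = {SCALE[1]}, 5 = {SCALE[5]})")
    for r in ratings:
        print(f"{r.participant}: {r.rating} -- {r.evidence}")
```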

I also think that most of us collect too much data. Well, okay, I often do. Then I wonder what to do with it. I've found that when I really focus on the research questions, I can boil the data collecting down significantly. So here's a minimalist note-taking device: I created a one-sheet data collector that covered three tasks and helped me document a pass/fail decision for voting system documentation. You can quibble about some of the labeling in the example below, but I was happy to have one piece of paper that collected what happened, along with how I knew that and what that combination of things means.

It attempts to encapsulate the observation-inference-theory process all in one place.

[Image: the one-sheet pass/fail data collector covering three tasks]

Again, if you haven't done much user research or usability testing, you may not be happy with this approach. And, let's not forget how valuable qualitative data like quotes is. But you more experienced people out there may find that codifying the judgments you're making this way makes it much quicker to note what happened and what it might mean, expediting analysis.


Shortcuts are not for sissies
Most user research is not run in an ideal world. Note taking in user research is one of the most difficult skills to learn. Luckily, I have had great people to work with who shared their secrets for making that part of the research less stressful and more efficient.

Graduated prompting is a way to quantify the hints you give participants when you need them to complete tasks in a process or continue on a path in a scavenger hunt.

Judged ratings are based on observations and evidence that fall into defined success and error criteria.

Got several dozen hours under your research belt? Focus the data collecting. Try these techniques for dealing with stuck participants, sticking to the data you want to report on, and making it easy to see patterns.

:: :: :: :: :: :: :: :: NEWS FLASH :: :: :: :: :: :: :: ::

Come to UI 14 in Boston November 1-3 and get a discount on me. Type in promotional code CHISNELL when you register and you'll get $50 off each day. If you sign up for all three days, you'll also get a set of Bose headphones. Sweeeet.


Do it! You know you want to. Mastering the Art of User Research.


Saturday, March 7, 2009

What counts: Measuring the effectiveness of your design

Let’s say you’re looking at these behaviors in your usability test:
  • Where do participants start the task?
  • How easily do participants find the right form? How many wrong turns do they take on the way? Where in the navigation do they make wrong turns?
  • How easily and successfully do they recognize the form they need on the gallery page?
  • How well do participants understand where they are in the site?
How does that turn into data from which to make design decisions?

What counts?

It’s all about what counts. What did the team observe that shows that these things happened or did not happen?

Say the team does 10 individual usability test sessions with 5 major “scavenger hunt” tasks. Everyone has their own stack of yellow stickies that they’ve written down observations on. (Observations of behavior only – there should be no interpreting, projecting, guessing, or inferring yet.) Or, say the team has kept a rolling issues list. All indications are that the team is in consensus about what happened.

Example 1: Entry points
Here’s an example. For the first task, Find an account open form, the first thing the team wanted to observe for was whether participants started out where we thought they should (Forms), and if not, where participants did start.

The data looked like this:

Seven of the 10 started out at Forms – great. That’s what the team expected based on the outcomes of card sorts. But 3 participants didn’t, and those 3 all started out at the same place. (First inference: Now the team knows there is strong scent in one link and some scent in another link.)

Example 2: Tracking navigation paths – defining “wrong turn”

Now, what about the wrong turns? In part, this depends on how the team defines “wrong turn.”


What you’re finding out in exploratory tests with early designs is where users go. Is that wrong? Not necessarily. Think of it in the same way that some landscapers and urban planners do about where to put walkways in a park. Until you can see where the traffic patterns are, there’s not a lot of point in paving. The data will tell you where to put the paths outside where the team projects the path should be.


As each session goes on, the team tracks where participants went. The table below actually tracks the data for multiple issues to explore:


  • How many wrong turns do they take on the way?
  • Where in the navigation do they make wrong turns?
  • How easily and successfully do they recognize the form they need on the gallery page?

The data looked like this:

[Table: the navigation path each participant took]

Everyone ended up at the right place. Some participants even took the path that the team expected everyone to take: Forms / Account Open / Form #10.

But the participants who started out at Products had to go back to the main navigation to get to the right place. There’s a decision to make. The team could count those as “wrong turns” or they could look at them as a design opportunity. That is, the team could put a link to Forms on the Product page – from the point of view of the user, they’re still on the “right” path and the design has prevented the user from making a mistake.

Account Open is a gallery page. Kits is the beginning of a wizard. Either way, the right form is available in the next step and all the participants chose the right one.

Measures: Everything counts
So, how do you count what counts? The team counted errors (“wrong turns”) and task successes. How important are the counts? The team could have gone with their impressions and what they remembered. There’s probably little enough data to be able to do that. In smaller tests, your team might be comfortable with that. But in larger tests – anything over a few participants – observers typically remember the most recent sessions the best. Earlier sessions either fade in memory or the details become fuzzy. So tracking data for every session can keep the whole team honest. When there are numbers, the team can decide together what to do with them.
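
Tallying this kind of data is mostly counting, and a few lines of code (or a spreadsheet) can do it. A minimal sketch with made-up paths, using the same expected path (Forms / Account Open / Form #10) as the example above; how to treat off-path steps is still the team’s call:

```python
from collections import Counter

EXPECTED_PATH = ["Forms", "Account Open", "Form #10"]

# One recorded navigation path per participant. Made-up data.
paths = {
    "P1": ["Forms", "Account Open", "Form #10"],
    "P2": ["Products", "Forms", "Account Open", "Form #10"],
    "P3": ["Forms", "Kits", "Account Open", "Form #10"],
}

def entry_points(paths):
    """Where did each participant start the task?"""
    return Counter(path[0] for path in paths.values())

def wrong_turns(path, expected):
    """Count steps outside the expected path. Whether those are errors
    or design opportunities is still the team's decision to make."""
    return sum(1 for step in path if step not in expected)

if __name__ == "__main__":
    print("Entry points:", dict(entry_points(paths)))
    for participant, path in paths.items():
        reached = path[-1] == EXPECTED_PATH[-1]
        print(f"{participant}: {wrong_turns(path, EXPECTED_PATH)} wrong turn(s), "
              f"{'reached' if reached else 'did not reach'} the right form")
```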

What we saw
This team learned that we got the high-level information architecture pretty close to right – most participants recognized where to enter the site to find the forms. We also learned that gallery pages were pretty successful; most participants picked the right thing the first or second time. It was easy to see all of this in tracking and counting what participants did.

Friday, February 27, 2009

Consensus on observations in real time: Keeping a rolling list of issues

Design teams often need results from usability studies yesterday. Teams I work with always want to start working on observations right away. How to support them while giving good data and ensuring that the final findings are valid?

Teams that are fully engaged in getting feedback from users – teams that share a vision of the experience they want their users to have – have often helped me gather data and evaluate in the course of the test. In chatting with Livia Labate, I learned that the amazing team at Comcast Interactive Media (CIM) came to the same technique on their own. Crystal Kubitsky of CIM was good enough to share photos of CIM's progress through one study. Here’s how it works:


1. Start noting observations right away
After two or three participants have tried the design, we take a longer break to debrief about what we have observed so far. In that debrief, each team member talks about what he or she has observed. We write it down on a white board and note which participants had the issue. The team works together to articulate what the observation or issue was.

2. Add observations to the list between sessions
After each succeeding session, as the team generates observations, I add them to that list, including the numbers for each participant who had the issue. We note any variations on each observation – they may end up all in one, or they may branch off, depending on what else we see.

Here’s an example observation from one study. We noted on the first day of testing that

Participants talked about location, but scrolled past the map without interacting with it to get to the search results (the map may not look clickable)


The team and I later added the participant numbers for those who we observed doing this:

Participants talked about location, but scrolled past the map without interacting with it to get to the search results (the map may not look clickable) PP, P1, P3


Each day of testing, the team and I add more observations and more participant numbers. The CIM team debriefs to review top observations, highlighting what they learned and color-coding participants or segments as they capture rolling observations:

[Photos: the CIM team’s whiteboard of rolling observations]

3. Invite team member observers to add observations to the list themselves
As the team gets better at articulating the issues they have seen in a test session, it is my experience that they start adding to the list on their own. Often one of the observers voluntarily takes over adding to the list. This helps generate even more buy-in from the team and means that I can concentrate on focusing the discussion on the issues we agreed to explore when we planned and designed the test.

At the end of the last session, it’s easy then to move to a design direction meeting because the team has observed sessions, articulated issues, and already analyzed that data together.

[Photo: the CIM team’s debrief whiteboard]

The CIM team documents which participants had which behaviors in the table at the top right of the photo above.


What makes it a “rolling” list of observations and issues
There are three things that are “rolling” about the list. First, the team adds issues to the list as they see new things come up (or things they didn’t notice before, or that seemed like one-off problems). Second, the team adds participant numbers for each of the issues as the test goes along. Third, the team refines the descriptions of the issues as they learn more from each new participant.
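
Kept electronically, the rolling list is just a mapping from each issue’s description to the participants observed having it, updated between sessions. A small sketch of that idea (the issue wording and participant numbers come from the example above; the helper functions are mine):

```python
# Rolling issues list: issue description -> participants observed having it.
rolling_issues = {}

def note_issue(description, participant):
    """Add an issue, or another participant to an existing issue, between sessions."""
    rolling_issues.setdefault(description, []).append(participant)

def refine_issue(old_description, new_description):
    """Reword an issue as the team learns more, keeping its participant numbers."""
    if old_description in rolling_issues:
        rolling_issues[new_description] = rolling_issues.pop(old_description)

if __name__ == "__main__":
    issue = ("Participants talked about location, but scrolled past the map "
             "without interacting with it")
    for participant in ["PP", "P1", "P3"]:
        note_issue(issue, participant)
    # Refine the wording once the team agrees on a likely cause.
    refine_issue(issue, issue + " (the map may not look clickable)")
    for description, participants in rolling_issues.items():
        print(f"{description} -- {', '.join(participants)}")
```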

Double-check the data, if there’s time
Unless the team is in a very rapid, iterative, experimental mode, I still go back and tally all of the official data that I collected during each session. I want to be sure that there are no major differences between the rolling issues list and the final report. Usually, the rolling issues list and the final data match pretty closely because the team took care in defining the research questions and issues to explore together when we planned and designed the test.

Doing the rolling list keeps teams engaged, informed, and invested. It helps the researcher cross-check data later, and it gives designers and developers something to work from right away.

Tuesday, February 17, 2009

Looking for love: Deciding what to observe for

Last winter I worked with a team that wanted to find out whether a prototype they had designed for a new intranet worked for users. Their new design was a radical change from the site that had been in place for five years and in use by 8,000 users. Going to this new design was a big risk. What if users didn’t like it? Worse, what if they couldn’t use it?

We went on tour. Not to show the prototype, but to test it. Leading up to this moment we had done heaps of user research: stakeholder interviews, field observations (ethnography, contextual inquiry – pick your favorite name), card sorting, taxonomy testing. We learned amazing things, and as our talented interaction designer started translating all that into wireframes, we got pressure to show them. We knew what we were doing. But we wanted to be sure. So we made the wireframes clickable and strung them together to make them feel like they were doing something. And then we asked (among other things):

  • How well does the design support the tasks of each user group?
  • How easily do users move through the site for typical tasks?
  • Where do they take wrong turns? What trigger words are missing? What trigger words are wrong?

Validating the research
In some ways, you could look at this as a validation test – not validating the design necessarily, but instead validating the user research we had done. Did we interpret our observations correctly by making the right inferences, in turn getting us to the design we got to?

What was possible: where the design might break
To find out, we had to answer those Big Questions. What were the issues within them that we wanted to investigate? Let’s take an example: How easily do users move through the site for typical tasks? We wanted to know whether users took the same path we wanted them to take, and if they didn’t, why not. On a task to find forms to open a brokerage account, we listed the possible issues. Users might

  • start at the wrong place in the site
  • get lost
  • pick the wrong form
  • not recognize they’ve reached the right place

From that discussion of the disasters that we could imagine came a list of behaviors to observe for, or as my friends at Tec-Ed say, issues to explore:

  • Where do participants start the task?
  • How easily do participants find the right form? How many wrong turns do they take on the way? Where in the navigation do they make wrong turns?
  • How easily and successfully do they recognize the form they need on the gallery page?
  • How well do participants understand where they are in the site?

What we saw
From these questions, we learned that we got the high-level information architecture right – most participants recognized where to enter the site to find the forms. We also learned that there were a couple of spots in the task path that had a combination of weak trigger words and other distractions that drew attention away from the things that would have gotten participants to the goal more quickly. But the groupings on the gallery page were pretty successful; most participants picked the right thing the first or second time. It was easy to see all of this in the way participants performed, but we also heard clues from them about what they were looking for and why.

And, by the way, the participants loved it. We knew because they said so.

Tuesday, February 10, 2009

Popping the big question(s): How well? How easily? How valuable?

When teams decide to do usability testing on a design, it is often because there’s some design challenge to overcome. Something isn’t working. Or, there’s disagreement among team members about how to implement a feature or a function. Or, the team is trying something risky. Going to the users is a good answer. Otherwise, even great teams can get bogged down. But how do you talk about what you want to find out? Testing with users is not binary – you probably are not going to get an up or down, yes or no answer. It’s a question of degree. Things will happen that were not expected. The team should be prepared to learn and adjust. That is what iterating is for (in spite of how Agile talks about iterations).

Ask: How well
Want to find out whether something fits into the user’s mental model? Think about questions like these:
  • How well does the interaction/information architecture support users’ tasks?
  • How well do headings, links, and labels help users find what they’re looking for?
  • How well does the design support the brand in users’ minds?

Ask: How easily
Want to learn whether users can quickly and easily use what you have designed? Here are some questions to consider:
  • How easily and successfully do users reach their task goals?
  • How easily do users recognize this design as belonging to this company?
  • How easily and successfully do they find the information they’re looking for?
  • How easily do users understand the content?
  • How easy is it for users to understand that they have found what they were looking for?
  • How easy or difficult is it for them to understand the content?

Ask: How valuable
  • What do users find useful about the design?
  • What about the design do they value and why?
  • What comments do participants have about the usefulness of the feature?

Ask: What else?
  • What questions do your users have that the content is not answering?
  • What needs do they have that the design is not addressing?
  • Where do users start the task?


Teams that think of their design issues this way find that their users show them what to do in the way they perform with a design. Rarely is the result of usability testing an absolute win or lose for a design. Instead, you get clues about what’s working – and what’s not – and why. From that, you can make a great design.

Tuesday, August 19, 2008

Retrospective review and memory

One of my favorite radio programs (though I listen to it as a podcast) is Radiolab, “a show about science,” which is a production of WNYC hosted by Robert Krulwich and Jad Abumrad and distributed by NPR. This show contemplates lots of interesting things, from reason versus logic in decision making, to laughter, to lies and deception.

The show I listened to last night was about how memories are formed. Over time, several analogies have developed for human memory that seem to be related to the technology available at that time. Robert said he thinks of his memory as a filing cabinet. But Jad, who is somewhat younger than Robert, described his mind as a computer hard disk. Neurologists and cognitive scientists they talked to, though, said No, memory isn’t like that at all. In fact, we don’t store memories. We recreate them every time we think of them.

Huh, I thought. Knowing this has implications for user research. For example, there are several points at which usability testing relies on memory: the memory of the participant if we’re asking questions about past behavior; the memory of the facilitator for taking notes, analyzing data, and drawing inferences; and the memories of observers in discussions about what happened in sessions and what it means.

Using a think-aloud technique – getting participants to say what they’re thinking while working through a task – avoids some of this. You have a verbal protocol as “evidence.” If there’s disagreement about what happened among the team members, you can go back to the recording to review what the participant said as well as what they did.

But there are times when think-aloud is not the right technique, either because the participant cannot manage the divided attention of doing a task and talking about it at the same time, or because of other circumstances. In those situations, you might think about doing retrospective review, instead.

“Retrospective review” is just a fancy name for asking people to tell you what happened. If you have the tools and time available, you can go to a recording after a session, so the participant can see what she did and respond to that by giving you a play-by-play commentary.

As soon as participants start viewing or listening to the beginning of an episode – up to 48 hours after doing the task – they’ll remember having done it. They probably won’t be able to tell you how it ended. But they will be able to tell you what’s going to happen next.

And that’s the really useful thing about doing retrospective review. As the participant recreates the memory of the task, you can ask, “What happens next? What will you do next and why?” Pause. Listen. Take notes. And then start playing back the recording again. Sure enough, it’ll be like the participant said. Only now you know why.

Asking participants what happens next in their own stories also avoids most revisionist history. That is, if you ask participants to explain what happened only after they have viewed it, they may rationalize what they did. This isn’t the same as remembering it.