Tuesday, June 30, 2009

Testing in the wild, seizing opportunity

When I say “usability test,” you might think of something that looks like a psych experiment, without the electrodes (although I’m sure those are coming as teams think that measuring biometrics will help them understand users’ experiences). Anyway, you probably visualize a lab of some kind, with a user in one room and a researcher in another, watching either through a glass or a monitor.

It can be like that, but it doesn’t have to be. In fact, I’d argue that for early designs it shouldn’t be like that at all. Instead, usability testing should be done wherever and whenever users normally do the tasks they’re trying to do with a design.


Usability testing: A great tool
It’s only one technique in the toolbox, but in doing usability testing, teams get crisp, detailed snapshots about user behavior and performance. As a bonus, gathering data from users through observing them do tasks can resolve conflict within a design team or assist in decision-making. The whole point is to inform the design decisions that teams are making already.


Lighten up the usability testing methodology
Most teams I know start out thinking that they’re going to have a hard time fitting usability testing into their development process. All they want is to try out early ideas, concepts and designs or prototypes with users. But reduced to its essence, usability testing is simple:
  • Develop a test plan and design
  • Find participants
  • Gather the data by conducting sessions
  • Debrief with the team

That test plan/design? It can be a series of lists or a table. It doesn’t have to be a long exposition. As long as the result is something that everyone on the team understands and can agree to, you have written enough. After that, improvising is encouraged.

The individual sessions should be short and focused on only one or two narrow issues to explore.


But why bother to do such a quick, informal test?
First, doing any sort of usability test is good for getting input from users. The act of doing it gets the team one step closer to supporting usable design. Next, usability testing can be a great vehicle for getting the whole team excited about gathering user data. There is nothing like seeing a user use your design without intervention.

Most of the value in doing testing – let’s say about 70% – comes from just watching someone use a design. Another valuable aspect is the team working together to prepare for a usability test. That is, thinking about what Big Question they want answered and how to answer it. When those two acts align, having the team discuss together what happened in the sessions just comes naturally.


When not to do testing in the wild: Hard problems or validation
This technique is great for proving concepts or exploring issues in formative designs. It is not the right tool if the team is facing subtle, nuanced, or difficult questions to answer. In those cases, it’s best to go with more rigor and a test design that puts controls on the many possible variables.

Why? Well, in a quick, ad hoc test in the wild, the sample of participants may be too small. If you have seized a particular opportunity (say, with a seatmate on an airplane or a bus, as I have been known to do – yeah, you really don’t want me to sit next to you on a cross-country flight), a sample of one may not be enough to instill confidence in the rest of the team.

It might also happen, because the team is still forming ideas, that the approach in conducting sessions is not consistent from session to session. When that goes on, it isn’t bad necessarily. It can just mean that it’s difficult to draw meaningful inferences about what the usability problems are and how to remedy them.

If the team is okay with all that and ready to say, “let’s just do it!” to usability testing in the wild, then you can just do more sessions.


So, there are tradeoffs
What might a team have to consider in doing quick, ad hoc tests in the wild rather than a larger, more formal usability test? If you’re in the right spot in a design, for me doing usability testing in the wild is a total win:
  • You have some data, rather than no data (because running a larger, formal test is daunting or anti-Agile).
  • The team gets a lot of energy out of seeing people use the design, rather than arguing among themselves in the bubble of the conference room.
  • Quick, ad hoc testing in the wild snugs nicely into nearly any development schedule; a team doesn’t have to carve out a lot of time and stop work to go do testing.
  • It can be very inexpensive (or even free) to go to where users are to do a few sessions, quickly.


Usability testing at its essence: something, someone, and somewhere
Just a design, a person who is like the user, and an appropriate place – these are all a team needs to gather data to inform their early designs. I’ve seen teams whip together a test plan and design in an hour and then send a couple of team members to go round up participants in a public place (cafes, trade shows, sporting events, lobbies, food courts). Two other team members conduct 15- to 20-minute sessions. After a few short sessions, the team debriefs about what they saw and heard, which makes it simple to agree on a design direction.


It’s about seizing opportunity
There’s huge value in observing users use a design that is early in its formation. Because it’s so cheap and so quick, there’s little risk in drawing mistaken inferences from the observations: a team can compensate for any shortcomings of the informal format by doing more testing, either more sessions or another round of testing as follow-up. See a space or time and use it. It only takes four simple steps.

Monday, June 8, 2009

Tools for plotting a future course of design, checking progress

“Let’s check this against the Nielsen guidelines for intranets,” she said. We were three quarters of the way through completing wireframes for a redesign. We had spent 4 months doing user research, card sorting, prototyping, iterating, and testing (a lot). At the time, going back to the Nielsen Norman Group guidelines seemed like a really good idea. “Okay,” I said. “I’m all for reviewing designs from different angles.”

There are 614 guidelines.

This was not a way to check designs to see if the team had gone in the right design direction.


Are you designing or inspecting?
They are not interchangeable, guidelines and heuristics, but many UXers treat them that way. It’s common to hear someone saying that they’re doing a heuristic evaluation against X guidelines. But it doesn’t quite work like that.

Designing is an act of creation, whether you’re doing research, drawing on graph paper, or coding CSS. Inspecting is an act of checking, of examining, often with some measure in mind.

Guidelines are statements of direction. They’re about looking to the future and what you want to incorporate in the design. Guidelines are aspirational, like these:

  • Add, update, and remove content frequently.
  • Provide persistent navigation controls.
  • Index all intranet pages.
  • Provide org charts that can be viewed onscreen as well as printed.*

Heuristics challenge a design with questions. The purpose of heuristics is to provide a way to “test” a design in the absence of data by making an inspection. Heuristics are about enforcement, like these:


  • Visibility of system status: The system should always keep users informed about what is going on…
  • Match between system and the real world: The system should speak the users' language…
  • User control and freedom: The system should provide a clearly marked "emergency exit" to leave the unwanted state… **

Creating or diagnosing?
Heuristics are often cast as pass/fail tests. Does the UI comply or not? While you could use the usability.gov guidelines to evaluate web site designs, they were developed as tools for designing. They present things to think about as teams make decisions.

Both guidelines and heuristics are typically broad and interpretable. They’re built to apply to nearly any interface. But they come into play at different points in a design project. Guidelines are things to think about in reaching a design; they are considerations and can interact with one another in interesting ways. Heuristics are usually diagnostic and generally don’t interact.


Don’t design by guidelines alone
For example, on the intranet project, we looked at guidelines about the home page. One directive says to put the most important new information on the home page, and the next one says to include key features and company news on the home page. A third says to include tools with information that changes every day. But earlier in the list of guidelines, we see a directive to be “judicious about having a designated ‘quick links’ area.” Guidelines may feel complementary to one another or some may seem to cancel others out. Taken together, there’s a set of complex decisions to make just about the home page.

And it was too late on our intranet to pay attention to every guideline. The decisions had been made, based on stakeholder input, business requirements, and technology constraints, as well as user requirements. Though we were thoughtful and thorough in designing, anyone scoring our site against the guidelines might not give us good marks.


Don’t evaluate by heuristics alone
Likewise, when looking at heuristics such as “be consistent,” there’s a case for conducting usability tests with real users. For example, on the intranet I was working on, one group in the client company was adamant about having a limited set of page templates, with different sections of the site meeting strict requirements for color, look, and feel. But in usability testing, participants couldn’t tell where they were in the site when they moved from section to section.


Guidance versus enforcement
What are you looking for at this point in your design project? In the intranet project, we were much closer to an evaluative mode than a creation mode (though we did continue to iterate). We needed something to help us measure how far we had come. Going back to the guidelines was not the checkpoint we were looking for.

We sallied forth. The client design team decided instead to create “heuristics” from items from the user and business requirements lists generated at the beginning of the project, making a great circle and a thoughtful cycle of research, design, and evaluation.

I don’t know whether the intranet we designed meets all of the guidelines. But users tell us and show us every day that it is easier, faster, and better than the old intranet. For now, that’s enough of a heuristic.


* From "Intranet Usability: Design Guidelines from Studies with Intranet Users" by Kara Pernice Coyne, Amy Schade, and Jakob Nielsen

** From Jakob Nielsen's 10 heuristics, see http://www.useit.com/papers/heuristic/heuristic_list.html



Related:

Where do heuristics come from?

What are you asking for when you ask for heuristic evaluation?


:: :: :: :: :: :: :: :: :: ::

Note: I'm moving!
After 20 years in the San Francisco Bay Area, I'm bugging out. As of September 1, I will be operating out of my new office and home in Andover, Massachusetts. I'm excited about this move. It's big!

You can still find me at www.usabilityworks.net, email me at dana@usabilityworks.net, on Twitter as danachis, and on the phone at 415.519.1148.

Sunday, June 7, 2009

Secret word for discounts on Web Design World in Seattle in July

I'll be doing two sessions at Web Design World in Seattle in July (20-22):

What to do when: Informing design at every phase

Getting to insights: A radical approach to usability testing with Jared Spool

Please come! I promise you'll get tools and techniques that you can't get anywhere else.

To get you there, I'm giving you the secret password so you can get a discount on registration for the event. When you register with the code, you get the best possible price for the WDW Passport Package. This code slashes $395 from the standard price of the three-day Web Design World Passport—$1000 instead of $1395. (The Passport Package includes all sessions, networking events, welcome reception, breakfasts and lunches, and the post-conference workshops.)

The secret password? S9W04 (Pass it on.)

Monday, May 18, 2009

Webinar June 3: Quick, easy, and insightful - usability testing in the wild

Ever want to do a usability test but felt you couldn't because you didn't have a lab? Or couldn't record sessions? Or have something special to test?

Fear no more. You can learn how to do informal, ad hoc usability tests in the formative stages of design that will help you gain knowledge about your users and hone your design direction.

This 2-hour webinar is being offered by the US General Services Administration. C'mon along. The price couldn't be better at $75 for non-government attendees, $50 for government employees.

See the course details and register!

Tuesday, May 5, 2009

Where do heuristics come from?

Recently I had the honor and pleasure of working on a project for the National Institute of Standards and Technology (NIST) to develop style guidelines for voting system documentation. Yawner, right? Not at all, it turns out. It made me think about where guidelines and heuristics come from for all kinds of design. Yes, if you live in the United States, you paid for me to find this out. Thank you.

What I learned in the process of developing style guidelines for voting system documentation (which, astonishingly, took about a year) is that most heuristics -- accepted principles -- used in evaluating user interfaces come from three sources: lore or folk wisdom, specialist experience, and research.

Though style guidelines for content are important, I’m going to talk about each of these sources of heuristics with various design examples. I’m sure you’ll see something that you’ve encountered before.


Lore or folk wisdom
First comes guidance from “They,” as in “They say…,” for which no one knows the true source. For example: “Feed a cold, starve a fever.” “Never end a sentence with a preposition.” “Limit the number of items in the main navigation to seven plus or minus two.”

Where do these come from? Someone’s belief that this is a good practice. They may have heard something or done something that they think supports the practice, but there’s really no basis in fact for any of these.

A New York Times article published on February 13, 2007 by Anahad O’Connor says that recent research about whether to eat a lot when you have a cold and fast when you have a fever is inconclusive. No one seems to know how this one started. It may just feel like there’s some inherent logic to it.

Not ending sentences with prepositions was encoded by a British guy named Henry Fowler in 1926. He was a crotchety, proscriptionist pedant but his book was a best seller. People wanted guidance about how to speak and write “properly,” especially in class-conscious England. So, a rule to not use words like “to,” “in,” “for,” “with,” or “on” as the last word in a sentence became wildly popular as a marker of a well bred, well educated person. But it was really just Fowler’s personal preference, and today the usage seems affected.

My favorite example in the web design world is a guideline about limiting the number of items in a navigation menu or list to five to seven items. Most people don’t know where this came from – if they did, they’d know that this isn’t the best use of that “rule” and imposing it actually won’t make the design usable. This one does originate in research, specifically, an article published in 1956 by George Miller in the Psychological Review called “The magical number seven, plus or minus two: Some limits on our capacity for processing information.” (You can see a reproduction of the article here: http://www.musanim.com/miller1956/) The findings from the research Miller describes are about working memory. The lore passed down from that article is that humans can only hold about seven things in their short-term memory at a time. BUT, Miller heavily, heavily qualifies this as “suggestive” and an “estimate.” More importantly, what in the world does this have to do with designing web site navigation? Nothing. Navigation is persistent. We’re not asking people to remember from section to section where they can go. It’s right there for them to see and use. The number of items in navigation should be determined by the data from research about the users and their task goals.

If you catch yourself saying, “they say,” or “I’ve heard” when making an observation about a design issue when you’re in inspection mode, you may be caught without a lot of support for your point. Basing an inspection on your own experiences observing users can hold more authority – but not as much as research-based guidelines or heuristics.


Specialist experience
Older adults who use the Web need high contrast and large targets. If they are not expert Web users, they can be easily distracted, so to ensure that they’re successful, we should design in smooth task paths and clear labeling that doesn’t use jargon.

One of my special interests since about 2003 has been Web design for older adults. I’ve internalized the design principles above (as well as many others) after watching dozens of people who are age 55 or older use a variety of web sites. I am confident that implementing a design that takes these design principles into account will make the design easier for older people to use than designs that use subtle colors layered on one another, small buttons and links, and cluttery page layout with trendoid headings and labels. Though I have observed many types of people using lots of different kinds of web sites, I have specialist experience from watching one audience try to do typical tasks on a variety of web sites.

Specialist experience means expertise in a particular domain or product. You get it only after hours and hours and hours of seeing the same kinds of things happen to the same types of users. Basing an inspection on specialist experience is definitely a step up from working from lore, but if you haven’t distilled what you have found in the many hours of observing a type of user using a site or type of site, then you may be working from hunches and opinion that could make it difficult for you to justify the evaluation recommendations.


Evidence from research
Some things that experienced designers have internalized do have data to support them: Eliminate horizontal scrolling. Design for working memory limitations. Facilitate scanning.

These all rate a 5 for strength of evidence in the guidelines on usability.gov. (I’ll get to the rating thing in a minute.) Usability.gov started as a project at the National Cancer Institute (NCI), which is part of the US Department of Health and Human Services. NCI needed help designing usable cancer information web sites. A simple goal.

On the way, the NCI team realized that not all guidelines were equal. Some guidelines were supported by a lot of data from multiple studies (like the high scoring heuristics above). Some guidelines might come from only one study. Still, evidence was evidence and NCI wanted to use “quantified, peer-reviewed web site design guidelines,” which they found simply didn’t exist. And as far as I know, there’s nothing like the resource NCI created at usability.gov.

To reach their goal, NCI put together panels of experts to review research. The panelists then rated each guideline for strength of evidence (which, among other considerations, was “cumulative and compelling” for a 5 out of 5).

The idea was that teams could use the ratings to help them make design decisions. But the guidelines were not meant to be a substitute for usability testing. Why not? The main reason was that the guidelines at usability.gov were developed for information-rich web sites (versus e-commerce or transaction-based sites) with content about major illnesses. That’s fairly specialized. But when you read through the 500 guidelines that NCI identified, it is obvious that almost all could apply to many types of web sites or many types of pages within sites. Your mileage may vary.


The basis of the heuristic matters
As the folks at NCI learned in developing usability.gov and I learned in the work for NIST*, provenance is important. This is true of all implicit and explicit heuristics applied in design decisions.

Learning about where heuristics come from – lore or folk wisdom; specialist experience; or research – helped me understand better where some of the teams I’ve worked with were coming from as they developed design principles. Sometimes they based the principles on lore, sometimes on expertise. Rarely did they go to the research.

Expertise is good, but research is better. Research-based heuristics simply have more heft: credibility, specificity, and applicability.

Still, there’s no substitute for primary research. Firsthand observation of your users in their context reveals subtleties of behavior that even research-based heuristics can’t match. And if your research of your users in their context contradicts the known research, what do you do? (You don’t get two guesses to answer this question.) If you go with what your users do, then even the most deeply researched heuristics are at best a poor substitute for doing the right thing.


* I couldn't have made the discoveries I did on that project without Susan Becker, my project partner, who did most of the heavy lifting.


:: :: :: :: ::

Related links

You can pore through the evidence-based guidelines for usability developed by the National Cancer Institute (NCI) at www.usability.gov.

You might also want to check out other sets of research-based heuristics and guidelines. I've worked on a couple of them:

www.aarp.org/olderwiserwired holds a set of heuristics for designing web sites for older adults. The National Institute on Aging (NIA) has just published an update of its guidelines for designing senior-friendly websites at http://www.nia.nih.gov/HealthInformation/Publications/website.htm.

vote.nist.gov now links to style guidelines for voting system documentation, which are based on research in technical communication, information design, usability, and instructional design. Click on the Publications link to download the PDF of the guidelines and to view other design research related to voting systems.

Oh, and you might want to read Miller's article to judge for yourself: http://www.musanim.com/miller1956/

Thursday, April 16, 2009

What are you asking for when you ask for a heuristic evaluation?

Every usability professional I know gets requests to do heuristic evaluations. But it isn’t always clear that the requester actually knows what is involved in doing a heuristic evaluation. Some clients who have asked me to do them have picked up the term “heuristic evaluation” somewhere but often are not clear on the details. Typically, they have mapped “heuristic evaluation” to “usability audit,” or something like that. It’s close enough to start a conversation.

Unfortunately, the request usually suggests that a heuristic evaluation can substitute for usability tests. I chat with the person, starting by talking about what a heuristic evaluation is, what you get out of it, and how it compares to what you find out in a usability test.



How do you do a heuristic evaluation?

Let’s talk about what a “classic” heuristic evaluation is. When Jakob Nielsen and Rolf Molich published the method in 1990, these two really smart guys were trying to distill some of the basic principles that make a user interface usable to its audience. They came up with 10 “accepted usability principles” (heuristics) that, when multiple evaluators applied them to any user interface, should reveal gaps in the design of the UI that could cause problems for users.

Armed with the Nielsen checklist of accepted usability principles – heuristics – someone who had never seen the UI before and who was not necessarily knowledgeable about the domain should be able to determine whether any UI complied with these 10 commandments of usable user interface design. If three or four or five people sat down for an hour or two and inspected an interface separately, they could come up with piles of problems. Then they could compare their lists, normalize the issues, and then hand a list off to the engineers to go fix.
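
The compare-and-normalize step at the end of that process can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not part of the published method: the evaluator names, the issue descriptions, and the crude text-matching rule are all invented for the example.

```python
from collections import defaultdict

# Hypothetical findings from three evaluators who inspected the UI separately.
# Each finding is (heuristic, short description); the wording is invented.
findings = {
    "evaluator_a": [
        ("Visibility of system status", "No progress indicator at checkout"),
        ("User control and freedom", "No way to cancel an order in progress"),
    ],
    "evaluator_b": [
        ("Visibility of system status", "no progress indicator at checkout "),
    ],
    "evaluator_c": [
        ("User control and freedom", "No way to cancel an order in progress"),
        ("Match between system and the real world", "Error codes shown instead of plain language"),
    ],
}


def normalize(description):
    """Crude normalization: lowercase and collapse whitespace so identical issues match."""
    return " ".join(description.lower().split())


# Merge the separate lists, remembering which evaluators reported each issue.
merged = defaultdict(set)
for evaluator, issues in findings.items():
    for heuristic, description in issues:
        merged[(heuristic, normalize(description))].add(evaluator)

# Issues reported by the most evaluators come first.
ranked = sorted(merged.items(), key=lambda item: (-len(item[1]), item[0]))
for (heuristic, description), who in ranked:
    print(f"{len(who)} evaluator(s) | {heuristic} | {description}")
```

In practice, matching issues across evaluators takes human judgment; two people rarely describe the same problem in the same words.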



What do you get out of a heuristic evaluation?

Let’s say that the person who called me the other day was asking for a review in the form of a heuristic evaluation to resolve a conflict on the team. The conflict on this team was about the page flow: What should the order of steps in the process be: the same as site X or the same as site Y? Should the up-sell be at the beginning or the end of the purchase process? ‘Could you please review the UI and just tell us what to do because we don’t have time and money to do a usability test.’

Several of the Nielsen heuristics might apply. Some probably don’t. For example, did the success of the page flow require users to remember things from step to step (recognition rather than recall)? Were there any shortcuts for return customers (flexibility and efficiency of use)? Where might users get bogged down, distracted, or lost (aesthetic and minimalist design)? By applying these heuristics, what have we found out?

The flow might require people to remember something from one step to another. The way the heuristic is written, requiring this of users is always bad. But it might not be.

The flow might not have shortcuts for expert users. The way the heuristic is written, not having shortcuts is bad. But it might not be.

There may be places in the flow that slow people down. The way the heuristic is written, you always want users to be able to do tasks quickly. But you might not.

And I don’t think we have resolved the conflict on the team.

When applying what I call “checklist usability” in a heuristic evaluation to learn what the flaws and frustrations of a design might be, the outcome is a determination of whether the user interface complies with the heuristics. It is an inspection, not an evaluation. It is not about the user experience. It’s not even about task performance, which is what the underlying question was in the team’s conflict: Will users do better with this flow versus that flow? If we interrupt them, will they still complete a purchase? Any inspection method that claims to answer those kinds of questions is just guessing.

A team may learn about some design flaws, but the frustrations could remain stubbornly hidden – unless the reviewer has already observed many users trying to reach goals using this site or process, or something very like it in the same domain. Even then, there’s a huge risk that a single inspector or even a small group of inspectors – who are applying very general guidelines, are not like the users, and are not actually using the design as part of the inspection – will miss flaws that will be task-stoppers. Worse, they may flag things that don’t comply with the heuristics but that should not be changed.



How does heuristic evaluation compare to usability testing?

Heuristic evaluation was codified about 1990, at a time when it was expensive to get access to users. It wasn’t uncommon for people to have to be trained to use the technology being evaluated before they could sit down in a usability lab to perform some tasks. The whole concept of there being a user interface was pretty new. Conventions were just settling into place.

Usability testing has been around since at least the 1980s, but began to be widely practiced about the same time Nielsen and Molich published their heuristic evaluation method. While usability testing probably needs some updating as a method, the basic process still works well. It is pretty inexpensive to get access to users. User interfaces to technology are everywhere. For many of the applications of technology that I test, users don’t need special training.

Heuristic evaluation may help a team know whether their UI complies with someone else’s guidelines. But observing people using a design in a usability test gives a team primary data for making design decisions for their users using their design – especially in a world evolved far beyond command line entry and simple GUIs to options like touch screens, web 2.0, and ubiquitous connectivity. Separately and in combination these and other design decisions present subtle, complex problems of usability. For me, observing people using a design will always trump an inspection or audit for getting solid evidence to determine a design direction. There is nothing like that “ah ha!” moment when a user does something unexpected to shed light on how well a design works.

Saturday, March 7, 2009

What counts: Measuring the effectiveness of your design

Let’s say you’re looking at these behaviors in your usability test:
  • Where do participants start the task?
  • How easily do participants find the right form? How many wrong turns do they take on the way? Where in the navigation do they make wrong turns?
  • How easily and successfully do they recognize the form they need on the gallery page?
  • How well do participants understand where they are in the site?

How does that turn into data from which to make design decisions?

What counts?

It’s all about what counts. What did the team observe that shows that these things happened or did not happen?

Say the team does 10 individual usability test sessions. There were 5 major “scavenger hunt” tasks. Everyone has their own stack of yellow stickies that they’ve written down observations on. (Observations of behavior, only – there should be no interpreting, projecting, guessing, or inferring yet.) Or, say the team has kept a rolling issues list. All indications are that the team is in consensus about what happened.

Example 1: Entry points
Here’s an example. For the first task, Find an account open form, the first thing the team wanted to observe for was whether participants started out where we thought they should (Forms), and if not, where participants did start.

The data looked like this:

Seven of the 10 started out at Forms – great. That’s what the team expected based on the outcomes of card sorts. The other 3 participants didn’t, but those 3 all started out at the same place. (First inference: Now the team knows there is strong scent in one link and some scent in another link.)
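
If the team keeps its observations in a spreadsheet or a script rather than on stickies, a tally like this takes only a few lines. Here is a minimal Python sketch. It assumes the three participants who did not start at Forms all started at Products (Products comes up as a starting point later in this post); the variable names are made up for illustration.

```python
from collections import Counter

# First click per participant for the task "Find an account open form".
# Assumption: the 3 participants who skipped Forms all began at Products.
first_clicks = ["Forms"] * 7 + ["Products"] * 3

entry_points = Counter(first_clicks)

# List entry points from most to least common, with the share of participants.
for link, count in entry_points.most_common():
    share = count / len(first_clicks)
    print(f"{link}: {count} of {len(first_clicks)} ({share:.0%})")
```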

Example 2: Tracking navigation paths – defining “wrong turn”

Now, what about the wrong turns? In part, this depends on how the team defines “wrong turn.”


What you’re finding out in exploratory tests with early designs is where users go. Is that wrong? Not necessarily. Think of it in the same way that some landscapers and urban planners do about where to put walkways in a park. Until you can see where the traffic patterns are, there’s not a lot of point in paving. The data will tell you where to put the paths outside where the team projects the path should be.


As each session goes on, the team tracks where participants went. The table below actually tracks the data for multiple issues to explore:


  • How many wrong turns do they take on the way?
  • Where in the navigation do they make wrong turns?
  • How easily and successfully do they recognize the form they need on the gallery page?

The data looked like this:




Everyone ended up at the right place. Some participants even took the path that the team expected everyone to take: Forms / Account Open / Form #10.

But the participants who started out at Products had to go back to the main navigation to get to the right place. There’s a decision to make. The team could count those as “wrong turns” or they could look at them as a design opportunity. That is, the team could put a link to Forms on the Product page – from the point of view of the user, they’re still on the “right” path and the design has prevented the user from making a mistake.

Account Open is a gallery page. Kits is the beginning of a wizard. Either way, the right form is available in the next step and all the participants chose the right one.

Measures: Everything counts
So, how do you count what counts? The team counted errors (“wrong turns”) and task successes. How important are the counts? The team could have gone with their impressions and what they remembered. There’s probably little enough data to be able to do that. In smaller tests, your team might be comfortable with that. But in larger tests – anything over a few participants – observers typically remember the most recent sessions the best. Earlier sessions either fade in memory or the details become fuzzy. So tracking data for every session can keep the whole team honest. When there are numbers, the team can decide together what to do with them.
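
One way to keep those counts for every session is a small script. The Python sketch below is only an illustration: the three sessions and their paths are invented, not the data from this test, and the rule for what counts as a “wrong turn” is an assumption the team would set for itself.

```python
# The path the team expected everyone to take.
EXPECTED_PATH = ["Forms", "Account Open", "Form #10"]

# One entry per session: the path taken and whether the participant
# ended at the right form. All values are invented for illustration.
sessions = [
    {"participant": "P1", "path": ["Forms", "Account Open", "Form #10"], "success": True},
    {"participant": "P2", "path": ["Products", "Forms", "Kits", "Form #10"], "success": True},
    {"participant": "P3", "path": ["Forms", "Kits", "Form #10"], "success": True},
]


def wrong_turns(path, accepted_steps):
    """Count the steps in a path that fall outside the steps the team accepts."""
    return sum(1 for step in path if step not in accepted_steps)


# The team decides what counts. Here Kits is accepted as a valid route,
# while starting at Products is counted as a wrong turn.
accepted = set(EXPECTED_PATH) | {"Kits"}

for session in sessions:
    session["wrong_turns"] = wrong_turns(session["path"], accepted)

successes = sum(session["success"] for session in sessions)
total_wrong_turns = sum(session["wrong_turns"] for session in sessions)

print(f"{successes} of {len(sessions)} completed the task")
print(f"{total_wrong_turns} wrong turn(s) across all sessions")
```

Adding "Products" to `accepted` models the other choice the team could make: treating that start as a design opportunity rather than an error.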

What we saw
This team learned that we got the high-level information architecture pretty close to right – most participants recognized where to enter the site to find the forms. We also learned that gallery pages were pretty successful; most participants picked the right thing the first or second time. It was easy to see all of this in tracking and counting what participants did.