Sunday, November 28, 2010

Usability testing is broken: Rethinking user research for social interaction design

How many of you have run usability tests that look like this: individual, one-hour sessions in which the participant performs one or more tasks from a scenario that you and your team have come up with, on a prototype, using bogus or imaginary data. It’s a hypothetical situation for the user; sometimes they’re even role-playing.

Anyone? That’s what I thought. Me too. I just did it a couple of weeks ago.

But that model of usability testing is broken. Why? Because one of the first things we found out was that the task we were asking people to do - making some basic financial estimates based on goals for retirement - involved more than the person in the room with me.

For the husbands, the task involved their wives, because the guys didn’t actually know what the numbers were for the household expenses. For the women, it involved their children, because they wanted to talk to them about medical expenses and plans for assisted living. For younger people, it involved their parents or grandparents, because they wanted to learn how their elders had managed to save enough to help them through school and retire, too.

There’s a conversation there. There’s a support network there. And that’s what’s broken about usability testing. It always has been.

I first started thinking about this when Google launched Buzz. Buzz used Gmail users’ most frequent contacts to automatically generate online social networks. Google employees - 20,000 of them - had been using Buzz inside the garden walls for a year. A nice, big sample. But the problems became evident almost immediately when Buzz was let out into the wild. The most famous case may be a blogger who calls herself Harriet. She wrote about how one of her most frequent correspondents in Gmail was her boyfriend. Another was her abusive ex-husband. Now they were publicly connected, and this made her very, very unhappy. In fact, the post was titled “Fuck You, Google.”

There might have been no harm done in the retirement planning study. But there might have been. Would the 31-year-old who broke down crying in the session because her mother was in late-stage ALS have had a better experience if we’d tested in her context, where she could work with her closest advisor - her dad? Might it have been a calming process, in which she felt in control and became engaged in envisioning her independent future because someone she trusted could give her perspective that I could not? Maybe.

As for Buzz, Harriet certainly wasn’t pleased, and she was left with a mess to clean up. How do you disconnect two people who have now been publicly connected?

When companies started doing usability testing with regularity in the 1980s, it was about finding design problems in what now look like fairly simple UIs that frustrated or hindered users. It was one person and one machine, with the human doing a usually work-based task. That’s why it was called computer-human interaction.

But today, technology large and small is fully integrated into people’s lives in a much more ambient, less compartmentalized way. It is now rare to sit next to a corded phone, holding the handset, doing nothing but talking and listening.

When you look at what the social web is, there are some characteristics that I think we’re not taking into account very well in doing usability tests:

- It’s about relationships among people
- In context
- Conducted fluidly across time and space, continuously

I also think that people who are new to usability testing and user research are not going to do a good job of testing social interaction design, because constructing a study cannot be very formal or controlled. Measuring what’s happening is much more complex. And scope and scale make a difference. Testing Buzz with 20,000 Googlers for a year wasn’t enough; it took letting it out to a million people who hadn’t drunk the Kool-Aid to find the *real* problems, the real frustrations, the real hindrances that truly affect uptake and adoption.


The nature of online is social

Let’s back up and talk about a key definition. What I mean by “social” is anything that someone does that changes the behavior of someone else.

This is how I can say that being online *is* social. Email is social. Publishing a flat HTML document is social. Putting something on a calendar is social. Everything is social. Social isn’t the special secret sauce that you pour on top of an experience. Social is already there. Choosing a bank is social. Planning a vacation is social. Buying an appliance is social. I SMS’d my boyfriend a series of photos the other day, of me in different eyeglass frames, because I couldn’t decide by myself. That was *not* a computer-centered or app-centered interaction. That was a decision being made by two people - a conversation, mediated by fully integrated technology, in fluid activities, in different contexts. It was social.

Social isn’t sauce. It’s sustenance. It’s already there, and we’re not seeing it. So we’re not researching it, and we’re definitely not testing for it.

We have to stop thinking about human-computer interaction. That model by default is too limiting. Look around. It’s really about human relationships and interactions mediated by technology. Technology supports the communication; it doesn’t drive it. Ask any parent who has used FaceTime or Skype to have a video chat with their baby for the first time.

Scale is the game changer

Discount usability testing is great for some things, but what we’re really studying when we do user research and usability testing for the social web is social behavior. And that takes scale. That takes connections. That takes observing people’s real networks and understanding what makes them work - what makes those people friends, family, colleagues, neighbors, acquaintances, associates, clients, vendors, pen pals, drinking buddies, partners for life, or friends with benefits.

Those are rich, life-framing relationships that affect how someone interacts with a user interface - and most of us are not even scratching the surface of them when we “micro-test” a commenting feature on an online invitation web site.

“Task” doesn’t mean what you think it means

For the retirement planning tool, I did a short interview at the start of each session that I hoped would set some context for the participant to do the behavior I wanted to observe. But it was woefully inadequate. Don’t get me wrong - the client wasn’t unhappy; they thought it was a cool technique. But as soon as I learned who the participant really went to for financial advice, where was I? Putting the participant in a situation where they had to pretend. They did, and they did a fair job of it. But it was lacking.

Tasks are the wrong unit. What we’re asking people to do in usability tests is like attending a cocktail party while grocery shopping. Even with an interview, even with careful recruiting, it’s incongruous. There are very few discrete tasks in life. Instead, there are activities that lead people to goals, and multiple activities and goals might be intermixed on the way to achieving any one of them. In the meantime, the technology is completely integrated, ambient, almost incidental. Like asking your boyfriend which eyeglass frames look nerdy-sexy versus just nerdy.

The activity of interest isn’t computer-based. Look at retirement planning. It’s *retirement planning*! That’s not a task. The activity is planning for the future - a future in which you have no real idea of what is going to happen, but you have hopes and aspirations.

Using Buzz is not a task. It’s not even an activity. People who use Buzz don’t have the goal of connecting to other people, not in that deliberate way. They’re saying, hey, I’ve read something interesting that you might be interested in, too. The task isn’t “sharing.” It’s putting themselves out in the world and hoping that people they care about will notice. How do you turn that into a scenario in a usability test?

Satisfaction may now equal user control

The ISO measures of usability are efficiency, effectiveness, and satisfaction. What is effective about having TweetDeck open all day long while you’re also writing a report, drafting emails, taking part in conference calls, attending virtual seminars, going back to the report, calling your mother?

When the efficiency measure went into the ISO definition, most people were measuring time on task. But if you don’t have a discrete task, how do you measure time?

Satisfaction may be the most important thing in the end for the social web, and that may be the degree to which the user feels she has control of the activities she’s doing while she’s using your tool. How much is the UI forcing her to learn the tool, versus quickly integrating it into her life?

Measuring success in the social web often defies what we’ve been taught to count as data. How do you measure engagement in the social web? Is it about time on the site? I could lurk on Twitter.com or Facebook all day. Am I engaged? Is it about minutes spent pursuing and perusing content? Is it about how likely someone is to recommend something to someone else? I wrote my first product review EVER last week, for a pair of jeans on the Lands’ End web site. Am I engaged with the site? I would say no.
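To make that concrete, here is a toy sketch in Python - with entirely invented events and weights, not data or an API from any real analytics product - showing how a naive activity count (a stand-in for time on site) and a count of actions that actually touch other people can rank the same two users in opposite orders:

```python
# Toy event log: (user, event). All data and weights here are invented
# for illustration; a real study would pull from your own product's logs.
events = [
    ("lurker", "page_view"), ("lurker", "page_view"), ("lurker", "page_view"),
    ("lurker", "page_view"), ("lurker", "page_view"),
    ("poster", "page_view"), ("poster", "comment"), ("poster", "share"),
]

# Hypothetical weights: how much does each action act on another person?
SOCIAL_WEIGHT = {"page_view": 0, "comment": 3, "share": 2}

def naive_engagement(user):
    # The conventional metric: raw activity volume (a proxy for time on site).
    return sum(1 for u, _ in events if u == user)

def social_engagement(user):
    # An alternative: count only actions that reach someone else.
    return sum(SOCIAL_WEIGHT[e] for u, e in events if u == user)

for user in ("lurker", "poster"):
    print(user, naive_engagement(user), social_engagement(user))
# lurker 5 0
# poster 3 5  <- the ranking flips depending on what you decide to count
```

The lurker “wins” on the naive metric; the poster wins the moment you count actions that affect other people. Neither number is the truth - the point is that the choice of what to count is a research decision, not a default.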

We have to look hard at the goodness of conventional metrics. They’re not translating to anything meaningful, I don’t think, because we’ve been thinking about all this all wrong - or not enough. What is goodness to a user? Control of her life. Control of her identity. Control of her information.

Users are continuously designing your UI

What does “task” mean? What does “success” mean? How do you measure new features that users create with your UI on the fly? Twitter has hashtags and direct messages; users created those. Facebook is continuously being hacked for fresh activities. Look at the commenting systems on blog posts or articles. Spammers, yes, but people are also talking to one another, arguing, flirting, solving problems, telling their own stories. No matter what you build, and whatever your intentions were in designing it, users are going to hijack it to make it useful to them. How do you test for that?
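One modest way to catch users designing your UI for you is to look for it in the data rather than in the lab. Here is a minimal, deliberately crude sketch - the messages are made up - of scanning a message log for tokens users keep repeating even though the interface never defined them. Hashtags on Twitter began exactly this way, as a user convention rather than a feature:

```python
import re
from collections import Counter

# Made-up message log; in practice this would come from your own product.
messages = [
    "heading to #sxsw next week, who else is going?",
    "#sxsw panel on research was great",
    "loving #ia2011 already",
    "anyone at #sxsw want to grab lunch later?",
]

# Count tokens that users repeat but that the UI never defined.
tags = Counter(tag.lower() for msg in messages for tag in re.findall(r"#\w+", msg))
print(tags.most_common())  # [('#sxsw', 3), ('#ia2011', 1)]
```

A spike in a pattern like this is a weak signal that users have invented a feature - a finding no pre-scripted lab scenario would have produced.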



Ideas from smart people

I had all these questions and more when I met with a bunch of smart people who have been researching the social web. Out of that discussion came some great stories about what people had tried that worked, and what had not worked so well.

For creating task scenarios for usability tests, getting participants to tell stories of specific interactions helped. Long interviews helped us learn context, scope, priorities, and connections. Getting people to talk about their online profiles and explain their relationships helped set the scene for activities. Getting them to use their own log-ins, with their real relationships, helped everyone know whether the outcomes were useful, usable, and desirable - whether the outcomes were satisfying, and even enriching.

Some of the people in this informal workshop also offered these ideas:
  • Screen sharing with someone outside the test lab or test situation
  • Making video diaries and then reviewing them retrospectively
  • Developing and testing at the same time, with users
  • Including friends or other connections in the same test session, setting up multi-user sessions
  • Sampling the experience the same way flow was discovered: prompting people by SMS at set or random moments to report on their behavior (a rough sketch follows this list)
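That experience-sampling idea is easy to prototype. Here is a minimal sketch in Python, where send_sms is a placeholder for whatever SMS gateway you actually use, and the waking hours and probe count are arbitrary assumptions, not a prescription:

```python
import random
import time
from datetime import datetime, timedelta

def send_sms(phone_number, message):
    # Placeholder: swap in your real SMS gateway's API call here.
    print(f"[{datetime.now():%H:%M:%S}] SMS to {phone_number}: {message}")

PROMPT = ("Study check-in: what are you doing right now, who else is "
          "involved, and what technology (if any) are you using?")

def schedule_probes(phone_number, waking_hours=(9, 21), probes_per_day=5):
    """Fire the prompt at random moments inside the participant's waking hours."""
    start = datetime.now().replace(hour=waking_hours[0], minute=0, second=0)
    window_seconds = (waking_hours[1] - waking_hours[0]) * 3600
    for offset in sorted(random.sample(range(window_seconds), probes_per_day)):
        fire_at = start + timedelta(seconds=offset)
        wait = (fire_at - datetime.now()).total_seconds()
        if wait > 0:
            time.sleep(wait)  # sleep until the next sampled moment
        send_sms(phone_number, PROMPT)

schedule_probes("+15555550100")
```

The point isn’t the plumbing. It’s that random prompts catch people inside real activities, in real contexts, with their real networks - exactly what a one-hour lab session screens out.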

There’s also bodystorming, critical incident analysis, and co-designing paper or other prototypes. A few things seemed clear from that discussion. To make user research and usability testing useful to designers, we have to rethink how we’re doing it. It has to reflect reality a bit better, which means taking more from social science and behavioral science than from psychology alone. It takes more time. It takes more people. It takes a wider view of task, success, and engagement. And we’re just beginning to figure all that out.

Rethink research and testing

Everything is social. Scale is the game changer. Tasks aren’t what you think they are. User satisfaction may be about control. Users are continuously designing your UI. I invite you to work with me on rethinking how we do user research and usability testing for what’s really happening in the world: fluid, context-dependent relationships mediated by technology.

I want to thank Brynn Evans, Chris Messina, Nate Bolt, Ben Gross, Erin Malone, and Jared Spool for spending the better part of a day talking with me about their experiences in researching social. These musings come from that cooperative, ahem, social effort. 

23 comments:

  1. I agree with your assessment that research needs to evolve with how technology has evolved in our customers' lives. Traditional practices only get us part of the picture, and that part may be inaccurate and ineffective.

    I believe this also applies to personas. It's difficult for me to find a single-person persona useful in the area that I research -- entertainment. Every time I encounter one, I'm left with questions about how other people in their lives affect their decisions and their activities. So much so that I want to see the persona in the context of their influential circle.

    Our methods need to evolve, and we should feel free to try new approaches, but we may not always be able to get the extra time required to do them well. Experimenting with new techniques and finding ways to save time with them will be important in getting this kind of research adopted.

    I would love to hear more about the ideas shared with you and an opportunity to share my own. Are you planning a workshop or discussion at an event soon?

  2. Hi Crystal,

    Did someone set you up to ask about a talk? In any case, thanks.

    I did a brief talk about this at User Research Friday in San Francisco on November 19. Someday, there'll be video of it online. (Though I'm not sure that's a good thing.)

    Inspired to expand the URF talk, and get some input from other smart people, I put in a proposal to the IA Summit for 2011.

    There seems to be a lot of interest in the topic whenever I bring it up. After the day I had with the smart people I mentioned in the post, I also interviewed user researchers from Twitter and Google. They had some awesome ideas for approaching user research for SxD, as well. I've been keeping a little notebook with methodological goodies in it.

    Maybe there's a book in this...

  3. That's quite a thought provoking article.

    These days we do mostly remote testing, so we can test people where and when it suits them, but getting face time is important too. I guess, as with everything in life, it's about striking a balance and getting the right mix of information for whatever project you happen to be working on.

    We run our own usability tool (http://intuitionhq.com) and one of the more interesting things about that from my perspective is that I get to watch and learn how people interact, what kind of information they are looking for, and the ways they go about gathering information.

    Thanks again for sharing, really an interesting article.

  4. Great post! I agree completely. We absolutely need to stop making participants pretend. The best tasks are "created" by the participants.

    It would be great if clients would dedicate more time and participants to research studies, as you suggest, but I fear that we would become a bottleneck -- and a very expensive one, too. Finding a way to provide real value through REAL data and still be affordable (in time and $) is key.

  5. You can definitely count on me for attending that IA Summit talk in whatever form it takes ;) I love this topic!

  6. Jacob,

    Thanks for saying those nice things. These are thoughts that have been percolating in my brain for a long while. It was great to talk to smart people and to get reactions at User Research Friday. The reaction to this blog post has been surprising, though, I must say!

    The striking of balances is how we've rationalized doing what we do for a long time. I think getting time with users in their own time in their own place is a terrific step toward building a better, more realistic situation for an evaluation.

    However, I think one of the problems we've created for ourselves in striking those balances is what we are unknowingly leaving out that causes real problems for real people down the road -- especially as more people become more aware of protecting their privacy and security, online and off. As researchers and testers, we're making trade-offs for the convenience of the business or the design team, often without thinking about the real-life implications of those trade-offs.

    Thanks for writing. I'm glad you're out there.

    Dana

  7. Hi Kyle!

    I appreciate your comment -- and the constraints that UX researchers face. I also think it is up to us not to sell out when the design team shrugs off the importance of observing users themselves. It's awesome that so many organizations are including testing in their design cycles. But we have to stop letting usability testing be treated like QA. This is very hard. I get that.

    The teams I see succeeding at making evaluation more realistic manage to do it because they're doing research and testing continuously. It's not a one-off event that interrupts or is separate from "design".

    I think the rest of us can do it in small steps. As you suggest, eliciting "tasks" from users rather than coming in with pre-fab ones is huge. But it is difficult. It takes practice to do very well. And it might mean getting involved in "pre-research" activities like recruiting (a thing I could go on about for another 2,000 words pretty easily).

    By the way, affordability is a strategic thing as well as a tactical thing. What was the cost to Google of rolling out Buzz the way it did? Regardless of the monetary count, that episode degraded the credibility and trust that many users had in Google. Over the long term, that's a much more costly problem than a $15,000 or $50,000 usability test.

    Thanks for reading. Keep up the good work!

    Dana

  8. Crystal,

    Well, I hope to get on the IA Summit '11 program with this topic. It seems there's interest from organizers. But you never know what happens during the review and selection process.

    In any case, I'll see you there. We can have a "breakout" over a drink or two.

    In the meantime, your blog post here - http://www.kubitsky.net/archives/322 - is a wonderful addition to the conversation. Thanks for that excellent, thoughtful response.

    Dana

  9. You're right: testing is a social context - and everything is social - and it's hard.

    For me the main problem is about scale, or, as I prefer to say, that it's a complex system. And for that reason there's no deterministic, simple approach to the problem.

    I'm not an expert on complex systems per se, but I'm studying them mainly as social environments, and the first thing you always see cited is watching for the weak signals that are emerging. But the system needs to be live and running, because otherwise you'll be working in a non-complex, non-social system. And that wouldn't be the same. It's like using Twitter alone.

    So even if I'm not sure, I lean toward thinking that no "testing" is possible. You either roll out or you don't. And that's exactly how Facebook does things, for example (as do other companies, like Google, even when the products aren't social).

    Still, testing shouldn't be thrown away, because it addresses the layers that exist even before social dynamics kick in. For example, I found that using the Social Usability Checklist we discussed some time ago, and doing tests where a user interacts with another person online (simulating another social network user), are indeed useful for checking the "connective tissue".

    But once the connective tissue is there, you need to release and pay attention to the weak signals of the complex system you're working on.

    This topic connects perfectly to agile methodologies (small iterations) and management techniques (small teams; release early, release often; and test, test, test). But that's probably a whole other topic. I'm just saying that everything is connected and the change should happen on multiple layers, of which "testing" is just one. :)

  10. Great post, Dana!
    You've captured both the challenges and how much we could be learning from new ways of constructing research. Thanks so much for pulling this together here -- I vote for hearing more from you about it at IA Summit '11! We have definitely seen challenges like this in researching collaborative applications.

  11. Very interesting and thought-provoking. I think you're right; we need to test products and services in a context that matches where they'll be used -- or that actually *is* the context where they'll be used.

    I believe that there is still *huge* value in focused 1:1 testing to identify points of the experience where things aren't clear or satisfying -- where we, as designers, have made assumptions.

    Very interesting :)

  12. Dana, argghhh! "But that model of usability testing is broken." No, it is not broken, any more than a hammer used to hit someone in the head is "broken" -- it is just a tool that can be misapplied. I fear you've been reading too much from some guy named Spool, who told a roomful of 1,000 "beginners" at SXSW-I in 2009 that the era of user-centered design is dead, having proven ineffectual. No way -- UCD is alive and well. Now, it can be misapplied, just as one-on-one usability testing of user interfaces can be (and often is, when we claim to be testing "usability" when we're actually testing "learnability"). It can be leaned on too much. There are many problems it can't solve -- how to julienne fries being just one. But the need for an expanded world view, and the concomitant need for new tools, doesn't dismiss the need for traditional usability testing, just as the development of nail guns didn't preclude the need for . . . oh, hell, even I am tired of that analogy.

    "I also think that people who are new to usability testing and user research are not going to do a good job of testing social interaction design, because constructing a study cannot be very formal or controlled. Measuring what’s happening is much more complex."

    Absolutely. But I would bet that for them to get good at testing social interaction design, they'll have to be good at fundamental usability testing first. (There's a great book on this topic I might point you towards!) (Hmm, which analogy should we lean on here? The best expressionists being good, first, at portraiture? The best free-verse poets having first labored under the strictures of rhyme and metre?) Just yesterday the students in my Advanced Usability class were pitching their final projects, and a couple had discovered (pioneered?) some pretty creative ways to get good usability data about social networking systems. But they were all well prepared for this by having read your book, and others, about the fundamentals of testing with human subjects.

    "We have to stop thinking about human-computer interaction. That model by default is too limiting." Because it is "limiting" doesn't mean we stop thinking about it. There are many questions yet to be answered about HCI (an individual H I-ing with a C). And indeed EVERY social interaction of which you speak that includes some piece of technology will STILL entail HCI -- you canNOT get the social interaction right if an individual cannot interact with his/her piece of technology. So say it is limiting. Say it doesn't cover the new, social waterfront. But we "stop thinking about it" at our peril

    “'Task' doesn’t mean what you think it means." I humbly submit that you have no idea what I think -- well, except for all the foregoing.

    Warmly.

    Randolph.

  13. I would also comment that it feels as though we are back-loading too much work into the "usability" part of the process.
    There's a stack of work that should be done up front, at the "design research" end of the process, which would provide more complete tasks and context -- and hopefully better designs that include the social dimensions -- so that the issues don't have to be resolved in usability testing.
    Simulating these situations in order to usability-test for efficiency and effectiveness is really best done in the real context of use, where people have access to the resources they normally have available (and, in an ideal world, the time they would normally take to do these things).
    But IMHO the reality is that the social stuff should have been figured out long before you hit testing.

  14. Thanks, everyone, for your comments. It's good to hear from you all!

  15. Dear Randolph,

    I'm honored that you've reacted to my article about rethinking usability testing for social interaction design. 

    Knowing your love of analogy, let's go with the classic. I'm not saying that we shouldn't do "traditional usability testing." That tool is good for what it is. I'm saying that we've been using that hammer so much we've worn off the head and we're banging on the nail with the handle.

    And now the nail has changed to a screw. 

    I said that usability testing is broken. If usability testing is a hammer, we broke it -- not because it isn't a good tool, but because many of us are not thinking about other tools for the expanded carpentry we're facing with fluid, ambient use of technology.

    The screw is social interaction design. As a field, UXers are miserable at testing this because we also don't know how to research it. All of the senior researchers I talked to at those social network companies admitted that. We're making it all up -- which is what we were all doing in the 1980s, when we borrowed heavily from psychology.

    But social interaction isn't just psychological. It's behavioral, cognitive, cultural, sociological, political, and a bunch of other -ologies. And as a field, I don't think we're doing a good job of looking at those aspects or borrowing from those fields to learn anything about people as they integrate technology into their lives.

    By the way, your pushing the learn-the-rules-first approach exposes your academic-osity. But you're right, everyone has to start somewhere. I don't know how you're teaching usability testing at UT-Austin, but I will say that the book you're using, Handbook of Usability Testing, Second Edition, is pretty prescriptive. It doesn't even *suggest*, "Hey, if this doesn't answer your research question, try some other techniques!" I helped build that hammer; I'll cop to that. But now I'm saying that technology has evolved, users of technology have evolved, culture has evolved -- but in general, people working in user experience research have not evolved their methods, and we need to.

    As for not knowing what you think, I'm not surprised by the content of your reaction at all. And I appreciate it because it's helping me refine my argument. 

    Thanks, Randolph. As always, a stimulating exchange.

    Dana

  16. Last year I used your book. This year I didn't assign a textbook -- rather I had everyone pick a different book, read it, and do a book review of it for others to read. So we had 19 textbooks.

    And yes, I've got some academiosity -- and I wear it proudly. But it follows 22 years of practice, and is geared toward helping students find and excel at jobs. Knowing rules is good. Understanding the theory and history of practice that undergird the rules is even better -- it enables one to adjust intelligently when the situation changes (say, when the nail turns into a screw). (What kinda weird alchemy are we dealin' with, here?) And so, as I said in my earlier post, we (at least at UT) ARE evolving our methods, to supplement, not replace, usability testing.

    Hope to see you at UPA or the like.

    R.

  17. Great reading.

    Got me thinking about:

    * Stop thinking about the people we design for and with as "lab rats"
    * Get more fluid and creative about how we run our research (improvisation is ok too)
    * Break down research questions into components that allow us to learn from each part towards better design (as opposed to treating research as something where we are required to get all the answers to the universe in one sitting)
    * Design the research to feel more relaxed in order to nurture the good stories and warm interactions from the people we design for and with

    In short, be human :)

    rgds,
    Dan

  18. So really ... usability testing is not broken; it just needs to be rethought and extended for certain situations. But I suppose the soundbite title of the post, albeit incorrect, gets more people to look. It certainly worked for me ... although I now have zero interest in seeing what other overstated claims are made on this blog.

  19. Calum,

    I'm surprised that anyone comes here, and I write the stuff. I'm glad you came, whatever you might think.

    Too bad you felt it necessary to say not-so-nice things.

    I hope you did look around. If you have any interest in usability testing and user research techniques and methods, you might get some ideas here. Or, you might find that you know it all already. In that case, I'd like to learn from you.

    Best,
    Dana

  20. Nah, usability testing isn't broken. That's a gross overstatement. As other people have said here, what's broken is the misapplication of evaluative usability methods. To use Randolph's analogy: you've changed to a screw, but you're still whacking at it with a hammer. The best solution is to get a screwdriver, not to ruin the current useful applications of the hammer by trying to turn it into a screw-driving tool.

    Usability testing is a tool for identifying usability problems -- i.e., points where someone using an interface can't accomplish a task because something is wrong with the design of the interface. This means a button is missing or mislocated, or information is poorly worded, or things are hard to read, etc.

    Often we're stuck with UT, because we have a prototype that's already been built without a sufficient understanding of context or needs, and then we're brought in to see whether it works - from a usability perspective. But we're trained to be problem-spotters. What happens, often, is that we start to see gaps in implementation where specific needs or use cases are not addressed. As you've pointed out, usability tests are sorely lacking in context.

    The ability to draw out these types of insights is great -- it's one of the cool things about UT -- but it really shouldn't be the primary goal of usability testing as an exercise. If we're looking to understand the social context of product use, there are better tools than usability testing: ethnographic methods, contextual inquiry, interviews, shadowing, diary studies, and so on. Your last reply to Randolph started to get at this, and I agree with that course correction.

    So maybe your post should, or could, be a call for a new hybridized research method, or for a diversification of backgrounds in our field. The problem here isn't that usability testing is broken -- the problem (which may be illustrated by your post) is that usability practitioners, especially those from a software engineering background, don't understand the range of tools available to them and tend to view UT as a panacea. We need to broaden our toolkits and know when to apply specific methods -- or at least understand the limitations of the methods we're required to use, given the constraints of a project or situation.

  21. You know what, the simplest things often leave the most profound impacts on my nerve cells ... When people ask what is social, I usually look the other way, knowing full well that I am being antisocial, because the answer is so damn simple ... Next time, I am going to quote 'emails are social' ... Simple, you know, usually does the trick. Thanks for writing this piece.

  22. Fascinating thread ... Rather than either/or, thinking in terms of both/and is useful. We won't throw the baby out with the bathwater (since we are using metaphors loosely). But the initiating question is awesome, as others have stated. Foremost, it acknowledges that no matter how individualistic we think we are, we are always reacting, responding, and interacting to and with others in our environment. (Including the 'anti-social' comment.) It is an interesting question to ponder how our information-gathering (aka research) approaches can and should evolve to gather insights from this 'strange new world.'

  23. Agree with Michelle -- quite an interesting post and thread of responses. Indeed, users are constantly changing our UIs. It's important to stay on top of how these changes affect us, and to try to stay a step ahead with user research. Great post; I look forward to more from you, Dana.
