Evidence-Based Information, Training and Tools for Optimizing the Usability of Computer Systems

‘Concurrent’ vs. ‘Retrospective’ Comments

by Dr. Bob Bailey

July, 2003

 

Introduction

Verbal reports by performance test participants provide a “window” into certain aspects of human information processing. Participants can verbalize information that might otherwise be unavailable to testers, such as their level of motivation, their feelings, and their chain of reasoning.

There are two ways to have test participants provide verbal observations for testers. They can report incidents when they happen (concurrent), or they can report observations after having completed all tasks (retrospective). In the retrospective condition, participants are usually reminded of usability issues by watching a video of the testing session. The video shows selected screen images with the participant’s face overlaid in one corner, and includes all room noises, such as mouse clicks, typing sounds, and any spontaneous participant utterances.

There are two potential problems with concurrent reporting. First, concurrent reporting forces users to closely examine a certain portion of the interface, which may affect their performance on subsequent tasks. Second, having participants report problems immediately may interfere with the collection of performance data, such as the time to complete a task. However, Wright and Converse (1992) compared a ‘verbalization’ condition with a ‘silent’ condition, and found that concurrent verbalizations did not affect either the time to perform an activity or the number of errors made. They suggested that verbalizations directly available in working memory may not affect task performance.

Studies

Victoria Bowers and Harry Snyder (1990) at Virginia Polytechnic Institute compared the comments participants made while performing tasks (concurrent) with the comments they made while viewing a video of their performance (retrospective). They found no performance or preference differences. However, designers judged the retrospective statements to be more useful because they were more explanatory than the concurrent observations.

Ken Ohnemus and David Biers (1993) at the University of Dayton also studied user verbalizations during testing. They had participants make comments while performing (concurrent), immediately after completing a test while viewing a video (retrospective), or 24 hours after the test while viewing a video (retrospective-delayed). They found no reliable differences in user comments among the three conditions. Again, the designers judged the retrospective observations as having more value.

Miranda Capra (2002) at the Virginia Polytechnic Institute evaluated the quantity and quality of comments made by test participants. Half made their comments while working on test items (concurrent), and half commented while watching a video of their performance after completing the task (retrospective). Consistent with previous research, she found no reliable performance or preference differences between concurrent and retrospective reporting.

She also reported that many participant observations tended to be positively biased, suggesting that participants were making their comments after successfully figuring out a difficult task, not while struggling to solve a problem.

Page and Rahimi (1995) conducted a study that compared concurrent and retrospective observations by test participants. They suggested that the observations derived from concurrent reporting were most valuable for issues such as link naming or determining the best wording for page instructions or error messages. On the other hand, as participants gained more experience with a system during testing, they eventually began to understand what they were doing wrong. The authors therefore suggested that retrospective reporting was probably best for understanding more complex problems, because users were unable to articulate why they were having problems while they were still trying to resolve them.

Conclusions

What can we learn from these studies?

  • There are no differences in the majority of comments made concurrently and those made retrospectively.
  • Collecting concurrent comments does not appear to slow down test participants or cause them to make more errors.
  • Valid comments can be collected up to 24 hours after a test.
  • Participants tend to comment on their successes and not on their ‘struggles’ (positively biased).
  • Concurrent comments are better for certain design problems, such as link naming.
  • Retrospective comments are more valuable for helping to resolve complex usability issues.


References

Bowers, V.A. and Snyder, H.L. (1990), Concurrent versus retrospective verbal protocol for comparing window usability, Proceedings of the Human Factors Society 34th Annual Meeting, 1270-1274.

Capra, M.G. (2002), Contemporaneous versus retrospective user-reported critical incidents in usability evaluation, Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting, 1973-1977.

Ohnemus, K.R. and Biers, D.W. (1993), Retrospective versus concurrent thinking-out-loud in usability testing, Proceedings of the Human Factors Society 37th Annual Meeting, 1127-1131.

Page, C., and Rahimi, M. (1995), Concurrent and retrospective verbal protocols in usability testing: Is there value added in collecting both? Proceedings of the Human Factors and Ergonomics Society 39th Annual Meeting, 223-227.

Wright, R.B. and Converse, S.A. (1992), Method bias and concurrent verbal protocol in software usability testing, Proceedings of the Human Factors Society 36th Annual Meeting, 1285-1289.

Contact Dr. Bob Bailey at (801) 201-2002 or bob@webusability.com
Copyright 2002 - 2005