Heuristic evaluation as described here (Nielsen and Molich 1990; Nielsen 1992c, 1993b), however, is a systematic inspection of a user interface design for usability (Mack and Nielsen 1993; Nielsen and Mack 1993). The goal of heuristic evaluation is to find the usability problems in a user interface design so that they can be attended to as part of an iterative design process. Heuristic evaluation involves having a small set of evaluators examine the interface and judge its compliance with recognized usability principles (the "heuristics").
In principle, individual evaluators can perform a heuristic evaluation of a user interface on their own, but the experience from several projects indicates that any single evaluator will miss most of the usability problems in an interface. Averaged over six projects (Molich and Nielsen 1990; Nielsen and Molich 1990; Nielsen 1992c, 1993b), single evaluators found only 35% of the usability problems in the interfaces. However, since different evaluators tend to find different problems, it is possible to achieve substantially better performance by aggregating the evaluations from several evaluators. Figure 1 shows the proportion of usability problems found as more and more evaluators are added. The figure clearly shows that there is a nice payoff from using more than one evaluator, and it would seem reasonable to recommend the use of about five evaluators, and certainly at least three. The exact number of evaluators to use would depend on a cost-benefit analysis, and more evaluators should obviously be used in cases where usability is critical or when large payoffs can be expected due to extensive or mission-critical use of a system.
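To make the payoff from aggregation concrete, the following minimal sketch estimates the share of problems found by at least one evaluator. It rests on a simplifying assumption that is not taken from the studies cited above: every evaluator finds every problem independently with the same probability (35%, the single-evaluator average quoted above). The function name and the independence model are illustrative only. The sketch does not reproduce Figure 1, and it is probably optimistic, since some problems are harder to find than others.

```python
def proportion_found(n_evaluators, p_single=0.35):
    """Expected share of problems found by at least one of n evaluators.

    ASSUMPTION (not from the source): each evaluator finds each problem
    independently with the same probability p_single.
    """
    if n_evaluators < 0:
        raise ValueError("n_evaluators must be non-negative")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # A problem is missed only if every evaluator misses it.
    return 1.0 - (1.0 - p_single) ** n_evaluators


if __name__ == "__main__":
    for n in range(1, 11):
        print(f"{n:2d} evaluators: {proportion_found(n):.0%} of problems (idealized)")
```

Under this idealized model the gain from each added evaluator shrinks quickly, which is the qualitative reason a cost-benefit analysis tends to favor a small group over either a single evaluator or a very large panel.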

In a user test situation, the observer (normally called the "experimenter") has the responsibility of interpreting the user's actions in order to infer how these actions are related to the usability issues in the design of the interface. This makes it possible to conduct user testing even if the users do not know anything about user interface design. In contrast, the responsibility for analyzing the user interface is placed with the evaluator in a heuristic evaluation session, so a possible observer only needs to record the evaluator's comments about the interface, and does not need to interpret the evaluator's actions.
Two further differences between heuristic evaluation sessions and traditional user testing are the willingness of the observer to answer questions from the evaluators during the session and the extent to which the evaluators can be provided with hints on using the interface. For traditional user testing, one normally wants to discover the mistakes users make when using the interface, and the experimenters are therefore reluctant to provide more help than absolutely necessary. Also, users are requested to discover the answers to their questions by using the system rather than having them answered by the experimenter. For the heuristic evaluation of a domain-specific application, it would be unreasonable to refuse to answer the evaluators' questions about the domain, especially if non-domain experts are serving as the evaluators. On the contrary, answering the evaluators' questions will enable them to better assess the usability of the user interface with respect to the characteristics of the domain. Similarly, when evaluators have problems using the interface, they can be given hints on how to proceed in order not to waste precious evaluation time struggling with the mechanics of the interface. It is important to note, however, that the evaluators should not be given help until they are clearly in trouble and have commented on the usability problem in question.
Typically, a heuristic evaluation session for an individual evaluator lasts one or two hours. Longer evaluation sessions might be necessary for larger or very complicated interfaces with a substantial number of dialogue elements, but it would probably be better to split the evaluation into several smaller sessions, each concentrating on a part of the interface.
During the evaluation session, the evaluator goes through the interface several times, inspects the various dialogue elements, and compares them with a list of recognized usability principles. These heuristics are general rules that seem to describe common properties of usable interfaces. In addition to the checklist of general heuristics to be considered for all dialogue elements, the evaluator is also allowed to consider any additional usability principles or results that come to mind and that may be relevant for a specific dialogue element.
In principle, the evaluators decide on their own how they want to proceed with evaluating the interface. A general recommendation would be that they go through the interface at least twice, however. The first pass would be intended to get a feel for the flow of the interaction and the general scope of the system. The second pass then allows the evaluator to focus on specific interface elements while knowing how they fit the larger whole.
Since the evaluators are not using the system as such (to perform a real task), it is possible to perform heuristic evaluation of user interfaces that exist on paper only and have not yet been implemented (Nielsen 1990d). This makes heuristic evaluation suitable for use early in the usability engineering lifecycle.
If the system is intended as a walk-up-and-use interface for the general population or if the evaluators are domain experts, it will be possible to let the evaluators use the system without further assistance. If the system is domain-dependent and the evaluators are fairly naive with respect to the domain of the system, it will be necessary to assist the evaluators so that they can use the interface. One approach that has been applied successfully is to supply the evaluators with a typical usage scenario (Carroll and Rosson 1990; Clarke 1991; Nielsen 1990d), listing the various steps a user would take to perform a few realistic tasks. Such a scenario should be constructed on the basis of a task analysis of the actual users and their work in order to be as representative as possible of the eventual use of the system.
The output from using the heuristic evaluation method is a list of usability problems in the interface, annotated with references to those usability principles that were violated by the design in each case in the opinion of the evaluator. Heuristic evaluation does not provide a systematic way to generate fixes to the usability problems or a way to assess the probable quality of any redesigns. However, because heuristic evaluation aims at explaining each observed usability problem with reference to established usability principles, it will often be fairly easy to generate a revised design according to the guidelines provided by the dialogue principle that was violated. Also, many usability problems have fairly obvious fixes as soon as they have been identified.
For example, if the problem is that the user cannot copy information from one window to another, then the solution is obviously to include such a copy feature. Similarly, if the problem is the use of inconsistent typography in the form of upper- and lowercase formats and fonts, the solution is obviously to pick a single typographical format for the entire interface. Even for these simple examples, however, the designer has no information to help design the exact changes to the interface (for example, how to enable the user to make the copies or which of the two font formats to standardize on).
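The output described above, a list of problems each annotated with the principles it is judged to violate, can be sketched as a small data structure, together with one possible way of merging the lists from several evaluators. Everything here is hypothetical: the class and function names, the heuristic labels, and the choice to treat two findings as the same problem only when their descriptions match exactly. In practice someone would have to decide by hand which reports describe the same problem.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One usability problem as reported by one evaluator."""
    problem: str            # short description of the problem
    heuristics: frozenset   # labels of the principles judged to be violated


def merge_findings(reports):
    """Combine per-evaluator lists into one record per distinct problem.

    ASSUMPTION: findings with identical `problem` text are the same problem.
    """
    merged = defaultdict(lambda: {"evaluators": set(), "heuristics": set()})
    for evaluator, findings in reports.items():
        for finding in findings:
            entry = merged[finding.problem]
            entry["evaluators"].add(evaluator)
            entry["heuristics"].update(finding.heuristics)
    return dict(merged)


# Illustrative data only; the heuristic labels are placeholders.
copy_problem = Finding("cannot copy between windows", frozenset({"user control"}))
type_problem = Finding("inconsistent typography", frozenset({"consistency"}))
reports = {
    "evaluator_a": [copy_problem],
    "evaluator_b": [copy_problem, type_problem],
}

merged = merge_findings(reports)
for problem, info in sorted(merged.items()):
    print(problem, sorted(info["evaluators"]), sorted(info["heuristics"]))
```

Recording which evaluators reported each problem keeps the individual evaluations distinguishable after aggregation, and the merged record still says nothing about how to fix the problem, consistent with the limits of the method noted above.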
One possibility for extending the heuristic evaluation method to provide some design advice is to conduct a debriefing session after the last evaluation session (Nielsen 1993b). The participants in the debriefing should include the evaluators, any observer used during the evaluation sessions, and representatives of the design team. The debriefing session would mostly be conducted in a brainstorming mode and would focus on discussions of possible redesigns to address the major usability problems and general problematic aspects of the design. A debriefing is also a good opportunity for discussing the positive aspects of the design, since heuristic evaluation does not otherwise address this important issue.
Heuristic evaluation is explicitly intended as a "discount usability engineering" method (Nielsen 1989b, 1990a). Independent research (Jeffries et al. 1991) has indeed confirmed that heuristic evaluation is a very efficient usability engineering method, and one recent case study found a benefit-cost ratio for a heuristic evaluation project of 48, with the cost of using the method being about $10,500 and the expected benefits being about $500,000 (Nielsen 1993b). As a discount usability engineering method, heuristic evaluation is not guaranteed to provide "perfect" results or to find every last usability problem in an interface.
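The reported ratio of 48 follows directly from the two rounded figures quoted above. The short check below uses only those figures; the variable names are illustrative.

```python
cost = 10_500       # approximate cost of the evaluation, in dollars
benefit = 500_000   # approximate expected benefit, in dollars

ratio = benefit / cost
print(f"benefit-cost ratio: {ratio:.1f}")  # prints "benefit-cost ratio: 47.6"
```

Rounded to the nearest whole number this gives 48, matching the figure in the case study.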