User-Derived Impact Analysis as a
| Attribute | Measuring Technique | Metric | Worst Case | Planned Level | Best Case | Now Level |
|---|---|---|---|---|---|---|
| Initial performance | Windowing benchmark task | Work speed | Same as V1 | 20% > V1 | 3 times V1 | Same as V1 |
| Initial evaluation | Attitude questionnaire | Semantic differential evaluation | 0 | 0.25 | 1 | -0.5 to 0.5 |
Note: Now levels for evaluation are based on existing systems, such as those reported by Whiteside et al. (1985).
This particular table shows that the planned level for initial performance was a 20% improvement over the V1 interface, as measured by a work-speed metric (Whiteside et al., 1985) on a benchmark task of our devising. The benchmark task involves creating windows, attaching the keyboard to windows, printing the contents of windows, and moving, pushing, and popping windows. Work speed is a measure of the percentage of the benchmark task completed per unit of time.
The planned level for initial evaluation was 0.25 on a scale of -3 to +3, using a semantic differential attitude questionnaire that we devised. The questionnaire was administered after users performed the benchmark task.
The performance (work speed) goal is expressed in terms of improvements over V1, whereas the evaluation goal is expressed in terms of an absolute score on the evaluation questionnaire.
After defining usability and setting planned levels, there were six weeks left in which to make changes to the V2 software. In order to meet the planned levels for V2, the software development group needed timely and useful information about the usability of the existing V1 software.
Impact analysis (Gilb, 1984) is a method of estimating the probability that a set of proposed design solutions (activities) will result in successfully meeting the engineering goals of a project. Impact analysis is a technique for estimating which solutions will be most effective for meeting planned levels for various attributes, as well as estimating the likelihood that the solutions will be sufficient for meeting these attributes. It is an aid for deciding how to allocate scarce engineering resources.
Here we show how an impact analysis may be performed, based not on subjective estimates of the effectiveness of certain solutions, but on data derived from user behavior. User-derived impact analysis provides a method for analyzing user behavior and presenting the results in a form that is useful for engineering groups. User-derived impact analysis involves:
Since the user performance goal for VWS Version 2 was expressed in terms of performance on VWS Version 1, the first step was to measure V1 user performance. The initial usability of the V1 interface was measured by testing six experienced VAX/VMS users, who had never used mouse-oriented windowing software, on a simple windowing benchmark task. Table 2 shows their performance and evaluation scores.
Table 2: VWS V1 Initial Usability
| Subject | Time taken (min:sec) | % of task completed | Work speed | Evaluation rating |
|---|---|---|---|---|
| S1 | 21:02 | 100 | 14.3 | 2.0 |
| S2 | 54:34 | 87 | 4.8 | 1.2 |
| S3 | 25:20 | 94 | 11.1 | 1.6 |
| S4 | 54:38 | 47 | 2.6 | 1.6 |
| S5 | 40:26 | 94 | 7.0 | 2.0 |
| S6 | 14:40 | 100 | 20.5 | 1.8 |
| Mean | 35:06 | 87 | 10.0 | 1.7 |
| SD | 17:19 | 20 | 6.6 | 0.3 |
Performance was measured using the work-speed performance score introduced by Whiteside et al. (1985, p. 187). In this case, the performance is shown in terms of percentage of the task completed per 3-minute period (3 minutes is the time it takes a well-practiced expert to complete the task). It is computed by multiplying the percentage of the task completed by the 3-minute constant and then dividing by the time taken on the task. High scores represent fast performance.
On the average, these users were able to complete 10% of the task per 3-minute period. Thus for V2, an average rate of 12% of the task per 3-minute period would represent the specified 20% improvement in initial performance.
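The work-speed calculation can be sketched in a few lines of Python. This is a minimal illustration using the Table 2 figures; the function and variable names are assumptions made for the example and do not come from the original study.

```python
# Sketch of the work-speed score: percentage of the benchmark task
# completed per 3-minute period. All names here are illustrative.

EXPERT_MINUTES = 3.0  # time a well-practiced expert needs for the task


def to_minutes(mmss: str) -> float:
    """Convert a 'min:sec' string such as '21:02' to minutes."""
    minutes, seconds = mmss.split(":")
    return int(minutes) + int(seconds) / 60.0


def work_speed(percent_completed: float, time_taken: str) -> float:
    """Percentage of the task completed per 3-minute period."""
    return percent_completed * EXPERT_MINUTES / to_minutes(time_taken)


# (time taken, % of task completed) for the six V1 subjects in Table 2
V1_SUBJECTS = {
    "S1": ("21:02", 100),
    "S2": ("54:34", 87),
    "S3": ("25:20", 94),
    "S4": ("54:38", 47),
    "S5": ("40:26", 94),
    "S6": ("14:40", 100),
}

v1_speeds = {s: work_speed(pct, t) for s, (t, pct) in V1_SUBJECTS.items()}
v1_mean = sum(v1_speeds.values()) / len(v1_speeds)
v2_target = 1.20 * v1_mean  # planned level: 20% better than V1

for subject, speed in v1_speeds.items():
    print(f"{subject}: {speed:.1f}")
print(f"V1 mean: {v1_mean:.1f}, V2 planned level: {v2_target:.1f}")
```

Rounded to one decimal place, the per-subject scores agree with the work-speed column of Table 2, and the mean of about 10 gives a V2 planned level of about 12.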
The mean evaluation score of 1.7 was more favorable than the planned level of 0.25 for V2, indicating that no further work needed to be done in this area. Usability engineering requires that if a planned level of an attribute is demonstrably met, then further work on that attribute is not only unnecessary, but undesirable, because such work would involve taking resources away from product attributes whose planned levels were not yet met.
By reviewing videotapes of the users' sessions, a list of 13 user/computer interaction problems was compiled. These are shown as the left-hand column of Table 3.
Table 3: Time Spent Due to Problems, in Minutes and Seconds
| Problem | S1 | S2 | S3 | S4 | S5 | S6 |
|---|---|---|---|---|---|---|
| Window positioning | 0:38 | -- | 0:21 | 6:51 | 0:10 | 1:42 |
| Menu choice off by 1 | 0:17 | -- | 0:14 | -- | -- | |
| Confused between 2 menus | 0:19 | 0:37 | 0:58 | |||
| Print origin off | 0:39 | 0:23 | 0:22 | |||
| Moving window before login | 0:46 | -- | 0:11 | |||
| Attaching keyboard | 6:41 | 4:44 | ||||
| Obscured help window | 0:30 | 3:02 | ||||
| Pressing near border | 7:39 | 0:40 | ||||
| Inside/outside window | -- | 0:23 | ||||
| Click/press confusion | 1:30 | |||||
| CTRL/S does not light LED | 3:46 | |||||
| Deleting windows | 6:53 | |||||
| Get menu when moving window | 0:55 | |||||
| Total problem time | 2:09 | 14:50 | 3:44 | 18:46 | 8:58 | 2:44 |
| Total task time | 21:02 | 54:34 | 25:20 | 54:38 | 40:26 | 14:40 |
After identifying the problems, the videotapes were viewed again to estimate how much time was spent in each problem. Table 3 also shows these estimates for each subject. Dashes indicate that the subject encountered the problem, but that the amount of time spent due to the problem was negligible.
Table 4 shows the estimated amount of increase in the work-speed performance score for each subject if all of the problems were solved. The estimate is made by subtracting the total problem time from the total task time and computing a new estimated work speed based on the reduced time. This assumes that there would be no interactions among the individual problems and their solutions. That is, it is assumed that solving one problem, say window positioning, would not interact with another problem, say moving window before login. Gilb recommends this simplifying assumption in order to make the impact analysis calculations more practical.
Table 4: Effect of Solving Problems on Work Speed
| Subject | Measured work speed | Est. increment if all problems solved | New (estimated) work speed |
|---|---|---|---|
| S1 | 14.3 | 1.6 | 15.9 |
| S2 | 4.8 | 1.7 | 6.5 |
| S3 | 11.1 | 1.9 | 13.0 |
| S4 | 2.6 | 1.3 | 3.9 |
| S5 | 7.0 | 1.9 | 8.9 |
| S6 | 20.5 | 4.6 | 25.1 |
| Mean | 10.0 | 2.2 | 12.2 |
Overall, these calculations predict a 22% increase in VWS usability as measured by initial work speed. This 22% predicted improvement corresponds closely to the 20% planned level of improvement shown in Table 1.
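The estimate behind Table 4 can be sketched as follows. This is an illustrative reconstruction, not the original calculation: it takes the task times and completion percentages from Table 2 and the total problem times from the bottom rows of Table 3, and it assumes that the percentage of the task completed would stay the same once the problems were removed. The names are hypothetical.

```python
# Sketch: estimated work speed if all observed problem time were eliminated.
# Assumes the percentage of the task completed is unchanged.

EXPERT_SECONDS = 180  # the 3-minute expert time, in seconds


def to_seconds(mmss: str) -> int:
    """Convert a 'min:sec' string such as '2:09' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def work_speed(percent_completed: float, seconds: float) -> float:
    """Percentage of the task completed per 3-minute period."""
    return percent_completed * EXPERT_SECONDS / seconds


# subject: (% of task completed, total task time, total problem time)
SUBJECTS = {
    "S1": (100, "21:02", "2:09"),
    "S2": (87, "54:34", "14:50"),
    "S3": (94, "25:20", "3:44"),
    "S4": (47, "54:38", "18:46"),
    "S5": (94, "40:26", "8:58"),
    "S6": (100, "14:40", "2:44"),
}

measured = {}
estimated = {}
for subject, (pct, task, problem) in SUBJECTS.items():
    task_seconds = to_seconds(task)
    measured[subject] = work_speed(pct, task_seconds)
    # Subtract the problem time, then recompute the score on the reduced time.
    estimated[subject] = work_speed(pct, task_seconds - to_seconds(problem))

measured_mean = sum(measured.values()) / len(measured)
estimated_mean = sum(estimated.values()) / len(estimated)
predicted_increase_pct = 100 * (estimated_mean / measured_mean - 1)

print(f"measured mean: {measured_mean:.1f}")
print(f"estimated mean: {estimated_mean:.1f}")
print(f"predicted increase: {predicted_increase_pct:.0f}%")
```

Computed from unrounded values, the per-subject results agree with Table 4 to within about 0.1, and the predicted increase comes out at roughly 22%.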
Table 5 shows the thirteen windowing problems ranked by their impact on the improved work-speed scores when totaled across all six subjects. The window positioning and pressing near border problems have the largest effects on initial usability, and together they account for more than 50% of the total impact.
Table 5: Windowing Problems Ranked by Impact
| Problem | Relative impact on initial use | V2 software change implemented? |
|---|---|---|
| Window positioning | 32% | yes |
| Pressing near border | 23% | yes |
| Attaching keyboard | 7% | yes |
| Print origin off | 7% | no |
| Deleting windows | 6% | yes |
| Click/press confusion | 6% | no |
| Confused between 2 menus | 5% | yes |
| Get menu when moving window | 5% | no |
| Menu choice off by 1 | 3% | yes |
| CTRL/S does not light LED | 2% | no |
| Inside/outside window | 2% | no |
| Obscured help window | 1% | yes |
| Moving window before login | 0% | yes |
This rank-ordered list is a user-derived estimate of the percentage impact of various design solutions on the goal of improved initial work speed. It corresponds to a single row in a Gilb impact analysis table.
This completed the user-derived impact analysis portion of the project.
Once the evaluation of the V1 software was completed and the ranked list of V1 windowing problems was produced, the information was presented to the VWS engineering group. The engineering group then evaluated the merits of various technical solutions to the specified windowing problems, based on both the cost and potential impact of implementing the solution. Once this analysis was completed, those solutions with the highest relative impact and the smallest demand on engineering resources were incorporated into the V2 software. The right-hand column in Table 5 shows those problems for which specific software solutions were made. Notice that while not all of the recommended changes were made, four of the top five were.
Usability engineering requires demonstration that the planned levels of usability attributes have indeed been achieved. We tested an early version of the revised V2 software to see if the software met the planned levels of usability. Six subjects were run on VWS V2 using the same procedure as for the V1 subjects. Again, all subjects were experienced users of the VAX/VMS operating system, but none had used the VAXstation I or other mouse-oriented systems before.
Table 6 shows the work-speed and attitude scores for the VWS V2 subjects, and the means and standard deviations for both the V1 and V2 subjects. The mean performance score for the V2 subjects is 37% better than the mean for the V1 subjects: 10.0 for V1, 13.7 for V2. All of the VWS subjects, both for V1 and V2, gave the system a positive evaluation, with mean evaluation scores of 1.7 for V1 and 1.3 for V2.
Table 6: VWS V2 Initial Usability
| Subject | Time taken (min:sec) | % of task completed | Work speed | Evaluation rating |
|---|---|---|---|---|
| S7 | 18:06 | 100 | 16.6 | 0.8 |
| S8 | 25:22 | 93 | 11.0 | 1.0 |
| S9 | 15:48 | 94 | 17.8 | 1.4 |
| S10 | 21:00 | 93 | 13.3 | 1.8 |
| S11 | 40:20 | 100 | 7.4 | 2.0 |
| S12 | 18:26 | 100 | 16.3 | 1.0 |
| V2 Mean | 23:10 | 97 | 13.7 | 1.3 |
| V2 SD | 9:01 | 4 | 4.0 | 0.5 |
| V1 Mean | 35:06 | 87 | 10.0 | 1.7 |
| V1 SD | 17:19 | 20 | 6.6 | 0.3 |
The 37% improvement in initial work speed from V1 to V2 is almost twice the amount of improvement planned. While evaluation scores declined slightly from V1 to V2, the initial evaluation of 1.3 is still much higher than the planned level of 0.25. Both initial usability goals for the software were met before field test.
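The comparison against the planned levels amounts to a short arithmetic check. The sketch below uses the means reported in Table 6 and the planned levels from the usability specification; the constant names are illustrative only.

```python
# Sketch: check the V2 results against the planned levels.

V1_MEAN_WORK_SPEED = 10.0
V2_MEAN_WORK_SPEED = 13.7
V2_MEAN_EVALUATION = 1.3

PLANNED_IMPROVEMENT = 0.20  # planned level: work speed 20% above V1
PLANNED_EVALUATION = 0.25   # planned level on the -3 to +3 scale

improvement = V2_MEAN_WORK_SPEED / V1_MEAN_WORK_SPEED - 1
performance_goal_met = improvement >= PLANNED_IMPROVEMENT
evaluation_goal_met = V2_MEAN_EVALUATION >= PLANNED_EVALUATION

print(f"observed improvement: {improvement:.0%}")
print(f"ratio to planned improvement: {improvement / PLANNED_IMPROVEMENT:.2f}")
print(f"performance goal met: {performance_goal_met}")
print(f"evaluation goal met: {evaluation_goal_met}")
```

The ratio of observed to planned improvement is about 1.85, which is the sense in which the improvement is almost twice the amount planned.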
Software changes were made in response to eight of the thirteen interface problems reported in the V1 software. Since not all suggested changes were made, we would have estimated that these changes would lead to a 17% improvement in initial work speed. This is less than half of the 37% improvement actually observed. The discrepancy is not surprising given that this was our first attempt to quantitatively estimate the effect of software changes on usability. One possible explanation is that we estimated the change in time needed to complete the task, but did not estimate changes in the percentage of the task completed.
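The text does not spell out the calculation behind the 17% figure. One reading that reproduces it, offered here as an assumption rather than as the documented method, is to scale the 22% full-fix prediction by the summed relative impact of the problems that were actually addressed in Table 5.

```python
# Sketch: expected improvement when only some problems are fixed.
# The scaling rule is an assumption that happens to reproduce the 17% figure.

# (problem, relative impact in %, change implemented in V2?) from Table 5
TABLE_5 = [
    ("Window positioning", 32, True),
    ("Pressing near border", 23, True),
    ("Attaching keyboard", 7, True),
    ("Print origin off", 7, False),
    ("Deleting windows", 6, True),
    ("Click/press confusion", 6, False),
    ("Confused between 2 menus", 5, True),
    ("Get menu when moving window", 5, False),
    ("Menu choice off by 1", 3, True),
    ("CTRL/S does not light LED", 2, False),
    ("Inside/outside window", 2, False),
    ("Obscured help window", 1, True),
    ("Moving window before login", 0, True),
]

PREDICTED_FULL_IMPROVEMENT = 22.0  # % increase if every problem were solved

implemented_count = sum(1 for _, _, done in TABLE_5 if done)
implemented_share = sum(impact for _, impact, done in TABLE_5 if done)
partial_estimate = PREDICTED_FULL_IMPROVEMENT * implemented_share / 100.0

print(f"problems addressed: {implemented_count} of {len(TABLE_5)}")
print(f"share of total impact addressed: {implemented_share}%")
print(f"estimated improvement: {partial_estimate:.0f}%")
```

Under this reading, the eight implemented changes cover 77% of the total estimated impact, and 77% of the 22% prediction is about 17%.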
Most of the problems that had bothered V1 users, and had been targets of change in the interface, did not recur in the V2 testing.
This paper has described a method for evaluating the predicted effectiveness of a set of solutions on a set of goals. The method is based on the impact analysis of Gilb, but seeks to ground the estimates in actual user-performance data.
The VWS development organization responded very favorably to the user-derived impact analysis. It presents proposed changes in a rational way, defusing many of the issues of personal taste that often cloud user-interface design efforts. Initial goal-setting allows a shared definition of success with respect to usability, and impact analysis indicates what steps appear to be necessary to achieve success. The developers traded off usability against ease of implementation: not all the changes were made. However, this was done with full awareness of the likely impact on the usability goals. Using usability engineering with user-derived impact analysis, the developers met these goals for user performance and user satisfaction, on schedule.
Alana Brassard recruited the subjects and administered the experimental sessions. Stan Amway participated in setting the usability goals.
1. The views expressed in this paper are those of the authors and do not necessarily reflect the views of Digital Equipment Corporation.
2. VAX, VAXstation, and VMS are trademarks of Digital Equipment Corporation.
Bennett, J. L. Managing to meet usability requirements: establishing and meeting software development goals. In Visual Display Terminals, J. Bennett, D. Case, J. Sandelin, and M. Smith, Eds., Prentice-Hall, Englewood Cliffs, NJ, 1984, pp. 161-184.
Butler, K. A. Connecting theory and practice: a case study of achieving usability goals. In Proc. CHI '85 Human Factors in Computing Systems (San Francisco, April 14-18, 1985), ACM, New York, pp. 85-88.
Carroll, J. M. and Rosson, M. B. Usability specifications as a tool in iterative development. In Advances in Human-Computer Interaction, Vol. 1, H. R. Hartson, Ed., Ablex, Norwood, NJ, 1985, pp. 1-28.
Gilb, T. Design by objectives. Unpublished manuscript, 1981. Available from the author at Box 102, N-1411 Kolbotn, Norway.
Gilb, T. The "impact analysis table" applied to human factors design. In Proc. Interact '84, First IFIP Conference on Human-Computer Interaction (London, September 4-7, 1984), Vol. 2, pp. 97-101.
Shackel, B. The concept of usability. In Visual Display Terminals, J. Bennett, D. Case, J. Sandelin, and M. Smith, Eds., Prentice-Hall, Englewood Cliffs, NJ, 1984, pp. 45-87.
Whiteside, J., Jones, S., Levy, P. S. and Wixon, D. User performance with command, menu, and iconic interfaces. In Proc. CHI '85 Human Factors in Computing Systems (San Francisco, April 14-18, 1985), ACM, New York, pp. 185-191.
Copyright © 1986 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.
This is a digitized copy derived from an ACM copyrighted work. ACM did not prepare this copy and does not guarantee that it is an accurate copy of the author's original work.