Performance, Acquisition, and Training Methods - LEKULE


12 Nov 2015

Performance, Acquisition, and Training Methods

Performance is an idea easily grasped by users and managers. One can have a clear sense of work being accomplished and tasks being completed. But performance is often difficult to measure objectively. Furthermore, since performance tends to improve over time, one cannot assess a system merely on the basis of first-time exposure. Instead, performance must be measured at different stages of the users' adoption of the system.
The improvement of user performance is of great concern to all. With respect to users, one may ask, "How much do users improve with practice? What are the benefits of different training programs?" With respect to the machine, software and hardware developers are interested in the improvement of system performance. Ultimately it must be remembered that performance is a function of both the user's ability and the system's capability. It is the synergistic combination of the two that determines overall performance.
This chapter first considers overall performance as the function of user ability and system power. Measures of human performance, system performance, and overall performance will be discussed. Changes in user performance on menu selection systems over time and implications of the learning profile are discussed. Finally, studies that have compared different training methods using documentation and online help are reviewed.

7.1 Performance
Performance is considered the bottom-line measure of a system. For the computer itself, performance may be defined in terms of technical specifications such as system response time and rate of transmission of menu frames. However, when it comes to human performance, it is not always clear what is meant. Performance may be defined along different factors and at different levels of analysis. Table 1.1 in Chapter 1 lists several factors pertaining to system productivity and to human performance. System productivity refers to the overall performance due to the individual performance of the user and the machine.

7.1.1 Measures of User Performance. User performance on a menu system may be assessed from a micro to a macro level. In terms of speed, one may look at (a) the time that it takes for the user to look at a menu frame and make the choice, (b) the time to locate a target in a menu tree, up to (c) the total time to complete a whole task using a menu selection system. Accuracy must also be considered. One may look at (a) the probability that a single menu choice is correct, (b) the total probability that the user locates the desired target in the menu tree, up to (c) the probability that the task is correctly completed. The level of analysis depends on the researcher's focus on the component processes involved in the task.
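One way to see how the micro and macro levels relate is a small sketch. It assumes, purely for illustration, that the choices along a path are independent and that errors are never corrected, so that path time is the sum of the per-frame times and path accuracy is the product of the per-choice probabilities. The function name and all of the numbers are hypothetical and are not taken from any study.

```python
from math import prod

def path_measures(frame_times, choice_probs):
    """Aggregate per-frame measures into path-level measures.

    Assumes independent choices and no error recovery (an illustrative
    simplification, not a claim made in the text).
    """
    total_time = sum(frame_times)    # level (b): time to locate the target
    p_correct = prod(choice_probs)   # level (b): probability of reaching it
    return total_time, p_correct

# Hypothetical three-level path: seconds per frame, P(correct) per choice.
time_s, p = path_measures([2.0, 1.5, 1.0], [0.9, 0.9, 0.9])
print(time_s, round(p, 3))
```

Under these assumptions, even fairly accurate single choices compound into a noticeably lower probability of locating a deep target, which is one reason the level of analysis matters.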
Performance may also be divided along the factors of quantity and quality. It must be remembered that performance on the menu is, in general, not the end in itself. Rather the menu selection system is in support of some other task such as word processing, file management, telecommunications, etc. Ultimately, the issue of performance must relate to productivity in the task domain rather than merely speed and accuracy in menu selection. It is assumed that improved performance on menu selection will facilitate that task; however, it is quite possible that it may do so in a way that cannot be directly assessed through menu performance measures. For example, the menu may convey an effective model of task structure that may slow menu performance but improve quality of work in the task domain. This possibility is discussed in Chapter 8.

7.1.2 Overall Performance = User Proficiency X System Power. Overall performance of a system is assumed to be a function of the level of proficiency of the users and the power of the system. However, it has not been clear just what the function was for combining the two factors. Some have emphasized the power of the computer to solve problems and the limitations imposed by unskilled users. For example, Licklider (1960) discusses the idea of a "man-machine symbiosis" and identifies obstacles to this symbiotic relationship such as the inability of the user to formulate questions and use the command language. Others have proposed a more synergistic relationship in which overall performance is greater than the sum of the parts contributed by the human and by the computer. It is assumed that there exists some unique combination of the two such that the power of the system and the performance of the user are enhanced by each other. Dehning, Essig, and Maass (1981), for example, distinguish "objective operating complexity" which is the actual complexity of a system from "subjective operating complexity" which is the user's perception of the operating complexity that must be overcome. In their view, "...an optimal man-computer interface design can be regarded as an optimization problem between a maximal flexibility of use and a minimal operating complexity" (p. 6). If system power is synonymous with objective operating complexity and user proficiency is inversely related to subjective operating complexity, then for any level of user proficiency there should be an optimal level of system power.
Quite a different idea is expressed by Nelson (1970) in discussing the factors of manpower output and equipment output. On the basis of the idea of marginal productivity, he assumes a substitution function between manpower output and equipment output. For a constant level of productivity, the slope of the function is the substitution ratio between the two factors. If user proficiency is directly related to manpower output and equipment output is a direct function of system power, then performance should be equal to the product of system power and user proficiency.
In order to compare such ideas, Norman and Singh (in press) formulated five alternative models of overall performance as a function of user proficiency and system power. These are shown in Figure 7.1 for four levels of user proficiency and four levels of system power.

Matching model. It may be assumed that a user of low proficiency will perform best with a simple tool and a user of high proficiency will perform optimally with a more powerful tool that matches his or her ability. Thus, optimal performance is expected when there is a perceived match between user proficiency and system power as suggested by Dehning, et al. (1981). The first panel of Figure 7.1 shows the predicted pattern for this model when each level of user proficiency matches the corresponding level of system power. Each line peaks at the point of match.
Averaging model. A symbiotic relationship may be assumed in which expected performance is the average of user proficiency and system power. In this model, there is no interactive effect between the levels of user proficiency and system power. The two work together but one does not enhance or limit the effect of the other on performance. The averaging model is compensatory in that a deficit in one factor can be made up for by a credit in the other factor. For example, a proficient user may compensate for limited system power by using the system more effectively. On the other side, a powerful system may compensate for low user proficiency with online-help and menu selection. The second panel in Figure 7.1 shows the pattern of expected performance for the averaging formulation. Parallel lines are expected since the effect of each factor is the same for all levels of the other factor.
Multiplying model. A synergistic formulation may assume that combined performance is equal to the product of the levels of user proficiency and system power. In this case, the effect of system power is enhanced by increased user proficiency. The proficient user is able to gain higher performance with powerful systems. On the other hand, increased system power is of little effect when the user is of low proficiency. The multiplying model is characterized by a diverging fan of lines as shown in the third panel of Figure 7.1.
Human/computer ratio model. Another interesting possibility is a ratio model in which expected performance is determined by the ratio of user proficiency over the total effort expended. The result is that for a user of either very low or very high proficiency, system power has only a small positive effect. In the intermediate range, however, system power has a substantial effect. The idea is that for a user of low proficiency, increases in system power will have little or no impact. At the other extreme, highly proficient users are already performing at such a high level that increased system power again shows only a small effect. This sort of pattern might occur with an expert knowledge system. An expert in the knowledge domain using the system will perform nearly as well with or without increases in system power. It is in the middle range that a user will benefit from the computer. The computer in this model is an aid. The first step is to gain enough proficiency to use the computer; the second step is to gain proficiency in the knowledge domain so that the person can perform without the help of the computer.
Computer/human ratio model. As a complement to the last model, performance may be determined by the ratio of computer power over total effort. For a system of very low power or of very high power, user proficiency has only a small effect. In the intermediate range of system power, user proficiency has a substantial effect. A computer of very low power will result in low performance for all users. Low computer power limits performance. At the other extreme, for a system of very high power, user proficiency will not matter. The machine is not limited by the low proficiency of the user. Systems using natural language comprehension, menu selection, and artificial intelligence techniques may provide examples of this model.
Although actual performance data is difficult to get and validate, expectations about performance can be assessed from users. In order to compare the models, Norman and Singh (in press) presented scenarios representing the 16 combinations shown in panels of Figure 7.1 to students and managers. Ratings of expected performance followed the multiplying model. It is possible that in specific applications the function may be expected to be different; however, the multiplying model appears to be quite general and can be taken as a good first approximation. The implications are that users of lower proficiency are expected to nullify increases in system power; systems of low power are expected to nullify increases in user proficiency; but a synergistic combination is expected such that increase in either factor enhances the other. Both user proficiency and system power are extremely important. The remainder of this chapter will discuss increases in user proficiency.
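The qualitative patterns described for the first three models can be reproduced with a short sketch. It assumes four equally spaced levels (1 to 4) for each factor and, for the matching model, a linear falloff away from the point of match; both choices are illustrative assumptions. The two ratio models are left out because the text does not give their formulas precisely enough to reconstruct them.

```python
LEVELS = (1, 2, 3, 4)  # four levels of each factor, as in Figure 7.1

def matching(user, system):
    # Peaks where system power equals user proficiency
    # (the linear falloff is an assumption made for this sketch).
    return max(LEVELS) - abs(user - system)

def averaging(user, system):
    # Compensatory: a deficit in one factor is offset by the other.
    return (user + system) / 2

def multiplying(user, system):
    # Synergistic: each factor scales the effect of the other.
    return user * system

def grid(model):
    """Rows are user proficiency levels; columns are system power levels."""
    return [[model(u, s) for s in LEVELS] for u in LEVELS]

for name, model in [("matching", matching),
                    ("averaging", averaging),
                    ("multiplying", multiplying)]:
    print(name)
    for row in grid(model):
        print("  ", row)
```

Reading each row as one line in a panel, the averaging rows differ by a constant (parallel lines), the multiplying rows spread apart as system power grows (a diverging fan), and each matching row peaks where the two levels are equal.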

7.2 Acquisition and Learning
A common misconception is that the user does not need to learn anything when using a menu selection system. The implications of this are that (a) one should not expect improvement with practice, (b) prior experience and familiarity with other systems are of little effect, and (c) training and documentation are not necessary. Although these may hold in some isolated cases, for the most part users engage in substantial learning when using menu systems.
A number of studies have documented the fact that performance changes with practice using a menu selection system. Having established that users learn something, a number of questions become important. At what rate do they improve and to what level of performance? In some systems the change from novice to highly experienced is substantial. For example, as users gain experience with complex systems such as CAD/CAM systems, they will begin to access high-frequency menu functions rapidly. In other sorts of systems, there may be little improvement with experience, either because performance is already at a peak or because the system is so vast that the user can remember little about the menu structure that would improve performance. For example, in large information retrieval systems, experienced users may not be able to traverse the tree much faster than novices, particularly when the speed of transmission is slow. However, the experienced user may be able to take advantage of shortcuts and plan more effective search strategies.
Designers of systems must take into consideration the characteristics of the users. For infrequent users, the system should be designed for best early performance. The rate of improvement and asymptotic performance are not important because users are not expected to get to that level. Improvements in performance must be designed into the system itself rather than expected of the user.
For frequent users the system should be designed for best asymptotic performance. Although early performance will suffer, the user will pass that stage with experience into a high level of efficiency. Consequently, user proficiency grows into the system and is expected to combine with system capability in a multiplicative fashion. Menus must allow fast and versatile interaction as well as efficient shortcuts to frequently accessed items.
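The trade-off between early and asymptotic performance can be illustrated with two hypothetical learning curves. The sketch assumes an exponential approach to an asymptote, which is a convenient form chosen for illustration and not a model proposed in the text; the parameter values are invented.

```python
import math

def response_time(trial, initial, asymptote, rate):
    """Hypothetical learning curve: exponential approach to an asymptote."""
    return asymptote + (initial - asymptote) * math.exp(-rate * trial)

# Made-up parameters (seconds). Design A favors early performance,
# design B favors asymptotic performance.
DESIGN_A = dict(initial=4.0, asymptote=3.0, rate=0.05)
DESIGN_B = dict(initial=8.0, asymptote=1.5, rate=0.05)

def crossover_trial(a, b, max_trials=1000):
    """First trial at which design b is faster than design a, or None."""
    for n in range(max_trials):
        if response_time(n, **b) < response_time(n, **a):
            return n
    return None

n_cross = crossover_trial(DESIGN_A, DESIGN_B)
print("design B overtakes design A at trial", n_cross)
```

Users who are expected to stop well before the crossover are better served by design A; users who will practice far beyond it are better served by design B.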

7.2.1 Components Acquired by Practice. One of the first questions to ask is what is acquired when performance improves with practice. It was suggested in Chapter 4 that menu performance is a function of a number of component tasks. Improvement is due to the mastery of these component tasks. Performance on a component may itself improve with practice or it may be eliminated by a short-circuiting process. Figure 7.2 lists some of the components discussed in Chapter 4 and proposes ways in which they may be facilitated or eliminated.

Association of Functional Requirement with Menu Object. Several studies compared the results of menu performance when users were searching for "explicit" versus "definitional" or "implicit" targets. The difference is that for an explicit target the user is shown the exact, verbatim menu item that he or she is to find. For a definitional target the user is given a functional requirement in terms of a definition or a situation that is to be satisfied by the selection of some item. The second case seems to be more applicable to everyday use of menu systems. Users may acquire an association between definitions and menu items. For example, the user may be asked to find "a red fruit" for which apple is the correct target. With practice the user acquires the knowledge that "a red fruit" always refers to an apple. Or in a more realistic vein, the user acquires the relational knowledge between a set of desired functions and a set of menu items.
Location of Menu Item within a Frame. Users acquire search procedures for scanning menu frames. When the organization of items is known, users may develop efficient methods for search. Studies referred to in Chapter 6 dealt with the organization of menus and demonstrated the powerful effect of organization. Furthermore, the exact location of an object within a menu frame may be learned by users with repeated exposure to the menu frame. If the location of an object in a frame is remembered, users will be able to fix their gaze on the item faster.
Location of Menu Item in the Structure. When the menu system has a complex structure, users must acquire some knowledge about the organization of frames and develop strategies for effective search. However, with repeated access of the same items, users will begin to remember the exact path to the items. Rote memory of the path eliminates the need for a search strategy.
Association of Menu Item and Response Code. Once an item is located in the frame, the user must then enter the response code in many systems. In many systems the user must simply enter the code listed next to the menu item. However, as users begin to associate menu items with their response codes, they may recall and enter the response without having to locate the item within the frame.
Motor Production of Response Code. Motor skill is required for entering the response code, whether typing, using cursor keys, or moving a mouse. Users may be expected to become more proficient in entering responses. Furthermore, users may learn the motor sequence and short circuit the translation of the response code into the motor response. The motor response may become so habitual that it is automatic in the same sense that a skilled touch typist does not process individual letters and typewriter keys.

Studies of menu performance with practice shed some light on the relative contributions of these components. Systems differ greatly in the demands and difficulty with each of these components.
7.2.2 Frame Search Time. Do users improve with practice when they are searching for an item in a single menu frame? Parkinson, Sisson, & Snowberry (1985) found that the time to search a menu frame and enter the response did improve with practice. In their experiment subjects were exposed to 128 trials on the same menu frame. On each trial they were shown a target and were required to enter its associated two digit numeric code. Menus contained 64 items and were organized either categorically or alphabetically. Parkinson et al. report the mean response times for 8 blocks of 16 trials. Figure 7.3 shows this learning curve.

The greatest improvement was evidenced within the first few blocks. Small but consistent improvements continued out to the eighth block. From these data it is not clear that the subjects reached the asymptote even after 128 trials.
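A learning curve of this kind is obtained by averaging response times over consecutive blocks of trials. The sketch below shows the computation for 128 trials in 8 blocks of 16. The response times are synthetic values generated only to exercise the function; they are not the data of Parkinson et al., and the function name is hypothetical.

```python
def block_means(times, block_size=16):
    """Mean response time for each consecutive block of trials."""
    if len(times) % block_size:
        raise ValueError("number of trials must be a multiple of block_size")
    return [
        sum(times[i:i + block_size]) / block_size
        for i in range(0, len(times), block_size)
    ]

# Synthetic, made-up response times (seconds) for 128 trials; not real data.
synthetic = [2.0 + 3.0 / (1 + trial / 8) for trial in range(128)]
means = block_means(synthetic)
print([round(m, 2) for m in means])
```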
Improvement in response time when searching a single menu could be due to a number of different factors: repeated exposure to the system, repeated exposure to the same menu frame, or repeated trials on the same item. Familiarity with the system, keyboard, display, etc. and warm-up may account for early reductions in search time. With repeated exposure to the same menu, subjects probably gained a familiarity with the organization of items within the frame. Interestingly, Parkinson et al. note that improvement across trials did not interact with the type of organization of the menu. Similar improvement was found for both categorical and alphabetical organization of menus. In both cases, familiarity with the organization probably aided search by helping subjects to anticipate the spatial location of an item. Finally, when subjects repeatedly searched for the same target, they may have learned the location of specific items in the menu and/or remembered the numeric code for the item. Unfortunately, it is not clear to what extent these three factors contributed to improved performance across trials. However, the results do suggest that designers should carefully consider the components of the system that are acquired by the user over time in predicting the extent of improvement attainable on the system.
McDonald, Stone, & Liebelt (1983) investigated the effect of target type and organization on response time and errors in 64-item menus across 5 blocks of 64 trials. Subjects searched for items given either an explicit target or a single-line definition of the target. Since there is a certain de-coupling between definitions and objects, one might suppose that response times given definitions would be longer than those given explicit targets. This was the case on the first block of trials. However, as shown in Figure 7.4, after practice the difference due to target type disappeared. Apparently, subjects learned the pairing of definitions and objects well enough to be as fast as subjects given explicit targets. This finding is encouraging for designers to the extent that they are dealing with a less than explicit environment. Users in general are faced with definitions of items that they wish to select. When those definitional situations are repeated over trials, performance improves to the level of a one-to-one item match. Acquisition of this knowledge appears to occur rather rapidly, in this case after only one exposure to each item.

With practice users may also acquire rules for how items are positioned in the frame despite the fact that actual items may change from frame to frame in a dynamic system. For example, users may learn that if a particular item does appear, that it will be in the same position. To test this, Somberg (1987) compared menus having positional constancy with menus having other rules such as alphabetic ordering, probability ordering, and random ordering. Menus of 20 items were generated from a pool of 2000 words. Forty words were chosen as targets. Subjects were tested on 6 blocks of 82 trials. For positional constancy, 100 words were assigned to each of the 20 list positions and appeared in the same position on each trial (five words per position). For alphabetic ordering, the words were assigned to position according to ascending alphabetic order. For probability ordering, the words were arranged such that the target had highest probability of appearing in the first position (.170), second highest probability of appearing in the second position (.141) and so on down to the smallest probability (.001). Finally, for random ordering, the words were arranged in a different random order on each trial.
Positional constancy resulted in substantial improvement in response time, while the other menu ordering rules resulted in virtually no change from the first to the last block of trials (see Figure 7.5). It is clear that item position is learned and used in a relatively short time even with a fairly large number of target items (40) and a large base of items (2000).
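Three of the ordering rules compared by Somberg can be sketched as menu generators. The word pool here is a hypothetical stand-in (labels such as "w0042"), the assignment of words to positions is arbitrary, and probability ordering is omitted because the text gives only a few of its position probabilities. The function names are illustrative.

```python
import random

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
POOL = [f"w{i:04d}" for i in range(2000)]   # hypothetical stand-in for the 2000 words
N_POSITIONS = 20

# Positional constancy: every word is tied to one list position
# (100 words per position; the assignment itself is arbitrary here).
HOME = {word: i % N_POSITIONS for i, word in enumerate(POOL)}
BY_POSITION = [[w for w in POOL if HOME[w] == p] for p in range(N_POSITIONS)]

def constant_menu():
    """One word per position, drawn from the words tied to that position."""
    return [rng.choice(words) for words in BY_POSITION]

def alphabetic_menu():
    """Twenty words in ascending alphabetic order."""
    return sorted(rng.sample(POOL, N_POSITIONS))

def random_menu():
    """Twenty words in a fresh random order on every call."""
    return rng.sample(POOL, N_POSITIONS)

print(constant_menu()[:3], alphabetic_menu()[:3], random_menu()[:3])
```

Only under the first rule does a given word always occupy the same absolute position, which is the regularity that subjects appear to have learned.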

This result should be of particular interest to designers of dynamic menu systems. Although the occurrence of items in a menu may vary from frame to frame, their positions should remain constant. Consequently, the method of graying out unavailable items in a list would be superior to deleting them and closing up the list. The overriding guideline is to maintain absolute positional constancy, not just relative position.
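The difference between graying out and closing up the list can be made concrete with a minimal sketch. The menu items and the parenthesized marking for unavailable items are hypothetical choices for illustration.

```python
def render(items, available, gray_out=True):
    """Return the visible menu lines.

    gray_out=True keeps every item in place and marks unavailable ones;
    gray_out=False deletes them and closes up the list.
    """
    if gray_out:
        return [item if item in available else f"({item})" for item in items]
    return [item for item in items if item in available]

ITEMS = ["Open", "Save", "Print", "Close"]   # hypothetical menu
state = {"Open", "Print", "Close"}           # "Save" is currently unavailable

grayed = render(ITEMS, state, gray_out=True)
closed_up = render(ITEMS, state, gray_out=False)
print(grayed)     # "Print" stays in the third position
print(closed_up)  # "Print" moves up to the second position
```

With graying out, every item keeps its absolute position whatever the state of the system; with deletion, only the relative order is preserved.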

7.2.3 Menu Tree Search Time. If performance on a single frame improves with practice, we would expect even greater improvement in complex menu trees. As users become more and more familiar with a hierarchical menu system, they should acquire knowledge about the location of items in the tree. Indeed, such improvement has been reported. Seppälä and Salvendy (1985) looked at improvement from one repetition to another in a study on menu depth. Subjects participated in a simulation task of supervising a flexible manufacturing system. Subjects monitored the functional variables of simulated machines organized in a hierarchical data base. Access to a variable, such as temperature of machine 5, was available through different levels of a hierarchical arrangement of machines within stations within production lines. Subjects were required to check the levels of four variables. Specific variables were selected so that they came from (a) the same machine (Distance 1), (b) different machines within the same station (Distance 2), (c) different machines at different stations within the same production line (Distance 3), or (d) different machines at different stations in different production lines (Distance 4).
Seppälä and Salvendy found a main effect of practice on the performance time. It would appear that subjects were learning to move around through the data base with greater speed as they acquired knowledge of how the data base was structured. A significant interaction with distance within the hierarchy was also found as shown in Figure 7.6. For longer distances between nodes accessed in the hierarchy, there was a greater reduction in time due to practice than for shorter distances.
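The four distance levels used in this study follow directly from where two variables sit in the hierarchy. The sketch below addresses each variable by a hypothetical (line, station, machine) triple; treating the distance of a set of four variables as the largest pairwise distance is an assumption made here, not a definition from the study.

```python
from itertools import combinations

def distance(a, b):
    """Distance between two variables addressed as (line, station, machine)."""
    if a[0] != b[0]:
        return 4   # different production lines
    if a[1] != b[1]:
        return 3   # same line, different stations
    if a[2] != b[2]:
        return 2   # same station, different machines
    return 1       # same machine

def set_distance(variables):
    """Largest pairwise distance in a set of variables (an assumed summary)."""
    return max(distance(a, b) for a, b in combinations(variables, 2))

# Four variables on machines in the same station of production line 1.
print(set_distance([(1, 2, 5), (1, 2, 5), (1, 2, 6), (1, 2, 7)]))
```

The larger the distance, the more levels of the hierarchy must be traversed between successive checks, which is consistent with practice having more room to reduce time at longer distances.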

Moving through a menu tree requires knowledge not unlike that of a cognitive map of geographic locations. The ability to move through the menu rapidly and without error will depend on familiarity not only with items but also with paths. It is maintained here that users may (and should) acquire two types of knowledge: rote knowledge of path traversals (automatic or procedural knowledge) and conceptual knowledge of tree structure and linkages (declarative knowledge). For rapid, repetitive access to menus, such as in the Seppälä and Salvendy study, automaticity will be important. For information retrieval in large data bases, declarative knowledge about the structure of the tree will be central. Acquisition of procedural knowledge is accomplished by extensive repetition. Acquisition of declarative knowledge is best accomplished by formal training. As seen in the next section, declarative knowledge about the menu system may come from transfer of training.

7.3 Transfer of Training
As users are exposed to more and more systems using menus, what is acquired on one menu system may transfer to another. This is particularly true in "integrated" software packages and environments. Common menus and functions may exist across word processing, file management, and electronic mail systems. Transfer will be positive to the extent that menus are similar in meaning and structure. On the other hand, negative transfer would be expected if menu organization is grossly different and similar menu terms refer to different functions. Positive transfer has been observed in the empirical literature. Negative transfer has been reported in anecdotal accounts.
Dray, Ogden and Vestewig (1981) found both practice and transfer effects in a study on multiple item line menus and menus calling sub-menus. In multiple-item line menus, users had to select a menu line (out of 6 lines) and then a menu item along that line (3 to 5 per line). Selections were made by using cursor control arrow keys to select the item and the "menu enter key" to effect the choice. In the second condition users selected the menu line which then called a sub-menu with 3-5 items. The two conditions were counterbalanced so that half of the ten subjects worked on multiple-item line menus first and then switched to menus calling sub-menus second. The other half received the conditions in reverse order. Users were given 23 practice trials before each condition and then 3 blocks of 46 trials on each menu. Consequently, after receiving practice on one type of menu, users were transferred to a different type of menu. Figure 7.7 shows the mean response times.

Substantial practice effects were observed within conditions across blocks for both groups. In the first three blocks response times dropped by about 1 sec. Training in one condition transferred to the other condition resulting in reduced times on blocks 4 through 6. Dray et al. note that neither group nor condition differences were statistically significant. However, the transfer from one condition to the other was significant (p < .01). Not only was there acquisition across practice on the same menu, there was also transfer from one condition to another. Subjects may have learned the relative positions of items from one condition to the other.
It is important to be able to characterize what transfers and what does not when users learn new systems or new versions of older systems. Foltz, Davies, Polson, & Kieras (1988) note three ways in which menu systems may be changed:
Deletions. Frequently used items may be shifted to higher levels of the menu tree thereby deleting intermediating menu levels. For example, in the original Display Writer(TM) word-processor, the options create and revise a document were nested under the item "Typing Tasks." In a later version (Display Writer III(TM) on the IBM PC(TM)), create and revise were moved to the top level menu thus deleting part of the menu path.
Additions. Items that are not used very often may be moved to lower levels of the tree. Consequently, additional menu selections must be made in order to access those items. Furthermore, as new features are added, additional levels of the menu hierarchy may be required thus moving items to lower levels in the tree.
Lexical Changes. Item names may be changed to make it easier for users to associate the function with the name. For example, a command "discard file" may be changed to "delete file." Although the change may benefit new users, it may not help users that have already had prior experience with the system.
If it is assumed that users learn the menu system by encoding a set of production rules, then transfer of learning from one system to another depends on the number of rules that stay the same and the number of new rules that are added. Foltz et al. tested this idea by training subjects on one word processor and then switching them to another. Eight tasks were selected for study that tapped the three types of changes. Foltz et al. developed a predictive model of the time to learn tasks based on the number of new rules required. The results indicated that when tasks required fewer steps (deletions) no new rules needed to be learned. If additional steps were needed to perform a task, the cost of learning was equal to the time required to learn those new rules. However, the model significantly underpredicted the learning time for lexical changes. Subjects did not generalize the rules for a task learned in one system to another even though they had similar names. Instead, they treated the two tasks as if they were independent and had to learn entirely new tasks.
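The core idea of the rule-counting account, that learning cost tracks the number of rules not already known, can be sketched as a set difference. The rule strings and the cost per rule are invented placeholders; this is not the model of Foltz et al. itself, only an illustration of the counting logic.

```python
def predicted_learning_time(old_rules, new_rules, seconds_per_rule=30.0):
    """Learning cost proportional to the rules not already known.

    seconds_per_rule is a made-up placeholder, not a value from the study.
    """
    return len(set(new_rules) - set(old_rules)) * seconds_per_rule

# Hypothetical rules for one task in the old system.
old = {"select Typing Tasks", "select Create", "type document name"}

deletion = {"select Create", "type document name"}   # a menu level removed
addition = old | {"select Options"}                  # an extra step added
lexical = {"select Typing Tasks", "select Make", "type document name"}  # item renamed

for label, rules in [("deletion", deletion), ("addition", addition), ("lexical", lexical)]:
    print(label, predicted_learning_time(old, rules))
```

A pure count treats a renamed item as a single new rule. The result reported above, that learning time for lexical changes was underpredicted, suggests that this is exactly where such a count falls short.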
Prior experience is a powerful factor in the acquisition of performance on menu systems. Systems should attempt to capitalize on what the user already knows. Users are much more likely to adopt a system that is most similar to systems they have used before. Although this may be attributed to brand preference, to a large extent users may be wisely opting for maximal transfer of training. It should be a comfort to designers that additions and deletions to the menu hierarchy are not particularly upsetting in terms of the amount of training needed to master the new system. However, it is most disconcerting that a change in the wording of items that are really the same may totally throw the user. The advice to the designer is to use the right wording in the first version and to stick with it.

7.4 Methods of Training
One of the presumed advantages of menu selection systems is that no formal training is necessary before using the system. Although this is true in many systems, in other more complex systems training is required. This is particularly true in hierarchical menu structures in which the clustering of alternatives is not immediately apparent to the user. Training on such systems may be by way of technical documentation of the menu structure, instruction in cookbook routines, and trial-and-error exposure. A casual survey of documentation reveals four major types of training:
Command Sequence Training. Users are typically shown sequences of choices that lead to particular target items. For example, one file management program gives the following sequence for finding restaurants in St. Louis: (1) Choose the Find command from the Organize menu, (2) Click the Clear button, (3) Click in the box next to "city," (4) Type St. Louis, (5) Click the Find button or press the Enter key.
Such documentation generally lists important software functions and gives "cookbook" procedures for accessing them. Consequently, the menu system may be learned in a piecemeal fashion rather than as an overall structure. Cognitively, one would expect users to encode the menu system as a list of rote associations between targets and menu choices rather than forming a mental map of the menu tree.
Menu Frame Documentation. Users may be shown listings of all the frames appearing in the menu system. Time sharing systems often provide manuals with such listings. Documentation on personal computer systems often shows pictures of pull-down menus, menu boxes, and screen displays of function key options. Such documentation merely reproduces the screens in a printed form and stresses the visual layout of choices at a single level rather than sequences of choices across levels. It allows the user to glance through all the frames and options, but it does not relate one frame to another in the overall menu structure. Cognitively, the user may be expected to encode the system as a set of visual images of frames. Any mental organization of the frames would be the result of inference on the part of the user.
Global Tree Documentation. Users are given a diagram of the menu tree showing all of the menu frames and the links from one to another. A number of systems provide large tree diagrams but often such documentation takes the form of a functional specification of the system and is buried in an appendix rather than used as a means of training. When it is specifically designed for user training, this type of documentation stresses the hierarchical structure of the menu and the links among the items rather than tracing specific paths to accomplish particular functions. When the menu tree is large, it may be subdivided into meaningful subtrees so as not to overwhelm the user. Cognitively, the user is given a visual map of the system. Command sequences for accessing target nodes would be the result of plans formulated by the user by starting at the target node on the map and tracing a route back up the tree to the root or current node.
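The planning step described above, starting at the target on the map and tracing a route back up to the root or current node, can be sketched in a few lines. This is only an illustration: the `PARENT` table and its labels are hypothetical and do not come from any system discussed in this chapter.

```python
# Hypothetical menu tree stored as child -> parent links (labels are illustrative only).
PARENT = {
    "File": "Main", "Edit": "Main",
    "Open": "File", "Save": "File",
    "Cut": "Edit", "Paste": "Edit",
}

def command_sequence(target, current="Main"):
    """Trace from the target up to `current`, then reverse into a top-down sequence."""
    path = []
    node = target
    while node != current:
        if node not in PARENT:
            raise ValueError(f"{target!r} is not below {current!r} in the tree")
        path.append(node)
        node = PARENT[node]
    path.reverse()  # choices in the order the user would make them
    return path

print(command_sequence("Paste"))  # ['Edit', 'Paste']
```

The sequence is derived from the map on demand rather than memorized, which is the contrast with command sequence training drawn above.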
Trial and Error Training. Users are often told to explore the menu system to find out where it goes. The documentation for one such system reads, "The best way to learn this system is to use it!" An advantage of trial and error training is that the user gets hands-on experience, develops motor and visual skills at an early stage, and participates in active learning by discovery. On the other hand, such training tends to be unsystematic. The user may end up having never explored large areas of the system. Additionally, the user may become frustrated by having found an important function at one time and later never be able to find it again. Systems which encourage trial and error training tend to be oriented toward discovery and exploration as a central concept of the software. Cognitively, users participate in active learning. Menu organization and command sequences to access targets are the result of discovery and inference on the part of the user. Unfortunately, many users are not as interested in self-discovery and exploration as software designers would hope.
Many systems use a combination of documentation techniques and methods of training in the hope that one or another method will get the point across. However, it is of interest to know which method proves to be the most effective for different types of systems. A series of studies by Billingsley (1982), Schwartz, Norman, and Shneiderman (1985) and by Parton, Huffman, Pridgen, Norman, and Shneiderman (1985) investigated the effect of training method on performance. The four types of training listed above were chosen as being representative of both formal and informal methods used in the field. The meaningfulness of the menus was varied across the studies in order to see if training had different effects at different levels of familiarity.

7.4.1 Training on Content Free Menus
Schwartz et al. (1985) investigated methods of training on menus that provided no initial clues as to what choice would lead to what target. This was achieved by using meaningless terms as labels for the alternatives. Consequently, the menu was content free. The rationale for using a content free menu was to be able to observe the effect of training and practice in the absence of prior knowledge about the menu structure. Consequently, users had to acquire the associative links between pointers and objects in the menu tree. The menu tree used in this experiment is shown in Figure 7.8.

Groups of 20 subjects each were assigned to the four training methods (command sequence training, menu frame documentation, global tree documentation, and trial and error training). Subjects were allowed to study the documentation materials for 5 minutes before being tested on their ability to locate targets. The command sequence training group studied a set of 27 cards containing the choice sequences to all 27 targets (e.g., ZUREN-DAJ-HOUSE). The menu frame group studied a set of 13 index cards containing all 13 menu frames. The global tree group studied a large diagram of the menu tree as shown in Figure 7.8. In the case of the trial and error group, subjects studied the menu by exploring the system online. Study materials were then removed and subjects were tested on the 27 targets in a random order. Each target was displayed at the top of the screen. Subjects were to locate the target by selecting an alternative from the top menu and then from the second level menu. Each trial was scored as either correct or incorrect depending on whether they had located the target. Subjects were not allowed to move back up the tree. Following the test on the 27 targets, subjects spent another 5 minutes studying the documentation. They were then retested.
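The figures of 13 frames and 27 targets are what one would get from a uniform tree with three alternatives per frame and three levels of terms. That shape is an inference from the counts and from the three-term example sequence, since Figure 7.8 is not reproduced here; the function below is a hypothetical helper that simply checks the arithmetic.

```python
def tree_counts(branching, depth):
    """Return (frames, targets) for a uniform menu tree.

    `branching` alternatives appear on every frame and `depth` is the number
    of levels of terms, so the bottom level holds the targets.
    """
    frames = sum(branching ** level for level in range(depth))  # 1 + b + b^2 + ...
    targets = branching ** depth
    return frames, targets

# Assumed shape of the content free menu: 3 alternatives per frame, 3 levels.
print(tree_counts(3, 3))  # (13, 27)
```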
Figure 7.9 shows the results for the proportion of correct responses for the four groups across the two tests. Trials were blocked into groups of 9 to be able to observe practice effects during testing. Overall the number of targets found differed significantly with type of training documentation. The group studying the global tree outperformed the other three groups. By the second test, the order of the groups clearly indicated that the global tree documentation was superior. The trial and error group was a distant second. Menu frame documentation and command sequence documentation resulted in the fewest targets found.

Practice across the three blocks of trials did not result in significant increases in performance. On the other hand, the improvement due to study between the first and second test was quite dramatic. Overall, subjects nearly doubled their average number correct from Test 1 to Test 2. This finding underscores the importance of training and especially the benefits of refresher courses after the user has begun to work with the system.
One very important question about training has to do with what it is that the users learn, that is, how do they encode information about the menu system in a way that they can usefully retrieve it in the future. One way of tapping this information is to ask the subjects to recall the terms or to reconstruct the menu tree. Following the second test, subjects in the Schwartz et al. study were asked to write down all the terms used at the first, second, and third level of the tree. Figure 7.10 shows the proportion recalled at each level. Quite different patterns of recall occurred depending on the method of training. The command sequence group showed approximately equal percent recall across the three levels. Equal attention was paid to each level. On the other hand, the menu frame and trial and error groups displayed very poor recall of the terms used at Levels 1 and 2 although they showed good recall at Level 3. Clearly, these subjects attended to the target items but not intermediate terms. Finally, the global tree group displayed the best recall all around. A "level effect" is evident for this group in that recall was best for the bottom of the tree, next best at the top, and worst in the middle. One may object that recall has little to do with performance on menu selection. However, the correlation between number of terms recalled and number of targets found was rather large (r = .78) indicating a strong but not necessarily causal relationship between the two.
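For readers unfamiliar with the statistic, the correlation reported above is presumably a Pearson product-moment coefficient computed over subjects; the source does not name the statistic, so that is an assumption. The sketch below shows the calculation. The scores are made up for illustration and are NOT data from the Schwartz et al. study.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either list has no variation.
    return cov / math.sqrt(var_x * var_y)

# Made-up scores for five imaginary subjects (not data from the study).
terms_recalled = [10, 14, 18, 22, 30]
targets_found = [6, 11, 9, 17, 21]
print(round(pearson_r(terms_recalled, targets_found), 2))
```

A value near 1 means that subjects who recalled more terms also tended to find more targets; as noted above, it says nothing about which causes which.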

After recalling the items, subjects in the Schwartz et al. study were given the list of all terms and a box diagram of the tree and asked to write the terms in their correct locations. Figure 7.11 shows the percent of items correctly placed in the tree diagram. All groups placed about 80 to 90 percent correctly at Level 1. The command sequence and menu frame groups did much worse at Levels 2 and 3, showing confusion about the organization of items. The global tree group did the best at Levels 2 and 3 compared to other groups. The trial and error group did well at Level 3 but extremely poorly at Level 2. It would seem that subjects in this group concentrated on the targets but not the path to targets. Finally, it is interesting to note that a "level effect" is evident for all groups except for the command sequence group, which did so poorly at Level 3. The greatest confusion about the structure of a menu occurs at the middle levels of the tree.

The Schwartz et al. study clearly demonstrated the superiority of global tree documentation and training. One might have supposed that with content free menus, the subject's only recourse would be to memorize command sequences. This was not the case. Even with a content free menu, subjects studying the overall organization of the tree were able to locate the most targets. It is possible, however, that the superiority of the global tree training was limited to content free menus.
7.4.2 Training on Meaningful Menus
Parton, Huffman, Pridgen, Norman, and Shneiderman (1985) conducted a parallel experiment using meaningful terms in the menu system. The menu which they used consisted of a hierarchy of job titles and is shown in Figure 7.12. The training methods used were similar to those in the Schwartz et al. experiment. Parton et al., however, allowed the users to study information on the menu selection system for 12 minutes rather than 5 and then try to find as many targets as possible within a limited period of 10 minutes. The experiment was conducted on an interactive time-sharing system.
The results for number of targets found, number of selections to reach the target, number of menu items recalled from memory, and ratings of ease of learning are shown in Figure 7.13.

Although the group trained on the global tree found the most targets within the time limit, the difference was not statistically reliable probably owing to the small number of subjects used (16 per group) and the relatively large variability among subjects. The global tree group also required the fewest selections to reach the target, but again this did not reach significance.
Similar to the Schwartz et al. (1985) study, the global tree group recalled the highest number of menu items and correctly placed them in the tree diagram. A relatively high correlation was found between recall of items and the number of targets found (r = 0.77) indicating that subjects who found more items also tended to recall more items from memory. It would be predicted then that training methods that help the users to recall menu items facilitate performance.
A quite strong finding was that subjects given the global tree gave higher ratings to the ease of learning of the system. The validity of the subjective ratings was supported by the fact that subjects who gave higher ratings also tended to find more targets (r = .46).
An earlier study by Billingsley (1982) confirms the benefits of studying a map or global tree diagram. Her study compared a control group using trial and error practice, a group studying command sequences, and a group studying a diagram of the structural organization of the menu system. Subjects searched a menu system for names of target animals. Subjects who studied the diagram displayed superior performance in terms of time per search and number of frames traversed per search. The command sequence group also did better than the trial and error group. One additional finding was that subjects studying the global tree showed better retention of menu structure than the other groups. Following training, subjects searched for 9 targets. When they were switched to 9 different targets, the map group did just as well as before, but the performance of the trial and error and the command sequence groups declined.
The striking conclusion is that documentation that gives the whole tree structure is superior to the other three modes of training tested. This result is perhaps a little stronger for menus of low meaningfulness (Schwartz et al., 1985) than for menus of high meaningfulness (Parton et al., 1985). However, the recommendation to designers of training methods would be to include the global tree in documentation for hierarchical menu systems. Menu diagrams ought to be presented in the training materials for the user rather than in an appendix merely for technical specifications. Just how such diagrams should be laid out is still a matter of speculation. In the studies reviewed here the diagrams proceeded from left to right, whereas in the documentation for many systems, they proceed from top to bottom down the hierarchy. Another issue has to do with how to break large diagrams into manageable sections. How should the submenus be divided and organized?
7.4.3 Methods of Training as a Function of Types of Menu Systems
In practice, documentation and training often combine methods or alternate between methods. It may be that each type of training adds an important component to what is learned by the user. For example, the trial and error method allows the user to have hands-on practice with the mechanics of screen display and the selection response. Frame documentation familiarizes the user with the array of frames and alternatives. Command sequence documentation provides explicit pathways to items. Finally, the global tree displays the overall structure. A good training program may well employ a "shot-gun" technique of introducing the user to all methods.
On the other hand, different types of systems and different levels of users may require different types of training. Sometimes it is advantageous to have no documentation and to encourage users merely to try the system. In other cases where heavy memory demands are placed on the user, training should emphasize the learning of terms and pathways. In still other cases, where motor skill is required in selecting responses, training should involve repeated practice. In developing a training method one needs to assess the critical components to be learned by the user and then identify methods to facilitate the acquisition of those components.
Streitz (1987) suggests that part of the design process is to come up with a way of describing and naming things to convey to users the function and structure of the system in a way that is cognitively consistent with their existing mental models acquired from experience with other systems. Training often involves the use of a metaphor world. In a way the metaphor helps to create a schema for the user or a global picture of how things operate. Streitz reports a study by Lieser, Streitz, and Wolters (1987) in which subjects were trained in either the "desk top/office" metaphor or the abstract "computer" metaphor. As a second factor in the experiment, one half of the subjects used menu selection and the other half used control commands. Performance was recorded in terms of time per task. The results indicate that the "desk top/office" metaphor facilitated learning and performance only when it was combined with menu selection. As shown in Figure 7.14, the group using menu selection in the "desk top/office" metaphor performed tasks nearly 25% faster than all other groups. The guideline for design is compatibility. The training method should be compatible with the user's conceptualization of the system and with the mode of interaction between the user and the system.

7.5 Methods of Help
In more and more instances systems are providing online documentation and training. The advantage of this approach is that users can begin productive work at an early stage and progress until they require help. The disadvantage is the lack of structure and discipline in training. A second problem is that online help may add yet another level of complexity for the user to deal with. The literature on the effect of online help is inconclusive due to the type of performance measures used to assess the benefit of help and due to the added time and effort required by users to learn how to use such aids.
Menu selection systems are inherently online help systems. However, the issue is how much help in terms of additional information should be given to optimize performance. Help in menu selection is generally provided in one of two ways. The first is to give the user longer definitions or descriptions of the alternatives. The second is to give a look-ahead feature to allow users to see upcoming alternatives before making a selection. Other types of help may provide additional information to reduce memory demands on the user.
Using a deep hierarchical menu (2 x 2 x 2 x 2 x 2 x 2), Snowberry, Parkinson, & Sisson (1985) provided three different types of help information. They surmised that low accuracy in deep menus could be due to three factors. First, users may forget what they are looking for. Second, users may forget the path they have already taken. Third, users may not remember what the upcoming items are. Consequently, the three types of help provided a display of either (a) the desired target, (b) the list of previous choices, or (c) the list of upcoming items. A control condition, which received no additional help information, was also included. Two blocks of trials were given so that the effect of help fields could be seen at different levels of practice. Figure 7.15 shows the results for search time and percent error.
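To make the three help conditions concrete, the sketch below generates the extra text each one might show at a given frame of a six-level binary menu. It is a minimal illustration under stated assumptions: nodes are named by their string of 0/1 choices, and the wording and layout of the help fields are hypothetical, since the source does not describe how the original displays looked.

```python
from itertools import product

DEPTH = 6  # six binary choices, as in the 2 x 2 x 2 x 2 x 2 x 2 menu

# Hypothetical labelling: each target is named by the choices that lead to it.
TARGETS = ["".join(bits) for bits in product("01", repeat=DEPTH)]

def help_field(kind, target, choices_so_far):
    """Return the extra help text for one condition at the current frame."""
    if kind == "control":
        return ""  # no additional help information
    if kind == "target":
        return f"Looking for: {target}"
    if kind == "previous":
        return "Path so far: " + (" > ".join(choices_so_far) or "(none)")
    if kind == "upcoming":
        here = "".join(choices_so_far)
        parts = []
        for choice in "01":
            nxt = here + choice
            if len(nxt) < DEPTH:
                parts.append(f"{nxt} leads to {nxt}0, {nxt}1")
            else:
                parts.append(f"{nxt} is a target")
        return "; ".join(parts)
    raise ValueError(f"unknown help condition: {kind}")

print(len(TARGETS))                              # 64
print(help_field("previous", "010101", ["0", "1"]))  # Path so far: 0 > 1
print(help_field("upcoming", "010101", []))      # 0 leads to 00, 01; 1 leads to 10, 11
```

Each condition addresses one of the three suspected memory failures: the target, the path already taken, and the items still to come.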

Overall subjects tended to do better with practice. Furthermore the type of help information had an effect on both error rate and search time. The group receiving information about upcoming selections produced significantly fewer errors than the other three groups which did not differ significantly from each other. It was found that the reduction of errors occurred primarily at the top two levels of the menu tree where there may be weak associations between general category descriptors and targets. The group receiving the list of previous selections was significantly slower than the other three groups. The added information presented on the screen may have slowed the subjects down. Although the group receiving upcoming selections did not differ in search time on the first block, on the second block it was the fastest.
In a second experiment, Snowberry et al. (1985) investigated the effect of help fields after increased practice on the menu. After 128 trials on the menu, no significant differences occurred among the groups on error rate. This finding confirms the notion that help fields have decreasing value with increased experience.
Some tentative guidelines may be drawn regarding additional help in menu selection. First, help should be optional. The user should have the option of getting help, but it should not be routinely presented. Excess help information may distract the user. Second, help information should be aimed at critical components in the menu selection process rather than at superficial aspects. Figure 7.2 displays such critical components. Finally, the type of help needed changes with experience. At the early stages of learning, users may need help remembering subsequent choices; at later stages they may desire help in finding shortcuts.
7.6 Summary
Performance is a function of user proficiency and system power. Expectations are that these two factors combine multiplicatively to determine overall performance. Consequently, improvements in one increase the gain in the other. On the one side system designers are creating more powerful and efficient systems. On the other side, users are trying to come up to speed. Fortunately, when users access a menu selection system on a repeated basis, their performance improves. Studies have shown that both the speed and accuracy of menu search increase with practice. Exactly what users learn is not entirely clear; however, a number of critical components of learning have been identified. Such components are acquired with practice and may transfer from one menu system to another depending on the similarity of the systems. Integrated software takes full advantage of the transfer process by maintaining common menus across different applications.
Although some menu systems are self-explanatory, most require training and documentation. A number of methods have been developed in order to facilitate the learning process on the part of the user. Four methods have been explored in experimental research: training (a) on command sequences that give the menu selections to find specific targets, (b) on menu frames that show the screen displays of menus, (c) on a global diagram of the menu system, and (d) by trial and error practice. While each method has its merits, providing users with a diagram of the overall menu tree proves to be the most effective. Users seem to be able to locate more items with fewer errors after studying a diagram of the global tree. In addition, subjective evaluations favor the use of the diagram.
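The multiplicative expectation can be illustrated with a toy calculation. The function names and the numeric values below are hypothetical and carry no units; the point is only that, under a product model, the same gain in user proficiency pays off more on a more powerful system.

```python
def overall_performance(user_proficiency, system_power):
    """Toy multiplicative model: overall performance is the product of the two factors."""
    return user_proficiency * system_power

def gain_from_training(before, after, system_power):
    """Change in overall performance when proficiency rises from `before` to `after`."""
    return overall_performance(after, system_power) - overall_performance(before, system_power)

# Same improvement in proficiency (0.5 -> 0.75) on two hypothetical systems.
print(gain_from_training(0.5, 0.75, system_power=2.0))  # 0.5
print(gain_from_training(0.5, 0.75, system_power=4.0))  # 1.0
```

Under an additive model the two gains would be identical, which is why the multiplicative form implies that improvements in one factor increase the gain from the other.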
Online help information has also been provided to facilitate menu selection performance. The question is how much help should be provided, what type of help, and when. Research indicates that information about upcoming choices is beneficial. On the other hand, information about previous selections made may prove to distract the user. Help systems should probably be optional. The type of helpful information is expected to change with experience. Finally, help should concentrate on information that is critical to performance rather than on superficial aspects.
