Abstract

The keystroke-level model (KLM) is the simplest model of the goals, operators, methods, and selection rules (GOMS) family. The KLM computes formative quantitative predictions of task execution time. This paper provides a systematic literature review of KLM extensions across various applications and setups. The objective of this review is to address research questions concerning the development and validation of extensions. A total of 54 KLM extensions have been exhaustively reviewed. The results show that the original keystroke and mental act operators were continuously preserved or adapted and that the drawing operator was used the least. Excluding the original operators, almost 45 operators were collated from the primary studies. Only half of the studies validated their model’s efficiency through experiments. The results also identify several research gaps, such as the shortage of KLM extensions for post-GUI/WIMP interfaces. Based on the results obtained in this work, this review finally provides guidelines for researchers and practitioners.

1. Introduction

Human-computer interaction (HCI) simplifies reality with models of human behaviour to design and evaluate computer systems [1]. Within HCI, models of motor behaviour lie on a continuum of analogy and mathematical equations. Generally, the models are categorised as either descriptive or predictive. Descriptive models present a framework to describe a phenomenon by identifying its features within a computer system. At the other end of the continuum, predictive models are commonly used to provide analytical a priori estimations of human performance without user participation, thus reducing time and resource consumption.

The GOMS family of predictive models was developed to compare and evaluate the goals, methods, and selection rules of skilled, error-free user performance [2]. GOMS techniques model goal hierarchies of defined unit tasks rendered as a composition of action and cognitive operators [3, 4]. The simplest member of the GOMS family is the keystroke-level model (KLM), which predicts the execution time of specific tasks in a desktop environment using a mouse and keyboard. The KLM has been widely utilised to evaluate expert performance on various desktop interfaces, and its aptitude and usefulness have been well demonstrated.

The challenges of designing and developing computer systems and the emergence of new technologies have revealed a need for updated quality assessments. Revising predictive models for these challenges can help evaluate human performance a priori and reduce the need for time- and resource-intensive human studies. The KLM was developed from and intended for desktop systems but has continually been extended to model systems designed for other computer setups in various domains. These extensions involve adapting the original KLM operators, introducing or inheriting new operators, revising heuristics, and presenting new execution calculations or techniques to satisfy the extension’s purpose.

A systematic review of KLM extensions provides an objective procedure for identifying the extent of the research that is available; to the best of the authors’ knowledge no prior systematic review exists that focuses on KLM extensions. This paper extensively reviews KLM extensions between 1980 and 2016. The goal of this review is to summarise, analyse, and assess the empirical evidence regarding the purpose for each extension, the extension’s application domains and setups, and the research methods used to create and validate extensions. Most importantly, this review investigates how the KLM has been extended within the new models by examining operators, heuristics, equations, and domain-specific metrics. The results of this review also outline relevant issues for designers, developers, and researchers who apply or extend the KLM.

The rest of this paper is organised as follows. Section 2 presents the background for the KLM by introducing the topic and its seminal publications. Section 3 describes the methodology and protocol used to systematically review the KLM extensions. Section 4 reports how the review was conducted. Section 5 describes the results of the review. Section 6 discusses the principal findings, limitations, and implications for research and practice. Finally, Section 7 concludes the paper and suggests future directions.

2. Keystroke-Level Model: An overview

KLM [6] is the simplest and most practical GOMS method for evaluating the time performance of user-computer system interaction. Underlying the KLM is the assumption that the user employs a series of small and independent unit tasks. These tasks support the decomposition of larger tasks into manageable units. The sum of the durations of these small units equals the time it takes to complete the task. Each unit task has two phases, task acquisition and task execution, and the total time to complete a unit task is the sum of these two parts. First, in the acquisition phase, the user conceptualises and develops a mental representation of the unit task. Then, during execution, the user invokes the appropriate system commands required to accomplish the unit task. The KLM predicts only the execution time of a unit task because that is the only phase over which a system designer has direct control.

Unit tasks in the KLM are described with a set of physical-motor, mental, and response operators (see Table 1). Operators are identified by a letter and include: K keystroking, P pointing, H homing, D drawing, M mentally preparing, and R system response. K is the most frequently used operator and represents a keystroke or a button press. The P operator is the act of pointing to a target on a display with a mouse. P would typically be computed as a function of the distance to a target and its size (Fitts’ law, [7]); however, for simplification it is assigned a constant time. In a typical computer setup, H is the action of moving the hand between keyboard and mouse and includes any fine hand adjustments on those devices. The physical operator, D, is restricted to the mouse and refers to manually drawing a set of straight-line segments within a constrained 0.56 cm grid. Before carrying out a physical action, the user has to mentally prepare for its execution. This preparation is represented by the M operator and a constant value of 1.35 seconds. The final operator, R, refers to the time it takes for the system to respond to a user’s actions.
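
The dependence of pointing time on target distance and size can be sketched as follows. This is a minimal illustration that assumes the widely used Shannon formulation of Fitts' law, which may differ from the formulation in [7]; the function name and the coefficients `a` and `b` are hypothetical placeholders that would normally be fitted to measured data for a given device.

```python
import math


def fitts_pointing_time(distance, width, a=0.0, b=0.1):
    """Estimate pointing time in seconds with the Shannon form of Fitts' law.

    distance: distance to the target centre (same unit as width)
    width:    target size along the axis of movement
    a, b:     device-dependent coefficients (hypothetical defaults, not from the paper)
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be non-negative and width must be positive")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


# A far, small target takes longer to acquire than a near, large one.
print(fitts_pointing_time(distance=14, width=1))
print(fitts_pointing_time(distance=2, width=4))
```

Replacing the constant P with such a function trades simplicity for sensitivity to screen layout, which is why the original model keeps a constant.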

Unlike physical and system operators, M is not an observable user behaviour, yet it comprises a substantial fraction of the prediction. The occurrence of M is based on specific knowledge of user skills, and its placement is governed by a set of heuristics that embody psychological assumptions about users. Methods are a sequence of system commands that form a compiled segment of a user’s behaviour when executing a unit task. A user cognitively organises a method according to cognitive chunks, and M typically occurs between chunks rather than within them. In Table 2, while Rule 0 identifies possible decision points within the methods, Rules 1 to 4 attempt to identify these method chunks.

Execution time is predicted by decomposing a unit task into a list of operators and then computing their summation:

$$T_{execute} = T_K + T_P + T_H + T_D + T_M + T_R \qquad (1)$$

$T_{OP}$ is an operator’s total time, e.g., $T_K = n_K t_K$, where $n_K$ is the number of keystrokes and $t_K$ is the duration of each K. To illustrate how the KLM’s equation and rules can be applied to predict user performance, consider the following example of a user renaming a folder to “klm” on a desktop. The user homes the hand on the mouse, H; points the mouse cursor at the object, P; double-clicks on the folder icon to allow for renaming, KK; homes hands on keyboard, H; keys new name “klm”, KKK; and presses Enter, K.

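The KLM model without M and R (assuming an instantaneous response from the system) is HPKKHKKKK. Applying the heuristic rules for placing the M operators results in the final model MHPKKHMKKKK, where the first M is the time spent by the user searching for the folder on the computer display, and the second M is the time the user requires to mentally prepare for typing. Therefore (assuming $t_K$ is 0.28 seconds for an average non-skilled typist, and using the standard KLM unit times $t_M = 1.35$, $t_H = 0.40$, and $t_P = 1.10$ seconds):

$$T_{execute} = 2t_M + 2t_H + t_P + 6t_K = 2(1.35) + 2(0.40) + 1.10 + 6(0.28) = 6.28 \text{ seconds} \qquad (2)$$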
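
The folder-renaming prediction can also be reproduced programmatically. The sketch below is illustrative only: the function and variable names are hypothetical, the M and K unit times are those quoted in the text, and the H (0.40 s) and P (1.10 s) values are the commonly cited constants of the original KLM, assumed here because Table 1 is not reproduced in this section.

```python
from collections import Counter

# Unit times in seconds. M and K (average non-skilled typist) are quoted in
# the text; H and P are the commonly cited original KLM constants (assumed).
UNIT_TIMES = {"K": 0.28, "P": 1.10, "H": 0.40, "M": 1.35}


def klm_execution_time(sequence, unit_times=UNIT_TIMES, response_time=0.0):
    """Sum the unit times of every operator in a string such as 'MHPKK'.

    response_time is the total system response time R, added as a lump sum
    because R depends on the system rather than on a fixed constant.
    """
    counts = Counter(sequence)
    unknown = set(counts) - set(unit_times)
    if unknown:
        raise ValueError(f"unknown operators: {sorted(unknown)}")
    return sum(n * unit_times[op] for op, n in counts.items()) + response_time


# Rename a folder to "klm", with an instantaneous system response.
print(round(klm_execution_time("MHPKKHMKKKK"), 2))  # 6.28
```

Dropping the two M operators from the sequence shows how much of the prediction is attributable to mental preparation rather than to physical actions.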

The KLM was validated against observed values to determine how well the model predicted performance times and was subsequently used to model typical tasks in various systems (text editors, graphic editors, and executive subsystems). The value of K can be determined from a typing test prior to the test tasks. After a practice period, each expert user carried out test tasks and their keystroke times were logged. These times were then compared against the modelled predictions. The root-mean-square percentage error (RMSPE) was calculated as 21%. The developers of the KLM reported that this accuracy is the best that can be expected from the KLM and that it is comparable to the 20-30% previously obtained from more elaborate models [2, 6].
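
As a rough sketch of how such an error figure can be computed, the function below normalises each prediction error by the observed time before averaging. That normalisation is an assumption (the original study may have normalised differently), and the sample times are invented purely for illustration; they are not data from any reviewed study.

```python
import math


def rmspe(predicted, observed):
    """Root-mean-square percentage error between predicted and observed times.

    Each error is expressed relative to the observed value (an assumed
    convention), squared, averaged, and the square root is returned in percent.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [((p - o) / o) ** 2 for p, o in zip(predicted, observed)]
    return 100.0 * math.sqrt(sum(squared) / len(squared))


# Hypothetical task times in seconds (illustrative values only).
predicted_times = [6.3, 12.0, 8.5]
observed_times = [7.0, 10.5, 9.1]
print(f"RMSPE = {rmspe(predicted_times, observed_times):.1f}%")
```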

The KLM inherits several limitations from GOMS. It assumes the user is an expert and does not account for user errors. This makes the model ill-suited for predicting the performance of average or novice system users. The model also assumes that the task is performed linearly; however, users often multi-task and are frequently interrupted. The KLM also does not consider individual differences in performance, such as mental workload and fatigue. In addition, a KLM predictive model is usually not generalisable because it is constructed to fit and evaluate a given interface.

3. Methodology

This systematic review of KLM extensions was carried out following the procedure given by Kitchenham and Charters [5, 8]. The review process consisted of three stages: planning, conducting, and reporting (see Figure 1). The review protocol was established after several meetings and discussions to reduce the risk of research bias. The rest of this section describes the research questions and the subsequent steps undertaken to conduct the review.

3.1. Research Questions

The goal of this review is to examine the current extensions of the KLM from the point of view of the following research question: “What extensions have been applied to the KLM and how have these extensions been developed and evaluated?” The question aims to summarise the current practices around extending the KLM to shed light on gaps in the current research, suggest areas for further investigation, and provide knowledge on the adoption of the KLM and its extensions to measure the performance of prototypes. Table 3 lists all the research questions and their motivations.

3.2. Data Sources

The main electronic database sources used to search for primary studies included the ACM digital library, IEEE Xplore, Springer Link, Elsevier Science Direct, Web of Science, Scopus, Taylor and Francis online, and Google Scholar.

3.3. Search Strategy and Terms

The search string consisted of two main parts: the KLM and its extensions (see Table 4). The first part relates to studies utilising the KLM for extension or evaluation, and the second part relates to extensions. The terms were extracted from textbooks and research papers on the KLM. The search string was formed by incorporating alternative terms and synonyms using the Boolean “OR” expression. The two main search terms were then combined using “AND”.
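
The construction of the search string can be sketched as below. The term lists are hypothetical stand-ins, since the actual terms are listed in Table 4, and the quoting style would need to be adapted to each digital library's syntax.

```python
def build_search_string(klm_terms, extension_terms):
    """Join synonyms with OR inside each part, then combine the parts with AND."""

    def any_of(terms):
        return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

    return f"{any_of(klm_terms)} AND {any_of(extension_terms)}"


# Hypothetical example terms (the review's real terms are in Table 4).
query = build_search_string(
    ["keystroke-level model", "KLM"],
    ["extension", "modification"],
)
print(query)
# ("keystroke-level model" OR "KLM") AND ("extension" OR "modification")
```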

The search was conducted by applying the search string to collections of article meta-data. The string syntax was adapted for application to each digital library and its restrictions. This review was restricted to the period from July 1980 (the first time KLM was presented in “The Keystroke-level Model for User Performance Time with Interactive Systems,” [6]) to December 2016.

In addition to the primary search strategy, backward and forward searches were conducted. For each selected paper, the references were examined for a backward search, while the “cited by” links provided by some of the digital libraries were analysed for the forward search. Finally, publications citing the original KLM paper were also searched.

3.4. Study Selection Criteria

Each primary study was evaluated for relevance against inclusion and exclusion criteria. A study was selected when it satisfied one of the following inclusion criteria:
(i) Studies explicitly extending the KLM.
(ii) Studies reporting evaluations that employed the KLM or its extensions in post-WIMP interfaces or new application domains.
(iii) Studies combining the KLM or its extensions with other models.

Studies were excluded from the review when they met one of the following exclusion criteria:
(i) Studies presenting KLM-like extensions that did not extend the KLM or an extended version.
(ii) Studies presenting extension recommendations.
(iii) Studies presenting KLM testing processes that were focused on determining the effectiveness of the KLM for evaluation.
(iv) Studies modifying the KLM to create composite operators.
(v) Studies presenting duplicate reports of the same study and that did not present new material.
(vi) Studies not written in English.
(vii) Unpublished studies, excluding technical reports and theses.

3.5. Quality Assessment

The selected primary studies were assessed for relevance and strength using a three-point Likert-scale questionnaire consisting of the following subjective and objective closed-ended questions (see Appendix A):
(i) The study presents a clearly stated purpose for extending the KLM.
(ii) The study extends the KLM with new operators or modifications.
(iii) The study clearly defines the research method used to extend the KLM.
(iv) The extension methodology is adequate and repeatable.
(v) The extension results and findings are clearly stated.
(vi) The study clearly defines the research methods used to validate the extended KLM.
(vii) The extension validation methodology is adequate and repeatable.
(viii) The validation results and findings are clearly stated.
(ix) The study presents a comparative analysis of the extended KLM against the original KLM.
(x) The paper has been cited by other authors and/or contributes to the literature.

Each question is scored 1 (yes), 0.5 (partly), or 0 (no). The final quality score is the sum of these values. The maximum score is 10 and the minimum score is 0. The quality of each primary study was assessed by two researchers. After thorough reviews, discussions were conducted to reach a final decision about the inclusion of each study in the review.

3.6. Data Extraction

The data extraction strategy was used to provide answers to the research questions in Table 3. An extraction form was developed to ensure that consistent extraction criteria were used (see Appendix A). The information extracted included:
(1) Title, author, year, and type of publication.
(2) RQ1: the purpose of the extension, device setup, application domain, and the intended users.
(3) RQ2: the research method used to extend the KLM.
(4) RQ3: how the KLM was extended, including adapted operator unit times, new operators or equations, updated heuristics, and domain-specific metrics.
(5) RQ4: the research method used to validate the viability of the KLM extension, and the performance metrics used for validation, including any comparison of the extension’s performance against KLM.

3.7. Data Synthesis

The objective of this step was to accumulate and combine facts and formulate responses to the research questions. The extracted information was grouped, summarised, and tabulated in six separate tables according to the elements identified in the data extraction process: general information, quality assessment, and RQ1–RQ4. Each primary study was assigned a code to identify it among the reviewed studies. Furthermore, extracted information concerning how the KLM was extended (RQ3) was collected in a table that lists and collates all the operators utilised.

4. Conducting the Review

Applying the review protocol yielded the preliminary results shown in Table 5. In total, 149 studies were selected. Using the defined inclusion and exclusion criteria, 62 primary studies (based on 66 articles) were identified. During this stage several issues were identified:
(i) Some studies document different stages of the same research; for this reason, we refer to a smaller number of studies based on a larger number of articles.
(ii) Some studies appeared in more than one source; these were considered based on the adopted search order (ACM, Scopus, Springer Link, Science Direct, Web of Science, Taylor and Francis on-line, IEEE Xplore, and Google Scholar).
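
The handling of studies found in more than one source can be illustrated with a small sketch that credits each study to the first source in the adopted search order. The function name and the study titles are hypothetical; only the order of the sources is taken from the text.

```python
# Search order as stated in the review protocol.
SEARCH_ORDER = [
    "ACM",
    "Scopus",
    "Springer Link",
    "Science Direct",
    "Web of Science",
    "Taylor and Francis Online",
    "IEEE Xplore",
    "Google Scholar",
]


def assign_sources(hits):
    """Map each study to the earliest source (by SEARCH_ORDER) that returned it.

    hits: dict of study title -> set of source names in which it appeared.
    """
    rank = {source: position for position, source in enumerate(SEARCH_ORDER)}
    return {title: min(sources, key=rank.__getitem__) for title, sources in hits.items()}


# Hypothetical overlapping search results.
hits = {
    "Study A": {"Google Scholar", "ACM"},
    "Study B": {"Web of Science", "Springer Link"},
    "Study C": {"IEEE Xplore"},
}
print(assign_sources(hits))
```

Under this rule a source late in the order is credited only with studies that no earlier source returned.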

The forward and backward search of the selected studies yielded only two relevant papers that were included. This low number indicates the thoroughness of the search terms used. The number of papers reviewed totalled 64 primary studies, based on 68 articles.

5. Systematic Review Results

This section summarises the results obtained after conducting the review and synthesis. First, an overview of the primary studies and their corresponding quality marks is presented. Next, the answers to three of the research questions are addressed in separate subsections (RQ1, RQ2, and RQ4). Because research question RQ3 (see Table 3) is considered the most important, it is addressed in a separate section. Finally, a discussion and interpretation of the results is presented.

5.1. Descriptive Statistics

Table 6 shows the unique identifier assigned to each study and lists the associated reference. These identifiers will be used throughout the remainder of this review to refer to the primary studies.

5.1.1. Quality Assessment

Each quality assessment question was assigned a score of 1 (yes), 0.5 (partly), or 0 (no). The maximum score is 10 and the minimum score is zero. The quality scores were divided into categories:
(i) Very High: 8 ≤ quality score ≤ 10
(ii) High: 5.5 ≤ quality score ≤ 7.5
(iii) Medium: 3 ≤ quality score ≤ 5
(iv) Low: 0 ≤ quality score ≤ 2.5
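
The scoring and categorisation scheme can be expressed compactly. The following is a minimal sketch with hypothetical function names; it relies on scores being multiples of 0.5, so the category bands leave no gaps.

```python
ANSWER_VALUES = {"yes": 1.0, "partly": 0.5, "no": 0.0}


def quality_score(answers):
    """Sum the values of the ten questionnaire answers ('yes', 'partly', 'no')."""
    if len(answers) != 10:
        raise ValueError("the questionnaire has exactly ten questions")
    return sum(ANSWER_VALUES[answer] for answer in answers)


def quality_category(score):
    """Map a quality score in [0, 10] to its category."""
    if not 0 <= score <= 10:
        raise ValueError("score must lie between 0 and 10")
    if score >= 8:
        return "Very High"
    if score >= 5.5:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# Hypothetical study answering 'yes' six times, 'partly' twice, 'no' twice.
answers = ["yes"] * 6 + ["partly"] * 2 + ["no"] * 2
score = quality_score(answers)
print(score, quality_category(score))  # 7.0 High
```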

Figure 2 shows the distribution of the primary studies by quality. Of the 68 studies, 15 (22.06%) were assessed as Very High, 20 (29.41%) were assessed as High, 19 (27.94%) were assessed as Medium, and 14 (20.59%) were assessed as Low. The top-ranked primary studies, PS3, PS14, and PS56, had quality scores of 10. Studies PS3 and PS56 were published in the journals Human-Computer Interaction and Human Movement Science, respectively, and PS14 is a technical report from the University of Michigan Transportation Research Institute (UMTRI). The lowest score was 0.5 for PS48. The primary studies with Low quality scores were excluded from analysis (PS2, PS12, PS19, PS23, PS25, PS33, PS36, PS40, PS48, PS50, PS51, PS54, PS58, and PS60), resulting in 54 primary studies with Medium, High, or Very High quality scores.

5.1.2. Publication Year

Figure 3 illustrates the distribution of published studies from July 1980 to December 2016. In the first decade after July 1980, only three (5.56%) journal papers were published in Communications of the ACM (PS1), Journal of Human–Computer Interaction (PS3), and the International Journal of Man-Machine Studies (PS4). The publication rate increased in the following decade (1990s), with 10 publications (18.52%, PS5-11, and PS13-15). This increase continued from 2000 to 2010, with 17 (31.48%) published studies (PS16-18, PS20-22, PS24, PS26-32, PS34-35, and PS37). A significant rise followed through the end of 2016 with 24 (44.44%) published studies (PS38-39, PS41-47, PS49, PS52-53, PS55-57, PS59, and PS61-68). This post-2010 spike in publications coincides with the resurgence of touch interactions and post-GUI configurations, which signalled a need for updated performance assessors.

5.1.3. Publication Sources

Table 7 summarises the details of the top publications for KLM extensions. Seven primary studies (12.96%) were published in ACM Human Factors in Computing Systems (CHI): PS9, PS17, PS29-31, and PS38-39. Fewer than 10% of the primary studies were technical reports (PS13-14, PS18, PS52, and PS62) published through the University of Michigan Transportation Research Institute (UMTRI). Just under 15% of the publications appeared in two journals, Human-Computer Interaction and Personal and Ubiquitous Computing; the former had four primary studies published (PS3, PS5-6, and PS22) and the latter also had four (PS16, PS27, and PS34-35). HCI International published four studies (7.41%; PS29, PS32, PS37, and PS67). Three studies (5.56%) were sourced from the IFIP Conference on Human-Computer Interaction (PS11, PS46, and PS59).

Figure 4 illustrates seven of the eight digital sources used to search for primary studies and the number of publications retrieved. Overlaps existed between the various sources, and the studies were considered based on the search order. The results from Web of Science were encountered in other sources and are not shown. The largest share of the studies (22 studies, 40.74%) was retrieved from Google Scholar. Fourteen studies (25.93%) were found in the ACM digital library. Springer Link produced 10 studies (18.52%), followed by IEEE Xplore with 3 studies (5.56%), Scopus and Taylor & Francis On-line with 2 studies (3.70%) each, and finally, a single publication from Science Direct (1.85%).

5.1.4. Publication Type

Publications were categorised as journal articles, conference proceedings, technical reports, extended abstracts, or theses. Figure 5 illustrates the distribution of primary studies across the five publication types. Statistics show that 50% (27 studies) of the studies were conference proceedings and 29.63% (16 studies) were in journals. The remainder of the studies were technical reports (6, 11.11%), theses (4, 7.41%), and one extended abstract (1.85%).

5.2. RQ1: What Was the Purpose of Extending the KLM?

The motivation behind this research question was to identify why the KLM or one of its extensions was revised. This question encompasses several areas of interest: main purpose, hardware technology (i.e., device setup), application domain, and intended users.

5.2.1. Main Purpose for Creating Extensions

Several purposes for creating extensions were identified in the review. These were collated into six main reasons:
(P1) To fulfill a need for an updated quality assessment for new technologies, applications, techniques, or representations.
(P2) To test the applicability of the KLM or its extensions.
(P3) To extend the KLM to evaluate new technologies, applications, techniques, or representations.
(P4) To revise the original or extended operators and heuristics.
(P5) To extend the KLM or its extensions to describe additional interactions.
(P6) To integrate KLM or its extensions with other models.

Studies with (P1) as a purpose were conducted to extend the model, and in some instances validation studies were also conducted to confirm the viability of the model. For (P2), research methods were often utilised to extend and validate the extended model. In (P3), the studies typically extended the KLM or one of its enhancements and used experiments to determine the performance of a certain device or application. The fourth purpose, (P4), revised operators and heuristics using research methods; however, validation was infrequent. (P5) extended the model to describe new interactions, and utilised research methods to extend and validate the enhancement. The final purpose, (P6), utilised short studies to extend the KLM or one of its extensions to better incorporate it with a larger model.

Table 8 summarises the results of analysing the number of publications for each purpose. The table shows that publications with purposes (P2) and (P5) had the highest median quality scores (7.75 and 7.50, respectively). This is because these studies usually carried out experiments to extend and validate KLM. Publications that extended the KLM for purpose (P3) obtained medium quality scores (median of 4.24), since validation was often not considered.

5.2.2. Device Setup

Table 9 summarises the device setups collated from the review of 54 primary studies. The largest group of extensions (20, 37.04%) modified the KLM or one of its extensions to model mobile or tablet interactions. These extensions were further categorised as key-based (12, 22.22%) or touch-based (8, 14.81%) mobile devices, smartphones, or tablets. Fourteen studies (25.93%) extended the KLM for traditional configurations. The KLM was also extended for In-vehicle Information Systems (IVIS), which were categorised as either traditional (with knobs and dials, 11.11%) or touch-based (7.41%). Specialised configurations add features such as a digitized pad (PS8-9), Braille display (PS28), mouth-stick (PS10 and PS49), Leap Motion sensor (PS28), and specialised controls (PS64). Post-GUI configurations addressed extensions for natural user interfaces (PS50, PS54, and PS68) and immersive projection (PS66). The KLM was also extended for web navigation on a television and for remote setup (PS47). Note that the percentages do not add up to 100% because one study (PS47) combined two setups.

5.2.3. Application Domain and Target Users

Several application domains were identified from the primary studies and grouped into high-level categories. Table 10 summarizes the recurrent domains. The most frequently examined domain relates to mobile or tablet applications (13 studies). Text and/or spreadsheet editing was the domain used to validate KLM [2, 6]; these studies were mainly conducted in the late 1980s to the early 1990s. Accessible interfaces were also examined to extend the KLM for interaction by blind users and users with motor disabilities. Navigating the web from various setups was also considered in the literature. IVIS setups were relatively popular (see Table 9) as a domain and considered tasks such as radio tuning, navigating lists, and using a global positioning system (GPS) for map navigation.

5.3. RQ2: What Was the Research Method Used to Extend the KLM?

This research question examines the research methods used to modify the KLM or any of its extensions. The question is addressed in two ways:
(1) What was the research method used to extend the KLM operators and heuristics?
(2) What was the research method used to modify or compute the KLM operators’ unit times?

Figure 6 demonstrates various research methods used to extend the KLM; these include experimentation, previous literature, observations, and several combinations of these methods. Twenty-seven (50%) of the studies did not use research methods to extend operators and modify heuristics. Observations were commonly conducted to identify or examine interactions (9 studies, 16.67%). Operators and heuristics were also extracted from previous literature (7 studies, 12.96%), and some studies used experiments (6 studies, 11.11%). Research methods were also combined. Several combinations were noted, including literature and experimentation, literature and observational studies, and observation and experimentation.

Figure 6 also illustrates the research methods used to modify the unit times of KLM operators. Of the 54 studies, only 12 (22.22%) did not utilise research methods. Over 50% of the primary studies (30 studies, 55.56%) conducted experiments to modify unit times. Eight studies (14.81%) relied on previous literature to adjust unit times. Additionally, research methods were also combined to extend unit times. These combinations include combining a literature search with either experimentation (4 studies, 7.41%) or observation (1 study, 1.85%).

5.4. RQ3: How Was the Original KLM Extended?

The reviewed extensions demonstrated numerous ways in which KLM was revised:
(i) Adapting the original KLM operators: K, P, H, D, M, and R.
(ii) Inheriting operators from other KLM extensions or previous literature.
(iii) Introducing new operators.
(iv) Formulating new equations to calculate execution time.
(v) Updating heuristics.
(vi) Computing domain-specific metrics.

It was also of interest to consider the operators that have been explicitly preserved in the extensions. The rest of this section discusses the operators, equations, heuristics, and metrics based on their intended device setup (see Section 5.2.2). Figure 7 collates the operators reported in the primary studies to identify their frequencies among the selected studies and device setups.

5.4.1. Traditional Setup

Thirteen studies were categorised as having a traditional setup: PS1, PS3-7, PS11, PS17, PS32, PS37, PS41, PS63, PS65. Figure 8 shows how the primary studies extended the KLM. The following sub-sections elaborate further on these operators, equations, and heuristics.

(1) Preserved Original Operators. K was the most popularly preserved of the original operators. PS6, PS32, PS11, PS65, PS41, and PS17 used the unit times associated with various typing skills. The majority of these used the time related to the speed of an average skilled typist (0.2 seconds), while others utilised the value 0.28 seconds (average non-skilled typist). H was preserved by PS1, PS11, PS32, and PS65, while was used in three studies: PS1, PS17, and PS41. Four studies (PS1, PS32, PS41, and PS68) preserved the value of . PS17 aimed to increase the accuracy of the P operator by utilising Fitts’ Law. Operator R is system dependent and was often not utilised in the studies, yet it was still conserved.

(2) Adapted Original Operators. Some of the KLM operators were adapted through unit time adjustments or decomposition into finer tasks. PS7 updated the unit times of H, K, M, and P. PS63 dissected H into two actions: homing from the keyboard to the mouse and homing from the mouse to the keyboard. P’s unit time was updated in PS4, PS5, and PS7. A specialized P, PM(l), was introduced in PS11 to indicate pointing to the ith menu item. K and M were the two most frequently updated operators. K was updated in PS1, PS3, PS5, PS7, and PS63, while M was revised in PS3, PS5, PS6, PS7, and PS32. In PS5, M was decomposed into three mental actions: retrieval from memory, choosing among options, and executing a mental step. K was decomposed in PS3, where the unit times were 0.36 and 0.23 for two different spreadsheet tasks, respectively.

(3) Inherited Operators. PS11 inherited ten operators from previous extensions and prior literature: pressing a button B [45]; executing a mental step [17]; retrieving from memory, dragging to a menu item, and pointing to a menu item [77]; perceiving an image, reaction time of choosing an image, and eye movement [2]; menu search slope, intercept, and an overall value from an investigation into history tools for user support [52]; pressing a button and performing a button click [45].

(4) New Operators. New operators were introduced in 8 of the 13 traditional setup studies. PS1 identified several operators, including acquiring a task by looking at a certain manuscript and using the arrow keys to point to a location Ps. PS7 introduced RW as the time required to read a word from the screen. PS4 identified several new operators: choosing a target, planning a route, moving to the next window, and clicking the mouse. The symbol was introduced in three studies (PS3, PS41, and PS51) to represent two different operations: mentally scanning/searching the display and pressing a keyboard shortcut. The time it takes to listen to a spoken word was utilised in PS64. PS41 also established a new operator for the time it takes to press a navigation key when navigating websites.

(5) Updated Heuristics. Heuristics are commonly updated when new operators are introduced to revise M placement. PS3 argues that commands issued through a series of menu choices involve a single M rather than one for each menu choice, because the command forms a single cognitive unit. Using a history tool, PS6 stated that switching from typing to using the history tool includes an additional long-term memory retrieval. For another history tool studied in PS11, M placement was extended for formula tasks. The study also offered guidance for placing the new mental scanning operator.

(6) New Equations. In the KLM, task time is computed from the summation of the operators’ unit times (see (2)). Some studies modified these equations to consider additional elements that may affect execution time. Both PS17 and PS37 introduced new equations in their extensions. PS17’s authors formulated equations of various tasks that impact email archiving and retrieving. For word selection tasks, PS37 introduced equations to compute the time it takes to select a word given several variables, including scanning and scrolling time, word length, and the index of the selected word.

5.4.2. Key-Based Mobile

It was in the new millennium that interest in extending the KLM for mobile interaction and text entry became most evident. Twelve of the 54 selected studies modified the KLM to accommodate key-based mobile interactions (PS16, PS20, PS21, PS24, PS26, PS27, PS30, PS34, PS35, PS39, PS44, and PS46). Figure 9 illustrates these studies and the approaches they proposed to extend the KLM. The following subsections describe the changes made to extend the KLM.

(1) Preserved Original Operators. Several operators were preserved from the KLM: H, K, M, and . PS16 utilised the original H, K, and for predictive text entry on mobile phones. The KLM was further extended by the same authors in PS30 for five predictive text entry methods that preserved the original and R. R was also used as is by PS21 and PS46. An extended KLM for modelling speech navigation and text entry preserved K.

(2) Adapted Original Operators. PS26 extended the KLM for SMS input by dissecting into nine operators for various keys and repetitions. K was also decomposed in PS39 to reflect unique interactions with a Pinyin keyboard, an input method for Chinese text using the Pinyin method of romanisation. K’s unit time was revised in PS21, PS27, PS30, PS35, and PS44, the majority of which dissected the unit times based on the type of key and repetition. It should be noted that PS27 approached the KLM differently, assigning each key or repetition a score rather than a unit time. P has also been adapted and at times redefined. For instance, PS30 and PS46 modified to reflect pointing with a device to perform an action. PS44 considered for pointing to a keypad. While PS21 preserved P’s original meaning, M was adapted in both PS21 and PS44 and decomposed in PS34 to represent time delays during text entry and recognition. TPER from PS20 is also an adaptation of for text entry perception. H was revised for PS30 to consider the time needed to switch between listening/speaking on the phone and reading from the screen.

(3) Inherited Operators. Only four operators from three studies were inherited from the literature or another extension. Two of these replaced the value of K, the third updated the unit time for M, and the last re-used a value from a previous model for complex actions. PS24 enhanced the KLM to evaluate Korean text entry on a mobile phone where the values for and were inherited from Kim, Kim, and Myung [78] and John and Newell [79]. K’s unit time was also inherited from Silfverberg, MacKenzie, and Korhonen [80] to extend the KLM for message-text entry with a Greek corpus. Mobile KLM (PS30) was revised in PS46, which inherited the complex action operator to reflect tag-reading interactions.

(4) New Operators. Several new actions were recognised by half of the studies that extended the KLM for key-based mobile phones. PS20 introduced two new operators: waiting for the cursor to process when successive letters are entered from the same key in multi-tap text entry and the action of moving to another key. Similarly, PS26 utilised a wait operator for multitap entry. It also introduced MPHAlphaK (press and hold key), RPHAlphaK (repeat press and hold key), and InsertWord (insert word into corpus dictionary). Mobile KLM (PS30) extended the KLM with several operators: attention shift for various focus shifts, complex actions, gesturing with phone, finger movement, initial act, and a multiplicative factor for distraction. PS34 extended the KLM for speech text entry and introduced an action that reflected the time needed to consider/recognise a command and utter a syllable.

(5) Updated Heuristics. A number of studies updated the placement of and other perceptive actions to reflect interactions with a mobile phone. For Korean text entry (PS24), an is expected to occur both before and after entering a syllable. Moreover, an should not be placed before the next key since finger movement and the mental activity overlap. PS46 declared that should appear before cognitive chunks and that an is unnecessary before pointing at longer distances with respect to shorter ones.

(6) New Equations. When extending the KLM for text entry, new equations were commonly formulated based on the text entry techniques. Of the 12 studies, six modelled text entry on mobile phones (PS16, PS20, PS24, PS34, PS35, and PS39). PS16 introduced three new equations for traditional (multitap), predictive, and word-completion text entry with an English corpus. In PS35, the equation for predictive text entry from PS16 was reused for various word look-up techniques. For Greek text entry, PS20 formulated two equations to compare typical phone text entries against a newly developed approach. PS39 established an equation to compare the performances of two types of Chinese Pinyin input by integrating the KLM with other models. PS34 evaluated speech text entry compared with multitap and predictive text entry, in which several equations were constructed to consider time-out delays, number of words, and word options in predictive entry.

Two other studies formed equations in contexts other than text entry. The mobile KLM in PS30 proposed a new equation that took distractions of various severities into account. PS27 approached the KLM differently, presenting unit times as scores used to calculate the relative average efficacy, where the sum of the scores for each task is first divided by the number of tasks and finally multiplied by 100 to obtain a percentage.
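Read literally, the efficacy calculation described for PS27 can be written as follows. The symbols are notation introduced here for illustration and are not taken from PS27: s_i is the summed score for task i, N is the number of tasks, and E is the relative average efficacy expressed as a percentage.

```latex
\[
  E = \frac{\sum_{i=1}^{N} s_i}{N} \times 100
\]
```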

5.4.3. Touch-Based Mobile or Tablet

Touchscreen interactions were considered in several KLM extensions as early as 2003. Eight studies from the selected 54 were identified (PS22, PS29, PS37, PS38, PS45, PS53, PS55, and PS56). Figure 10 illustrates how the new models were modified for extension. The following sub-sections describe the various updates applied to the KLM to represent touch-based mobile or tablet interactions.

(1) Preserved Original Operators. PS29 extended the KLM to measure the performance of a new interaction technique on a touch tablet under various conditions and with various styles. The model preserved R, as did PS53 and PS56. PS53 revised the KLM to accommodate a modern touchscreen interface, but preserved H, K, and . The original operators D, H, and were utilised by PS38 as an extended KLM for touch phone mobile interactions. The primary studies PS53 and PS56 both preserved M, and PS53 also utilised the and operators.

(2) Adapted Original Operators. One KLM extension, developed to model the performance of a new interaction technique, decomposed P, D, and R. P was subdivided as follows: point stylus at segment, point to command, and point to end the mark. Dc and Dm symbolise drawing a circle around a dot and drawing a mark, respectively. R was divided into switching modes and the time it takes the system to respond. K was adapted by PS22 to consider both key repetition and movement between keys. In testing a new keyboard design for Chinese text input, 1 Line (PS45), K was dissected into a key for each finger on both hands. Similarly, M was modified in PS38 to reflect mentally initiating a task, deciding or choosing, retrieving, finding, and verifying. The extended model also adapts into two actions: homing either a stylus or a finger to some location. PS56 modified to reflect a relatively long movement from one position to another on a touch mobile phone in network gaming.

(3) Inherited Operators. PS53 inherited two operators from mobile KLM (PS30): initial act and distraction. Gesture actions were inherited but adapted to reflect the time needed to physically form specialised gestures with one or more fingers. The same operator was also used by PS38 to represent holding a gesture for a certain application.

(4) New Operators. Numerous operators have been created to represent touch interaction. PS53 extended the KLM to form a touch-level model (TLM) for touchscreen and mobile devices and introduced several new operators: tap, pinch/zoom to zoom in/out, swipe, rotate, drag element, and tilt device. Tapping is a common interaction in touch interfaces that was also introduced in PS38, PS55, and PS56. Swipe, zoom, and drag actions were also identified in PS55 and PS56. PS22 utilised two new operators that consider the decision and recovery times for data entry using a soft keyboard. Flick was established in PS56 to identify quick, short dragging actions. This action was decomposed in PS45 to distinguish between flick down and flick up. New operators introduced for finger/stylus touch mobile devices extended the model to include flipping or sliding a keyboard, continuously holding a key down, pressing a key on the side of the device, and plugging and unplugging other devices.

(5) New Equations. A quarter of the studies formulated new equations to compute the execution times of various tasks. PS22 formed a new equation for text entry on a soft keyboard that considers the number of characters, shifted characters, and a transition between keys. PS37 introduced a new equation that computes the time it takes to select a word from a list given several factors: scanning and scrolling time, word length, and the index of the selected word in the list.

5.4.4. Traditional In-Vehicle Information Systems

Traditional In-Vehicle Information Systems (IVIS) typically consist of a screen surrounded by a series of keys, buttons, and knobs intended to perform tasks such as turning the radio on, navigating roads, and browsing music lists. Of the 54 primary studies, six were categorised as traditional IVIS (PS13, PS15, PS18, PS31, PS42, and PS43). Figure 11 shows how the operators were extended in the new models. The following subsections elaborate further on these operators, heuristics, equations, and metrics.

(1) Preserved Original Operators. Four of the five original KLM operators were preserved in the IVIS extensions. PS31 and PS42 document different stages of the same research that enhanced the original KLM for traditional IVIS systems; both studies preserved K, M, and . One other study, PS43, provides a model for rapid user interface prototyping in the IVIS context and incorporates a modified KLM for that purpose. Of the original operators, only was utilised from the KLM.

(2) Adapted Original Operators. was modified from its original values in PS31 and PS42 to reflect new homing interactions between the IVIS and the steering wheel. Similarly, PS15 decomposed into two operators, Rn and Rf, for reach-near (from the steering wheel to other parts of the wheel) and reach-far (from steering wheel to IVIS). It also presented age-adjusted unit times for older drivers. The study also dissected and into refined operators and replaced the original value of with 1.50 seconds and an age-adjusted value of 2.70 seconds. M was also modified by PS43 with two new values based on its placement after and their new turn operator. PS13 adapted for an enter keystroke along with a down keystroke. K was also modified by PS43 and PS18 by decomposing the original operator into specified actions.

(3) Inherited Operators. PS43 incorporated their extended KLM into a prototyping model in which three operators were inherited from other extensions: H [69], F (move finger between controls), and attention shift from PS30. PS15’s reach-far operator was inherited by PS31 and PS42.

(4) New Operators. Only two new operators were introduced, one each in PS13 and PS43. A reading/decision operator was identified by PS13 to represent the time needed to read an IVIS menu and decide upon actions based on the menu’s depth and breadth. A turn operator was introduced in PS43 for turning a dial (clockwise or counter-clockwise) by various degrees.

(5) Updated Heuristics. The placement of was revised to incorporate the turn operator introduced in PS43. The new heuristic dictates that should be placed in two different scenarios with two different values: after and both before and after the user turns a knob.

(6) New Equations. The equation to compute the time required to execute a unit task was revised in PS43. Their new equation considered age as a factor as well as visual and non-visual periods that are characteristic of driving and IVIS interactions. PS13 updated the original equation to consider the number of menus encountered and the number of downward scrolls required to read a list item.

(7) Domain-Specific Metrics. The occlusion technique is used to simulate common driving distractions that occur when using IVIS systems. In occlusion, users are asked to conduct tasks with an IVIS while wearing computer-controlled goggles that open and shut at regular intervals. This condition imitates the glancing behaviour of drivers who cycle between looking at the IVIS (vision) and driving (non-vision or occlusion) periods. Two metrics are computed by PS31 and PS42:
(i) Total shutter open time (TSOT): the number of visual periods during occlusion trials with an IVIS.
(ii) Resumability ratio (R): the degree to which an IVIS task can be performed without looking.

PS31 and PS42’s approach to modelling a unit task involved developing the model traditionally using their extended KLM, and then reassessing the sequence of operators by considering the vision/no-vision intervals.

5.4.5. Touch-Based In-Vehicle Information Systems

Touch-based IVIS systems feature a touch screen for navigating the IVIS. Of the selected studies, four extended the KLM for touch-based IVIS (PS14, PS52, PS61, and PS62). Figure 12 illustrates the various changes made to extend the KLM. These extensions did not explicitly preserve any of the original operators; thus, the following subsections discuss the adapted original operators, inherited operators, new operators, and new equations.

(1) Adapted Original Operators. PS14 extended the KLM to revise the unit times previously measured in the literature. They argue that the values misrepresented the evaluated IVIS because the original values were based on a QWERTY keyboard. Therefore, in their study, K and M were revised. For K, several values were considered: letters, numbers, cursor keys, enter, shift, and space. The revision also considered key repetitions. M was adapted to 2.22 seconds in their extension method. K was divided in PS61 to represent function key actions and their repetition. PS61 also considered age-adjusted unit times for these operators. R was adapted by PS62 for wait-while-loading and wait-after-loading, each of which was age-adjusted.

(2) Inherited Operators. PS15 was revised in PS61 to model interactions with a touch-based IVIS. PS61 inherited and revised the following actions: cursor key pressed once, cursor key after first press, letter key pressed once, letter key after first press, number key pressed once, and number key after first press. The unit times were also adjusted for age. A flicking operator was inherited by PS62 to represent the act of moving a finger in the flick direction (this operator was inherited and revised from PS52).

(3) New Operators. PS52 considered the new flick operator in the context of navigating lists of contacts or albums, each of which is age-adjusted. Several new operators were introduced by PS61 to re-evaluate the traditional IVIS model, including scrolling through a list, pressing and holding a key, dragging, and first and subsequent slider actions. PS62 developed an extended KLM that overcomes a noted shortcoming of the occlusion methods used by PS31 and PS42. A variety of operators were introduced: flick/scroll return, pressing an on-screen button, quick flick, reach for button, reach for console, read instructions, reposition hand on knob, scroll, search, stop screen, turn knob, and wait for goggles in known and unknown locations to represent the time the user waits for a vision period.

(4) New Equations. PS14 determined the retrieval time of a destination from an IVIS, which involved keying in part of the destination name, scrolling through a list of names, or a combination of these approaches. Destination entry tasks that involved keying in a destination name or a longitude and a latitude were also considered. To aid in modelling the KLM extension, the study created a spreadsheet for each of the two tasks in which predicted times were adjusted for age, lighting conditions, and destination. These spreadsheets were used to construct the formulas that calculate the total predicted times for destination retrieval and entry tasks.

5.4.6. Specialised Setup

Specialised setups enhance a traditional device with domain-specific controls or involve more than one screen. Seven of the 54 primary studies were categorised as specialised setups (PS8, PS9, PS10, PS28, PS49, PS64, and PS67). Of these, PS8 and PS9 involve the same continuing study. These studies preserved/discarded/adapted original operators, inherited operators from the KLM or its extensions, introduced new operators or equations, updated the heuristics, and identified domain-specific metrics (see Figure 13). The remainder of this section describes the modifications applied to the KLM.

(1) Preserved Original Operators. Several operators were preserved from the KLM: H, K, P, and M. H was utilised by PS8, PS9, and PS28. PS28 also preserved the value of an average non-skilled typist (0.28 seconds). In PS9, M was used as is; however, it was not considered in their earlier work (PS8). P was utilised from the original KLM in both PS8 and PS9.

(2) Adapted Original Operators. The majority of studies in this section adapted operators from the KLM. PS64 developed a GOMS-HRA to dynamically assess the reliability of human operators in nuclear plants. The study introduced two operators, Dp and Dw, that are analogues of and represent the acts of making a decision based either on an existing procedure or without an existing procedure, respectively. H was adapted by PS67 to consider homing actions in hybrid interfaces, particularly for in-air devices such as the Leap Motion sensor. K was modified in PS8, PS9, PS10, and PS49. The first two of those studies also divided to identify the time needed to select a new function from a command menu and the time it takes the system to close a polygon in a manual map digitising task. PS49 adapted for keyboard navigation by individuals with motor disabilities. An operator was identified by PS8 and PS9 that adapted to quantify two specialised pointing actions.

(3) Inherited Operators. The button click-and-release BB operator was inherited by PS49 from PS19, a study that was excluded due to its low score in the review’s quality assessment phase. This was also the case for the operator used in PS28 and PS49.

(4) New Operators. PS64 extended the KLM to assess the reliability of nuclear plant operators and introduced several new operators: performing a physical action on the control board or in a field, looking up required information on the control board or in a field, obtaining required information on the control board or in a field, producing or receiving verbal or written instructions, and selecting or setting a value on the control board or in a field. A Braille operator was introduced by PS28 to evaluate blind users’ interactions during web navigation. A new operator was identified by PS8 and PS9 to represent a button press on a specialised 16-button cursor used for map digitisation.

(5) Updated Heuristics. PS8 and PS9’s KLM extensions for map digitisation updated the placement of the operator to reflect their modifications to the original KLM operators. They suggested placing an prior to digitising with a snap function before deciding on the next vertex to digitise as well as when deciding whether the digitising task should be ended. For a zooming task, they recommended being careful with the operator because some users may require extra time.

(6) New Equations. PS49 provided a basis for an early comparison between keyboard navigation systems (including their newly devised KeySurf system), particularly when used for tabbing and ID navigation, for people with motor disabilities. PS49 modelled the navigation system using updated equations that reflected the unique navigation requirements for such systems.

(7) Domain-Specific Metrics. Human error probability (HEP) was used in PS64 to quantify the KLM operators instead of unit time. Their model approached the KLM differently, arguing that HEP is a more important measure of the performance of nuclear plant operators than execution time.

5.4.7. Post-GUI

In this category, two studies (one of which involves three primary papers) modified the KLM for post-GUI systems. In particular, they extend the KLM for a natural user interface (NUI) (PS57, PS59, and PS68) and for an immersive interface with a projector and mobile navigation (PS66). The rest of this section describes the modified actions, updated heuristics, and new equations in the reviewed studies.

(1) Preserved Original Operators. was preserved from the KLM by PS66 for modelling an immersive interface. The extended KLM for a NUI (PS59 and PS68) utilised the original R; however, during experimentation this value was ignored.

(2) Adapted Original Operators. D, while commonly discarded in other setup categories, was adapted by PS57 and modified to reflect drawing gestures in the air, as a user would in a NUI. M was adapted by PS59 and PS68, where its values were retrieved from earlier extensions [2, 45]. PS66 modified depending on various user and mobile tracking devices.

(3) Inherited Operators. A number of operators were adapted in PS59 and PS68 from prior literature. Two operators (Ms and Mp) were inherited from MacKenzie [81]. Both operators represent the mental act of preparing to execute subsequent physical actions in response to a stimulus or physical matching event. PS68 inherited from their previous work in PS57. The value of was inherited from Zeng, Hedge, and Guimbretiere [82] in PS59 and PS68 to denote the act of pointing to a target in a NUI.

(4) New Operators. Both main studies, as expected, introduced several operators to reflect the new interactions associated with their post-GUI interfaces. PS66’s immersive interface required several new operators to represent tasks such as asking questions while using the interface and included start and end of task, question, gap between questions and mentally preparing a response, searching for an answer, reading, and physical movement operators. The NUI KLM also introduced several new operators, some of which were shared in two studies (PS59 and PS68), including holding a hand position, tapping by pushing or moving the hand towards the front, swiping and preparing to swipe, grasping, releasing an open hand, preparing to move the hand from a resting position to the position where a drawing stroke begins, and retracting the hand from the position where the stroke finishes. PS68 later introduced two new operators to reflect the act of pulling and a hand-preference factor.

(5) New Equations. The extended model of PS59 and PS68 describes the execution of a NUI task using g-units. G-units are gesture units that identify the time between a hand movement and returning to rest. A single g-unit can contain several gesture phrases (g-phrases) as the hand moves into various positions to achieve a stroke. The execution time of a task in the model is the summation of the g-units, each of which is defined in several new equations. PS57, which belongs to the same study, also introduced a new equation to represent the act of drawing gestures in the air.

(6) Updated Heuristics. PS68 updated the original heuristic rules for placing . Rule 0 was updated from the original to consider preparation and operators. Rule 2 was adapted to reflect that when a string of Ms belongs to a g-phrase, all subsequent Ms excluding the first one should be deleted. Their updated heuristics also suggest that when a P follows a preparation action, then should be deleted (updated from Rule 4). Finally, a new rule was introduced (Rule 5) that stresses that when the model developer is unsure of placement, the number of operators should be emphasised over the placement of Ms.
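The adapted Rule 2 can be sketched as a simple list operation. This is an interpretation rather than PS68’s own procedure: it assumes that a g-phrase is represented as a list of operator symbols and that “a string of Ms” means a run of consecutive M operators. The function name `collapse_mental_operators` is hypothetical.

```python
from itertools import groupby


def collapse_mental_operators(g_phrase):
    """Keep only the first M in every consecutive run of Ms.

    `g_phrase` is assumed to be a list of operator symbols belonging
    to a single g-phrase; all other operators are left untouched.
    """
    collapsed = []
    for op, run in groupby(g_phrase):
        if op == "M":
            collapsed.append(op)   # keep the first M, drop the rest of the run
        else:
            collapsed.extend(run)  # copy every non-M operator unchanged
    return collapsed


# Hypothetical g-phrase containing two runs of Ms.
print(collapse_mental_operators(["M", "M", "P", "M", "M", "M", "K"]))
```

The remaining rules depend on operators specific to the NUI extension and are not covered by this sketch.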

5.4.8. Television

A single reviewed study (PS47) involved web navigation and text entry (both traditional and predictive) on a television set using a remote control. This study preserved three of the original KLM operators: K, H, and and considered two different keyboard layouts for text entry. P was adapted to represent the different layouts. A finger movement and a dynamic mental operator were introduced into the extended KLM; the latter considers the additional cognitive load of using a word prediction system. To formulate these text entry tasks, two equations were introduced for the two text entry methods, traditional and predictive, respectively.

5.5. RQ4: What Was the Research Method Used to Validate a KLM Extension?

The purpose of this research question is to identify the research methods used, if any, to validate the performance of an extended model. The original KLM publication conducted a user study to compare observed data with the KLM’s predicted results [2, 6]. The model’s performance was evaluated using root-mean-square percentage error (RMSPE), which was calculated as 21%. Of the primary studies, 51.85% (28 studies) conducted user experiments to validate their extended models.

Performance evaluations were commonly statistically analysed using several metrics (excluding PS41 and PS47). Table 11 summarises the statistics used to evaluate the performance of predicted data versus data observed from users. Correlation analyses were applied in 11 studies (39.29%), while RMSPE was adopted by 6 studies (21.43%). Other statistical measures utilised included contrast weights, mean absolute percentage error (MAPE), percentage difference, percentage change, ratio, regression analysis, and t-tests. Some studies combined more than one statistical method to confirm their results.

Performance measured via correlation analysis ranged in value from 0.48 to 0.98 among the eleven primary studies. RMSPE values were generally within the suggested KLM bound of 21%, excluding one instance in PS7 where the RMSPE was 31%. The percentage change ranged from -15% to 11% in studies utilising this measure.
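For readers who wish to reproduce such comparisons, the following is a minimal sketch of three of the measures named above: RMSPE, MAPE, and Pearson correlation. It assumes that percentage errors are normalised by the observed times; individual studies may normalise differently. The function names and the example data are hypothetical and are not taken from any primary study.

```python
import math


def rmspe(predicted, observed):
    """Root-mean-square percentage error, in percent (relative to observed)."""
    terms = [((p - o) / o) ** 2 for p, o in zip(predicted, observed)]
    return 100 * math.sqrt(sum(terms) / len(terms))


def mape(predicted, observed):
    """Mean absolute percentage error, in percent (relative to observed)."""
    terms = [abs(p - o) / o for p, o in zip(predicted, observed)]
    return 100 * sum(terms) / len(terms)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical predicted and observed task times in seconds.
predicted = [3.97, 6.10, 9.45]
observed = [4.40, 5.80, 10.20]
print(rmspe(predicted, observed), mape(predicted, observed),
      pearson_r(predicted, observed))
```

A high correlation alone does not imply accurate predictions, since a model that is consistently off by a constant factor still correlates perfectly; this is one reason some studies combined several measures.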

6. Discussion

This section summarises the principal findings of this systematic review of KLM extensions. It also addresses the limitations of this review that may threaten its validity. Finally, a discussion of the implications of this review for research and practice is presented.

6.1. Principal Findings

The goal of this systematic review was to examine the purposes for extending the KLM, the methods used to extend the model, how the KLM model was modified, and the techniques used to validate the extended models. The principal findings of this review are as follows:
(i) This review found diverse studies related to extending the KLM for various domains and device setups. However, the extent to which the KLM was rigorously extended varied based primarily on the purpose of the study.
(ii) Some studies exhaustively applied research methods for the prime purpose of extending the KLM to new domains or setups or to adapt the models to current situations and technologies. Other studies applied the original KLM to evaluate their applications or devices and included new operators to modify the KLM.
(iii) Many of the primary studies used controlled experiments to extend the unit times of the KLM or to create new operators.
(iv) Almost half of the studies (26 of 54) did not include any type of validation for their extended models. Of the studies that did report model validation, controlled experiments were often reported. Performance measures varied; however, the majority utilised correlation analyses, and RMSPE (the measure originally used to validate the KLM) was the next most common.
(v) Only a small number of papers compared the performances of their extended models against the original KLM to determine their effectiveness.
(vi) The largest group of primary studies was categorised as mobile or tablet, followed by traditional setups and IVIS systems.
(vii) Several software domains were modelled with extended KLMs; nevertheless, the majority were classified as mobile programmes.
(viii) K and M were two of the most commonly preserved and adapted operators, followed by P. D was almost entirely discarded by most extensions.
(ix) There is a shortage of studies that address the accessibility needs of disabled users and post-GUI/Windows-Icons-Menus-Pointer (WIMP) interfaces.
(x) In the key-based mobile category, half the studies utilised the KLM to calculate text entry with various techniques such as multi-tap or predictive.
(xi) Two of the selected primary studies substituted the unit times with other measures. PS27 replaced them with scores for each operator and PS64 utilised a domain-specific measure, HEP.

6.2. Limitations

As with other systematic reviews, this review was limited by the search terms and digital databases used. The review was also impacted by selection bias, publication bias, improper or inaccurate data extraction, and data misclassifications. Efforts were taken to alleviate these limitations, including the following:
(i) Casting a wider net with the search terms and digital databases. Database selection was influenced by the inclusivity of the databases, their popularity, and their recurrence in previous work related to predictive modelling.
(ii) Publication and selection bias was overcome to some extent by including technical reports and MSc/PhD theses among the selected primary studies.
(iii) Data extraction was repeatedly re-evaluated in weekly meetings by the reviewers to guarantee consensus and mitigate inaccurate data extraction and misclassifications.

6.3. Implications

The findings of this systematic review have implications for researchers who plan on refining current extensions or developing new extensions as well as for designers and developers who are considering using the KLM or one of its extensions to evaluate their computer systems.

For researchers, several gaps have been identified in the literature that lend themselves to future revisions and investigations. Despite the spike in KLM extensions in the past two years (see Figure 3), much of the earlier work for traditional setups and mobile phones requires verification and revision. It is unlikely that the unit times measured in the early 2000s would still hold true with current processors and memories. Efforts should be made to re-evaluate useful models with the traditional setups utilised today as well as with mobile phones and tablets that are commonly used.

Tables 9 and 10 summarise the device setups and application domains of the 54 primary studies. While the summaries show a varied selection, several weak areas were identified. Device setups primarily focused on traditional setups, mobile, tablet, and IVIS systems. Despite efforts to develop post-GUI KLM extensions, a shortage still exists in studies that address new setups, including virtual and augmented reality, tangible user interfaces, physical interfaces, tabletops, large touch displays, and malleable interfaces. All of these setups have been in existence for at least a decade and are costly to develop; thus, they would certainly benefit from predictive models to determine performance in early design phases. While a reasonable array of application domains was investigated, the distribution of studies across these domains was uneven. Concentrated efforts were directed toward mobile applications and IVIS systems, leaving considerable room for further research into domains such as medical IT setups.

Extensions to the KLM commonly occur as a result of experiments to extract new actions and unit times. Figure 6 illustrates the research methods utilised by the reviewed studies to extend the KLM operators, heuristics, and unit times. However, when extending operators and heuristics, the majority of studies did not conduct experiments. While this could be expected for setups similar to the one used to extend and validate the original KLM, it is not ideal for new domains or device setups. Operators determined from normative actions could be useful but may fall short of detecting actions (particularly those relating to mental acts M) that are best observed. It is essential for an appropriate research method to be adopted to develop operators to measure human behaviours. When extending operator unit times, the majority of studies conducted experiments to empirically assign values to their adapted or new operators. This approach is also advisable for future researchers because it ensures accurate and up-to-date measurements. It should also be noted that combinations of research methods strengthen the findings by taking full advantage of their combined benefits.

Of the 54 primary studies selected, only half conducted validation studies to confirm the efficacy of their proposed extended models (see Table 11). At times the same experimental results were used for both extending and validating the models, which clearly lends itself to bias. When a new extension is proposed, it is vital that experiments be conducted to provide empirical evidence of the extension’s effectiveness. This calls for more controlled experiments to determine how well the proposed extensions perform. For extensions developed for traditional setups, a comparative assessment against the original KLM could be used to determine how well the new models perform against a stable usability model. Such a comparison might even be possible with setups that rely heavily on the original operators in the KLM.
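
The bias that arises from reusing the same results can be illustrated with a short Python sketch. It compares a model prediction against the data used to derive its unit times and against held-out data from a separate group. All numbers, variable names, and the error measure (absolute percentage error against the mean observed time) are assumptions chosen for illustration, not values or procedures taken from the reviewed studies.

```python
from statistics import mean


def percentage_error(predicted, observed_times):
    """Absolute prediction error as a percentage of the mean observed time."""
    observed = mean(observed_times)
    return abs(predicted - observed) / observed * 100.0


# Hypothetical task completion times (seconds) from two separate groups.
estimation_times = [2.1, 2.3, 2.2, 2.4]  # used to derive the unit times
validation_times = [2.5, 2.6, 2.4, 2.5]  # never used while extending the model

# Hypothetical prediction produced by an extended model.
predicted = 2.23

error_on_same_data = percentage_error(predicted, estimation_times)
error_on_held_out = percentage_error(predicted, validation_times)

print(f"Error on estimation data: {error_on_same_data:.1f}%")
print(f"Error on held-out data:   {error_on_held_out:.1f}%")
```

In this constructed example the error on the estimation data is much smaller than the error on the held-out data, which is why an extension evaluated only on the data used to build it can appear more accurate than it is.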

A further finding was that the majority of reviewed extensions do not provide guidance or suggestions to help designers and developers apply the altered model to their product or computer system. Despite its simplicity, the application of the KLM or any of its extensions requires skill to ensure correct measurements of execution time. Several tools (e.g., CogTool) have been developed to automate this process, but these are typically limited to traditional setups. Another observation from the review was that most reviewed papers did not disclose the expertise level of their users; the users were merely declared to be experts. However, what makes a user an expert? The answer to this question is highly subjective and depends on the perspective of the model developer. This in itself affects the unit times collected for the operators and thus the validity of the validation results. For researchers, we suggest that this issue could be mitigated by a clear definition of expertise that can be applied consistently across domains and device setups.

For designers/developers, we recommend the use of Table 12 to select an appropriate model given their products’ domain and device setup. All the studies listed in the table were ranked as Very High or High during the quality assessment phase and conducted experiments to extend and validate their models. It is also important to compare results from different extensions to determine the one best suited to the target users’ actions and perceptions. It should be noted that, at times, the KLM or one of its extensions may be unable to address all the human behaviour anticipated in a product. In this case, combining two or more models is possible but not recommended without thorough investigation.

7. Conclusion and Future Work

The KLM is widely used in the literature to evaluate system designs early in the development phase by estimating probable performance times for skilled, error-free tasks. Over the years, several extensions have been created that modify the original KLM to account for revisions of the original operators, varied device setups, and varied domains. This paper presented a systematic review that summarises the existing KLM extensions developed in the literature. From an initial 2,444 studies, 68 unique publications were selected for the review. Information was extracted from the selected studies, which allowed the reviewers to draw conclusions, identify common techniques, find research gaps, and construct guidelines.

In future work, we intend to extend this systematic review and plan for future research in various ways:
(i) Perform a systematic review that addresses the research question “What publications have utilised the KLM or one of its extensions to evaluate the efficiency of their designs and how?” We intend to apply the information obtained from this review.
(ii) Develop a methodology with a formal protocol for extending the KLM that ensures an exhaustively assessed model.
(iii) Offer a guide for applying the KLM and its various extensions to guarantee correct application.
(iv) Review the term “expert” in an attempt to provide a unanimous definition for skilled user behaviour in the KLM and its extensions.

Appendix

A. Quality Assessment and Data Extraction Form

Tables 13, 14, and 15 present the forms used to extract general information, quality assessment results, and data, respectively, from the primary studies.

B. Practitioners’ Guide to KLM Extensions

Table 12 presents a summary of the high-ranking (based on this review’s quality assessment) primary studies that utilised research methods to extend and validate their extensions in various domains and with various device setups. This list can help practitioners determine the best extension to use with their own products.

Conflicts of Interest

The authors declare that they have no relevant or material financial conflicts of interest that relate to the research described in this paper.

Authors’ Contributions

Shiroq Al-Megren is an Assistant Professor of Information Technology at King Saud University. She holds a Ph.D. in Computing Science from the University of Leeds and an MSc from Newcastle University. She has published numerous research papers on human-computer interaction and participated in several conferences and workshops. Her areas of interest include accessibility, tangible user interfaces, touch interaction, and smart distributed systems.

Joharah Saeed Khabti is a lecturer in the Information Technology Department at King Saud University, with a specialisation in Security and Web Development. She is a Ph.D. student and holds a Master’s degree from George Washington University. Her research interests include image clustering, natural language processing, and human-computer interaction.

Hend S. Al-Khalifa is a Professor in the Information Technology Department, King Saud University. She has contributed more than 120 research papers to symposiums, workshops, international conferences, and journals. Moreover, Professor Hend has served as a program committee member at many national and international conferences and as a reviewer for several journals. Her areas of interest include semantic web technologies, computers for people with special needs, and Arabic NLP.