During conditionally automated driving (CAD), driving time can be used for non-driving-related tasks (NDRTs). To increase the safety and comfort of an automated ride, upcoming automated manoeuvres such as lane changes or speed adaptations may be communicated to the driver. However, as the driver’s primary task consists of performing NDRTs, drivers might prefer to be informed in a nondistracting way. In this paper, we explore the potential of using speech output to improve human-automation interaction. In a motion-based driving simulator, 17 participants experienced several driving situations that involved communication between the automation and the driver. The Human-Machine Interface (HMI) of the automated driving system was a visual-auditory HMI with either generic auditory feedback (i.e., standard information tones) or additional speech output. The drivers were asked to perform a common NDRT during the drive. Compared to generic auditory output, additionally communicating upcoming automated manoeuvres by speech decreased self-reported visual workload and reduced monitoring of the visual HMI. However, interruptions of the NDRT were not affected by additional speech output. Participants clearly favoured the HMI with additional speech output, demonstrating the potential of speech to enhance the usefulness and acceptance of automated vehicles.

1. Introduction

1.1. Motivation

Partially automated vehicles (SAE Level 2; [1]) are already offered by several automobile manufacturers, and the introduction of conditionally automated driving (CAD, SAE Level 3) is rapidly approaching. However, several recently published surveys conclude that there is still resistance to buying or using automated vehicles [2–5]. As the potential benefits of automated vehicles for the environment and the traffic system can only become reality when the technology is actually used, increasing the acceptance of automated vehicles should be a primary concern of human factors research.

Acceptance is strongly related to the perceived usefulness of new technology [6, 7]. Advanced Driver Assistance Systems (ADAS) have traditionally aimed to assist the driver in the primary task of driving; the purpose of automated vehicles, however, is to relieve humans of driving the vehicle altogether. During CAD, the driver is no longer required to monitor the traffic situation continuously. Driving time can thus be used for non-driving-related tasks (NDRTs), such as entertainment or office work. For example, Naujoks et al. [8] found in an on-road study that drivers engaged more heavily in NDRTs as the level of vehicle automation increased. Pfleging et al. [9] conducted a web-based survey on activities that drivers would like to perform during automated driving; the most frequently mentioned NDRTs were talking with passengers, looking out of the window, texting, eating/drinking, and surfing the Internet. Schoettle and Sivak [10] also reported that drivers would like to spend their time during an automated ride on activities such as reading, texting/talking to friends, watching movies, and working. Similarly, in a large-scale online survey, König et al. [4] found that the possibility of engaging in activities other than driving was the second most valued benefit of automated vehicles.

However, to increase the safety and comfort of an automated ride, it may be necessary to present status information such as upcoming automated manoeuvres (e.g., lane changes or speed adaptations) or the confidence level of the automation [11–14]. As the driver’s primary task will likely consist of engaging in NDRTs, drivers might prefer to be informed in a nondistracting way, because interruptions of ongoing NDRTs may be perceived as a nuisance (e.g., when drivers are required to retrieve information from attention-demanding displays). In this paper, we explore whether the driver’s need to be informed about the automated vehicle’s status, actions, and intentions can be balanced with the desire to engage in NDRTs by adding speech output to the HMI, and how this HMI alternative influences the perceived usefulness and acceptance of the HMI.

1.2. Background

CAD provides the opportunity to disengage from the task of driving for longer periods without the need to continuously monitor the driving situation. However, unlike fully automated driving (SAE Level 5), CAD may occasionally require manual intervention as a result of operational system limits (such as missing lane markings) or system failures (such as sensor malfunctions, cf. [15]). Successful human-automation cooperation requires fast and effective communication of the need for manual intervention in these cases (e.g., [16–23]). Therefore, getting the driver back into the loop as fast as possible has been the focus of a large body of research (see [24] for an overview), as this function of the in-vehicle HMI can be viewed as a key element of the safety of automated vehicles.

Recently, this focus has shifted from imminent take-over requests towards exploring the potential of providing drivers with on-trip information related to the vehicle automation. For example, it has been proposed to display the time remaining in automated mode and the confidence level of the automated vehicle to further enhance the safety and efficiency of transitions to manual driving [25–28]. Furthermore, conditionally automated vehicles may also be capable of managing certain noncritical driving situations, such as overtaking or adapting the host vehicle’s speed, without any driver intervention. In such situations, it may be necessary to communicate the actions initiated by the automation unambiguously through suitable HMI elements to avoid distrust and unnecessary manual interventions [29]. A better understanding of how the automation works could eventually lead to increased trust [30], which in turn could improve the acceptance of automated vehicles [31]. Accordingly, human factors experts agree that automated vehicles should inform their occupants about the vehicle’s capabilities and status [32]. However, apart from the above-mentioned design of take-over requests, there has been relatively little experimental research in this area. For example, Beggiato et al. [11] report that drivers would like to stay informed about current and upcoming driving manoeuvres conducted by the automation. Forster et al. [33] conducted a pilot study with a newly developed HMI for CAD that explicitly displays the intentions and actions of the automated vehicle to the driver and reported a high usefulness of the HMI. Walch et al. [34] even suggest involving the driver in the decision of whether an automated manoeuvre should be carried out.

As promising as these approaches may be, notifying the driver about upcoming automated manoeuvres may also be perceived as a nuisance if it interferes with NDRTs carried out during the conditionally automated drive, such as office work or entertainment. This may especially be the case if the driver’s attention has to be directed away from the NDRT to perceive and interpret the HMI [35]. Research on the detrimental effects of task interruptions in the workplace has a long tradition in work psychology and human-computer interaction [36, 37]. It has been shown repeatedly that task interruptions impair primary task performance, for example, by increasing the time needed to accomplish the primary task, burdening working memory, or increasing error rates [38, 39]. They also cause affective discomfort, for example, by increasing subjectively experienced annoyance and anxiety [40, 41]. We thus argue that designing the HMI of automated vehicles with the driver’s changed primary task in mind may be crucial for the perceived usefulness, and thus the acceptance, of automated driving technologies, and that being unable to engage uninterruptedly in NDRTs may prevent the potential benefits of automated vehicles from becoming reality.

At this point, it should be emphasised that the assumption that being able to perform NDRTs during the automated ride is one of the main goals of using automated vehicles is not undisputed. Drivers might prefer not to engage in any task at all, which would probably render a carefully considered presentation of on-trip information unnecessary. However, as long as fully automated driving (SAE Level 5) has not been reached, human drivers will occasionally be needed to guide the vehicle. Not being engaged in any task at all during the automated ride will very likely lead to drowsiness [42, 43] and make the driver unavailable for manual intervention. During CAD, it might thus even be necessary to involve the driver in some sort of activity to keep him/her in a suitable arousal state [44, 45]. Consequently, an HMI that helps the driver stay involved in NDRTs could eventually even become relevant to driving safety.

1.3. Study Overview

It appears that the challenge in the design of a suitable HMI for CAD consists of balancing the need of the driver to be informed about the automation’s status with the desire to engage in NDRTs without being constantly interrupted. In view of this challenge, the current study investigated whether the usefulness of CAD can be improved by means of speech output that was added to the automated vehicle’s HMI. We expected that presenting information about upcoming automated manoeuvres would be less intrusive when semantic information is presented by speech output in addition to a visual-auditory HMI that only uses generic auditory output (i.e., standard information tones).

Visual-auditory HMIs have traditionally been used in the design of warnings as the multimodal presentation of warning signals usually speeds up the cognitive processes involved in the selection and execution of an appropriate response, such as braking or steering [46, 47]. The advantageous effect of presenting more than one stimulus at once that requires a reaction, the so-called redundancy gain, has been demonstrated repeatedly in cognitive psychology research [48, 49]. Another goal of multimodal warnings is to draw the driver’s attention to a visual display on which relevant information is presented if the driver’s gaze is not oriented towards that direction [50]. Multimodal take-over requests have consequently been shown to be superior to unimodal ones [19, 21]. However, it may be precisely these advantages of multimodal HMIs that would likely interfere with ongoing NDRTs during automated manoeuvres in which the driver is not supposed to intervene. We thus hypothesised that adding semantic output to the HMI would lessen the need to retrieve information from the visual HMI and offer the driver the opportunity to continue the NDRT without interruption.

Using a motion-based driving simulator, participants completed a conditionally automated drive while performing a common NDRT. During the drive, several situations either required manual intervention or were handled independently by the automation. The drivers’ engagement in the NDRT during system-initiated manoeuvres as well as their subjective evaluations of the HMI were analysed. The visual HMI had been designed and evaluated in a previous study [27, 33]; it was part of a visual-auditory interface that used either generic auditory output (i.e., standard warning and notification tones, condition: “generic”) or additional speech output (condition: “speech + generic”). It was expected that additional speech would enhance human-automation cooperation and that the participants would be less inclined to interrupt the NDRT. The main objective was to investigate whether drivers would benefit from the additional speech output during the automated manoeuvres in the sense that they would need to interrupt the NDRT unnecessarily less often. However, the additional speech output could also be perceived as unnecessary and annoying [51]. The design of the study is presented in the next section.

2. Method

2.1. Participants and Driving Simulator

The sample consisted of 10 male and 7 female participants, for a total of N = 17 drivers (age: M = …; SD = 8.1; Min = 22; Max = 56). All participants had previously completed a driving simulator training aimed at improving their handling of the simulated vehicle and reducing motion sickness.

The study was conducted in the motion-based driving simulator of the Wuerzburg Institute for Traffic Sciences (WIVW, see Figure 1). The console of the integrated vehicle contains all the necessary instrumentation and is identical to that of a production BMW 520i with automatic transmission. To simulate a realistic steering torque, a servomotor driven by a steering model is used. The motion system has six degrees of freedom and can briefly reproduce linear accelerations of up to 5 m/s² and rotational accelerations of up to 100°/s². It consists of six electropneumatic actuators (stroke ± 60 cm; inclination ± 10°). Three LCD projectors installed in the dome of the simulator provide a 180° screen image across three channels. LCD displays serve as exterior and interior mirrors. The driving simulation software SILAB, developed at WIVW, was used for environment visualisation as well as for the simulation of assistance systems, traffic, and vehicle dynamics.

2.2. Human-Machine Interface

The visual part of the HMI is shown in Figures 2 and 3. Blue lane symbols in the centre of the HMI indicate that lateral guidance is carried out by the CAD function. The length of the blue rectangle indicates the set distance to the vehicle ahead. This part of the proposed HMI resembles existing HMI solutions for ACC with additional steering assistance (e.g., [20]). The set speed (1a) and current speed (2) are displayed; if the driver changes the set speed, the new set speed is shown. If a traffic event, such as an upcoming speed limit, requires a speed adaptation, this is announced to the driver in advance by a message box at the top of the HMI (3) that includes a symbolic representation of the traffic event (4) and the distance to it (5). Automated speed adaptation is indicated by a line struck through the set speed (1b) until the speed limitation event is over.

The HMI for displaying automated manoeuvres is shown in Figure 3. Such manoeuvres include, for example, an in-lane avoiding manoeuvre (upper part of Figures 3(a), 3(b), and 3(c)) or a lane change manoeuvre (lower part of Figures 3(a), 3(b), and 3(c)). When the vehicle approaches the manoeuvre, the situation is announced to the driver (Figure 3(a)). To communicate clearly that no manual intervention is needed, the same blue colours as in the normal operating state are displayed. The type of traffic event [28, 52] and the remaining distance to the event are also announced to the driver. The preparation stage (Figure 3(b)) informs the driver about the specific manoeuvre the system plans to carry out: the cyan arrow and the text message above the main state indicate that the automation is planning to execute the manoeuvre. Subsequently, the execution of the manoeuvre is communicated by a text message and blue colouring of the situation-specific arrow (Figure 3(c)). Visual information is provided in a Head-Up Display (HUD). At this point, it is important to emphasise that the usefulness of the visual HMI had already been demonstrated, and the HMI further improved, in a prior study [27, 33].

In addition to the visual display, generic auditory output was presented together with the visual announcement of the traffic event. The generic auditory output (condition: “generic”) consisted of two tones (duration: 150 ms; frequency: 1000 Hz; interval: 150 ms). In the other experimental condition (“speech + generic”), the generic auditory output was followed by speech output. Instead of using synthetic speech (e.g., text-to-speech), the messages were spoken by a female voice (cf. [53]) and recorded with a dictaphone. The speech output, recorded in German, verbalised the information about upcoming system manoeuvres. The exact wording, translated from German into English, is shown in Table 1.
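The generic cue is simple enough to synthesise. The following is a minimal sketch, not the stimulus file used in the study: only the 150 ms tone duration, the 1000 Hz frequency, and the 150 ms interval come from the text, while pure sine tones, a 44.1 kHz sample rate, and a playback level of half full scale are assumptions. The helper names `generic_chime` and `write_wav` are hypothetical.

```python
import math
import struct
import wave

# From the text: two 1000 Hz tones of 150 ms separated by a 150 ms interval.
TONE_MS = 150
GAP_MS = 150
FREQ_HZ = 1000
# Assumed values (not reported in the study).
SAMPLE_RATE = 44_100  # samples per second
AMPLITUDE = 0.5       # fraction of full scale


def n_samples(ms: int) -> int:
    """Number of samples covering `ms` milliseconds at SAMPLE_RATE."""
    return SAMPLE_RATE * ms // 1000


def generic_chime() -> list:
    """Tone, silence, tone as floats in [-AMPLITUDE, AMPLITUDE]."""
    tone = [AMPLITUDE * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE)
            for i in range(n_samples(TONE_MS))]
    gap = [0.0] * n_samples(GAP_MS)
    return tone + gap + tone


def write_wav(target, samples) -> None:
    """Write samples as mono 16-bit PCM to a path or a writable binary file object."""
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
```

Calling `write_wav("generic_chime.wav", generic_chime())` writes a 0.45 s mono file. A production stimulus would probably also add short onset and offset ramps, since abrupt tone edges can produce audible clicks; the sketch omits them for brevity.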

2.3. Driving Situations and Experimental Design

Participants drove in the conditionally automated mode on a three-lane highway with moderate traffic density. The drive lasted approximately 15 minutes and included three driving events (Figure 4) that required communication between the CAD function and the driver:
(i) Avoiding: lost cargo on the right lane. The CAD system adapts its lateral position and avoids the obstacle on the road.
(ii) Speed limit: adaptation of the host vehicle’s set speed (from 120 km/h to 80 km/h) due to a speed limit change.
(iii) Lane change: the CAD system recognises a highway intersection ahead and changes to the right lane in order to follow the route.

In addition, one take-over scenario was included in the drive. The results pertaining to the take-over scenario are not part of this paper and will be reported elsewhere [54]. A within-subject design was used. All participants completed the simulator drive twice, with and without speech output. The participants were randomly assigned to the test sequences. Eight drivers completed the condition with speech output first and nine drivers completed the condition without speech output first. Within the drives, the different driving situations were encountered in randomised order. As all participants took two test drives, they spent approximately 30 minutes in the driving simulator. Before the test drive, they were welcomed by the experimenter and gave informed consent. Between the drives, they were given the opportunity to rest. The whole test session took about 45 minutes.
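The counterbalancing described above can be sketched as follows. This is an illustration under assumptions, not the procedure actually used: the study does not say how participants were randomised, so the sketch simply shuffles participant IDs, assigns the first half (rounded down) to the speech-first order, which reproduces the reported 8/9 split for 17 drivers, and draws an independent random order of the three driving events for each drive. The take-over scenario, reported elsewhere, is left out. `assign_sequences` is a hypothetical helper.

```python
import random

# The three system-initiated events from Section 2.3; the take-over scenario is omitted.
SCENARIOS = ("avoiding", "speed limit", "lane change")
CONDITIONS = ("speech + generic", "generic")


def assign_sequences(participant_ids, seed=None):
    """Return {participant: [(condition, scenario_order), (condition, scenario_order)]}.

    Participants are shuffled and split into two drive orders whose sizes differ by at
    most one; each drive gets its own random scenario order (hypothetical procedure).
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    n_speech_first = len(ids) // 2
    plan = {}
    for k, pid in enumerate(ids):
        order = CONDITIONS if k < n_speech_first else CONDITIONS[::-1]
        plan[pid] = [(cond, rng.sample(SCENARIOS, k=len(SCENARIOS))) for cond in order]
    return plan


plan = assign_sequences(range(1, 18), seed=2024)
speech_first = sum(drives[0][0] == "speech + generic" for drives in plan.values())
print(speech_first, len(plan) - speech_first)  # 8 9
```

Seeding the generator makes a given assignment reproducible, which is convenient for documenting the allocation.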

Participants were instructed to complete the simulator drive using the CAD function. They were shown how to activate and deactivate the system, but they were not given any advance information about the HMI. The participants were told that the automated driving function would carry out the driving task completely and that the automation would inform them if manual intervention was necessary. They were not given any information about the automated driving manoeuvres they were about to experience. During the automated drive, they were also asked to read articles selected from a weekly German news magazine. To increase participants’ motivation to read the articles carefully, they were told that their knowledge of the articles’ content would be tested after the drive.

2.4. Dependent Measures

The dependent measures are listed in Table 2. The extent of NDRT interruptions was assessed by trained experimenters, who observed whether drivers interrupted the NDRT during the drive and rated the degree of interference with the task on a previously developed rating scale (see Table 2). The experimenters were instructed on the use of the scale prior to the study, but they were not informed about the hypothesis that additional speech output reduces NDRT interruptions. The degree of interference with the NDRT was rated by the experimenter directly during the test session. Our main interest was whether or not drivers interrupted the NDRT (categories: “interruption of NDRT and looking ahead, magazine in hand” and “interruption of NDRT and looking ahead, putting magazine aside”).

In addition, the drivers’ glance behaviour was analysed as an indicator of how much they interrupted the NDRT to monitor the vehicle automation. Monitoring behaviour was operationalised through the so-called monitoring ratio [55, 56], defined as the total glance duration divided by the time during which drivers could have worked on the NDRT in the driver-system interaction (observation time). The observation time, and thus each coding episode, started with the announcement of the system manoeuvre and ended when its execution was completed. It therefore varied between scenarios, as each took a slightly different time to be completed by the automation (approximate duration: “lane change” = 25 s; “avoiding” = 20 s; “speed limit” = 12 s). According to Hergeth et al. [57], a high monitoring ratio reflects difficulty of information extraction. Because the magazine occluded a substantial part of the remote cameras’ field of view, gaze behaviour could not be assessed reliably by eye tracking. Therefore, video data were coded retrospectively by a data reductionist according to a standardised manual [58]. The video recordings had a resolution of 1280 × 720 pixels, which allowed glance behaviour to be observed well. The video coding tool allowed playback speed to be slowed down to a minimum of 10%, supporting an accurate temporal resolution of the coded data.
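As a sketch of the computation just described, the monitoring ratio of one driver in one scenario can be obtained from coded glance intervals. The data representation is an assumption for illustration: glances are stored as (start, end) times in seconds relative to the announcement, are assumed not to overlap, and are clipped to the observation window. The glance times in the example are invented.

```python
from typing import Iterable, Tuple

Glance = Tuple[float, float]  # (start_s, end_s) of one coded glance


def monitoring_ratio(glances: Iterable[Glance], window_start: float, window_end: float) -> float:
    """Total glance time inside the observation window divided by the window length.

    The window runs from the manoeuvre announcement to the end of its execution.
    Glances are assumed not to overlap; parts outside the window are ignored.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("observation window must have positive length")
    looked = 0.0
    for start, end in glances:
        overlap = min(end, window_end) - max(start, window_start)
        looked += max(0.0, overlap)
    return looked / window


# Invented glances during an "avoiding" episode of about 20 s.
ratio = monitoring_ratio([(1.0, 3.0), (5.0, 6.5), (18.0, 22.0)], 0.0, 20.0)
print(round(ratio, 3))  # 0.275
```

Clipping to the window keeps the ratio between 0 and 1 and makes values comparable across scenarios of different length, such as the 12 s speed-limit and 25 s lane-change episodes.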

In addition, participants were asked to evaluate the usefulness of the visual and auditory output of the HMI, as well as the visual workload of retrieving information from the HMI. Assessments of usefulness and visual workload were recorded after completing the respective scenario during the drive. Specifically, the drivers were asked to answer the questions shown in Table 2 after the automated manoeuvres were fully completed, so that answering the questions did not interfere with the measurement of the driver’s monitoring behaviour. A 15-point scale ranging from 1 (very little) to 15 (very much) with an additional category (0 = not at all) was used. Lastly, acceptance of both variants of the HMI was assessed by asking the drivers whether they preferred the HMI with or without speech output during a follow-up interview.

2.5. Inferential Statistics

The level of interference with the NDRT associated with processing the HMI outputs was analysed using a χ² test on the frequencies of behavioural observations in the two HMI conditions. The monitoring ratio as well as the subjective assessments of the HMI (usefulness and workload) were analysed by full-factorial mixed between-within ANOVAs with the within-subject factors “HMI condition” (“speech + generic” versus “generic”) and “driving situation” (“avoiding” versus “speed limit” versus “lane change”). The order of the drives (“speech + generic first” versus “generic first”) was included as a between-subjects factor. Violations of the sphericity assumption were checked using Mauchly’s test. Effect sizes were quantified as partial η², Cramér’s V, and Cohen’s d [59].
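The χ² test of independence and Cramér’s V named above can be illustrated with a dependency-free sketch. It is not the authors’ analysis script, since the software used is not reported. It computes Pearson’s χ² for an r × c table of counts, Cramér’s V = √(χ² / (n · (min(r, c) − 1))), and the p-value from the upper tail of the χ² distribution for integer degrees of freedom. For a table of two HMI conditions × six coding categories, df = (2 − 1)(6 − 1) = 5, the value reported in Section 3.2. The function names are hypothetical; `scipy.stats.chi2_contingency` performs the same test, but by default applies Yates’ continuity correction when df = 1, which this sketch does not.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a central chi-square with integer df.

    Uses the regularised upper incomplete gamma Q(df/2, x/2) and the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (chi2, df, p, cramers_v). No continuity correction is applied.
    Categories that were never observed must be dropped first (zero expected counts).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    if len(row_sums) < 2 or len(col_sums) < 2:
        raise ValueError("need at least a 2 x 2 table")
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("every row and column needs at least one observation")
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    v = math.sqrt(stat / (n * (min(len(row_sums), len(col_sums)) - 1)))
    return stat, df, chi2_sf(stat, df), v
```

For instance, the made-up table `[[20, 10], [10, 20]]` gives χ² = 20/3 with df = 1 and V = 1/3.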

3. Results

3.1. Missing Cases and Violations of Sphericity Assumption

In one case, the driver stated that he could not answer the rating items on the usefulness of the visual HMI as well as the visual workload as he did not pay attention to it. Concerning the assessment of monitoring ratio, it was not possible to assess the drivers’ eye movements in two situations. Missing data were replaced by the cell mean.
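Cell-mean substitution, as described above, can be sketched in a few lines. The text does not define the cells; the sketch assumes one cell per combination of design factors (e.g. drive order × HMI condition × driving situation) holding one value per participant, with `None` marking a missing value. `impute_cell_means` is a hypothetical helper and the numbers are invented.

```python
from statistics import mean


def impute_cell_means(data):
    """Replace each None with the mean of the observed values in the same design cell.

    data: dict mapping a cell key, e.g. ("generic first", "generic", "avoiding"),
    to a list with one value per participant (None = missing).
    """
    filled = {}
    for cell, values in data.items():
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError(f"cell {cell!r} has no observed values")
        cell_mean = mean(observed)
        filled[cell] = [cell_mean if v is None else v for v in values]
    return filled


example = {
    ("generic first", "generic", "avoiding"): [0.80, None, 0.60],
    ("generic first", "speech + generic", "avoiding"): [0.50, 0.40, 0.30],
}
filled = impute_cell_means(example)
```

Substituting the cell mean keeps the repeated-measures design complete, which the ANOVAs require, but it slightly understates the variability within the affected cells.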

Regarding the monitoring ratio and the self-report measures, violations of the sphericity assumption were checked. As can be seen in Table 3, there were no violations of the sphericity assumption and thus no need to adjust the degrees of freedom of the ANOVAs that are reported in Table 3.

3.2. Interference with NDRT: Observations and Monitoring Ratio

Table 4 shows the frequency of behavioural observations for the “speech + generic” and “generic” auditory output during the system manoeuvres. In almost half of the cases (45% of the observations), the NDRT was interrupted (coding category 5 or 6) at some point during the automated manoeuvres. The NDRT was continued in about one-third of the observations (31%, coding category 2 or 3). In the remaining cases, the drivers alternated between carrying out the NDRT and looking ahead. The frequency of the observed interruption levels did not differ significantly between the “speech + generic” and the “generic” condition (χ² test, df = 5).

As can be seen in Table 5, the only statistically significant effect of the experimental factors on the drivers’ monitoring ratio was an interaction between the order of the drives and the HMI condition. Therefore, separate ANOVAs with the within-subject factors “HMI condition” and “driving situation” were conducted for both test sequences (“speech + generic first” versus “generic first”). When drivers experienced the automated manoeuvres with additional speech output first, there was no difference in the monitoring ratio between the “speech + generic” and “generic” condition (F(1, 7) = 0.42, n.s.; “speech + generic”: M = 0.56, SD = 0.27; “generic”: M = 0.51, SD = 0.25). However, when drivers experienced the condition without speech output first, they spent more time looking at the HMI in that condition (F(1, 8) = 7.05, p < .05; “speech + generic”: M = 0.55, SD = 0.26; “generic”: M = 0.79, SD = 0.22). Taken together, these results suggest that drivers need less time to extract relevant information about the upcoming scenario when speech output is presented than when only generic information is presented via the auditory channel, but that this effect is counteracted by familiarity with the driving situations.
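The follow-up ANOVAs above test a two-level within-subject factor. In a complete (balanced) repeated-measures design, the F(1, n − 1) for such a factor equals the squared paired t statistic computed on each participant’s scores averaged over the other within factor (here, the three driving situations), and partial η² follows as F / (F + df_error). The sketch below illustrates this relationship; it is not the authors’ analysis code, the function name is hypothetical, and any inputs would have to come from the study data. With 8 and 9 participants per sequence, it yields the error degrees of freedom 7 and 8 reported above.

```python
import math
from statistics import mean, stdev


def two_level_within_F(scores_a, scores_b):
    """F(1, n - 1) and partial eta squared for a two-level within-subject factor.

    scores_a[i] and scores_b[i] belong to participant i, e.g. the monitoring ratio
    averaged over the three driving situations in each HMI condition.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need paired scores from at least two participants")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))  # paired t statistic
    f = t * t                              # F(1, n - 1) = t**2
    df_error = n - 1
    partial_eta_sq = f / (f + df_error)    # df_effect = 1
    return f, df_error, partial_eta_sq
```

Averaging over the situations first is what makes the equivalence exact: in the full two-way design, both the effect and the error sum of squares are scaled by the same number of situations, so their ratio is unchanged.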

3.3. Self-Reported Usefulness, Visual Workload, and Acceptance

Subjective assessments of the usefulness of the visual display were high (Figure 5(a)), with significantly higher ratings in the “generic” condition (“speech + generic”: M = 10.28, SD = 3.24; “generic”: M = 11.59, SD = 3.37; main effect “HMI condition,” see Table 6). There were no statistically significant effects of the test sequence or the driving situation. It thus appears that the redundant information provided by speech output caused drivers to rely less on the visual information provided by the HMI.

Usefulness ratings of the auditory output (Figure 5(b)) were also high (“speech + generic”: M = 12.92, SD = 3.50; “generic”: M = 11.18, SD = 3.29). However, the usefulness of the auditory output of the two HMI conditions was rated differently in the three test situations (interaction “HMI condition” × “driving situation,” see Table 6). At a descriptive level, the advantage of the auditory “speech + generic” output over the “generic” output was more pronounced in the avoiding (paired t-test, df = 16) and lane change (paired t-test, df = 16) situations than in the speed limit situation (paired t-test, df = 16; see Figure 5). Additionally, drivers who experienced the “speech + generic” condition first rated the auditory output as less useful (M = 10.67, SD = 4.21) than those who experienced the “generic” condition first (M = 13.28, SD = 2.16; main effect “sequence,” see Table 6). There was no statistically significant main effect of the driving situation.
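The per-situation comparisons above (df = 16, i.e., 17 paired ratings) can be sketched as paired t statistics with a standardised effect size. The text reports Cohen’s d without specifying the variant; this sketch assumes d_z, the mean of the paired differences divided by their standard deviation, for which t = d_z · √n. The function name and dictionary layout are hypothetical, and p-values are left to a statistics package such as `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def paired_effects(speech, generic):
    """Paired t and Cohen's d_z for 'speech + generic' minus 'generic', per situation.

    speech, generic: dicts mapping a driving situation to per-participant ratings
    listed in the same participant order.
    """
    results = {}
    for situation, a in speech.items():
        b = generic[situation]
        if len(a) != len(b) or len(a) < 2:
            raise ValueError(f"{situation}: need paired ratings from at least two participants")
        diffs = [x - y for x, y in zip(a, b)]
        d_z = mean(diffs) / stdev(diffs)   # standardised mean of the paired differences
        results[situation] = {
            "t": d_z * math.sqrt(len(diffs)),  # identical to the usual paired t statistic
            "df": len(diffs) - 1,
            "d_z": d_z,
        }
    return results
```

Other variants of d for paired data (for example, standardising by the average of the two condition standard deviations) give different values for the same data, so the variant should be stated alongside the result.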

Visual workload (Figure 5(c)) was rated higher in the “generic” condition (M = 11.31, SD = 3.33) than in the “speech + generic” condition (M = 8.68, SD = 4.58; main effect “HMI condition,” see Table 6), suggesting that presenting situation-specific semantic information relieved visual workload. There were no statistically significant effects of the test sequence or the driving situation.

Regarding the acceptance of the two HMI alternatives, participants clearly favoured “speech + generic” (n = 16) over “generic” output (n = 1). The only participant in favour of the “generic” output indicated that speech output could become annoying over time if it occurred too frequently.

4. Conclusions

The current study investigated whether speech output could improve human-machine cooperation in CAD. While the communication of transitions from CAD to manual driving has received considerable research interest, enhancing system transparency by communicating upcoming automated manoeuvres has not yet been studied extensively. A total of 17 participants completed the same driving simulator course twice, once interacting with a system that used only generic auditory feedback (“generic”) and once with a system that added speech output to the generic auditory output (“speech + generic”). We investigated whether additional speech output would facilitate human-automation cooperation by effectively informing the driver about upcoming automated driving manoeuvres and would, therefore, interfere less with the execution of an NDRT. It may be precisely the possibility of engaging in NDRTs without interruption that makes automated driving useful and attractive.

There was no difference in the frequency of observed interruptions of the NDRT between the “speech + generic” and “generic” condition. The analysis of the participants’ glance behaviour revealed an interaction effect between the HMI condition and the test sequence: drivers spent more time looking at the HMI in the “generic” condition than in the “speech + generic” condition when the HMI without additional speech output was experienced in the first drive. However, there was no increase in the time spent looking at the HMI in the “generic” condition when the participants were already familiar with the automated manoeuvres because they had experienced them during the first drive in the “speech + generic” condition. Independent of the test sequence, participants reported lower visual workload from reading and interpreting the visual HMI when speech output was presented compared with generic auditory output. Taken together, these results suggest that, with additional speech output, drivers can stay more focused on NDRTs and do not have to monitor the visual component of the HMI and the traffic situation as much as with “generic” auditory output. However, it also appears that the lower effort of information retrieval did not cause them to stay more engaged in the NDRT. Regarding the self-reported usefulness of the auditory feedback, the “speech + generic” output was rated as more useful than the “generic” auditory output in most of the driving situations. In contrast to the unspecific “generic” feedback, the semantic speech output apparently facilitated retrieving information relevant to understanding the system’s intentions and actions during the conditionally automated drive. Consequently, the visual component was considered less useful in the “speech + generic” condition, because important information about the upcoming manoeuvre could be derived from the semantic speech output.
Finally, a strong preference for the system with speech output was found, which emphasises the advantages of semantic feedback. Taken together, semantic auditory output appears to be a promising HMI alternative that can increase the usefulness and acceptance of automated vehicles. However, considerably more research is needed in this area before such HMIs can be recommended without reservation.

First, it should be noted that the benefits of speech output might have been caused by the study design. As the participants were engaged in a visual NDRT as their primary task, speech output may have particularly facilitated processing the visually presented HMI. This setup limits the generalisability of the results in two ways. On the one hand, the benefits of speech output may decrease when information retrieval from the visual HMI is made easier. For example, it may have been particularly hard to retrieve information from the HUD, as may be reflected in the drivers’ ratings of visual workload. Facilitating information retrieval (e.g., by using a larger font size) could weaken the benefits of additional speech output. Furthermore, integrating the visual information related to the automated driving feature and the presentation of the NDRT into the same visual display may render additional speech output obsolete, as there would no longer be a need to take the eyes off the display. On the other hand, the benefits of speech output could even turn into a disadvantage when drivers are engaged in NDRTs other than the one investigated in this study, especially when these NDRTs draw on the driver’s auditory attention, such as carrying on a conversation with another passenger or a person on the phone. In this case, speech output may be perceived as a nuisance, and other ways of keeping the driver in the information loop that do not interfere with processing the NDRT have to be found. In this light, the results of the present study highlight the importance of avoiding sensory crosstalk between the execution of the NDRT and the retrieval of driving-relevant information from the HMI. From this point of view, the role of the driver has changed from manually controlling the vehicle while occasionally performing secondary activities to performing NDRTs while occasionally attending to relevant HMI outputs. The HMI design for CAD should take this change of the driver’s role into account.

Second, a decreased monitoring ratio might turn into a disadvantage if, as a result of insufficient system monitoring, drivers fail to notice system malfunctions during automated manoeuvres quickly enough. During CAD, the driver is in theory no longer responsible for constant monitoring; however, if HMI outputs that require manual control and those that do not are not designed to be sufficiently distinct from each other, the efficiency of take-over requests could be reduced in a safety-relevant manner. As the safety of transitions from automated to manual driving should always be given the highest priority, unintended side effects of giving feedback about the automated vehicle's intentions and actions should be thoroughly investigated before such feedback is introduced into the HMI of automated vehicles.

Third, future studies should investigate whether speech is still favoured over generic output after longer periods of system use, especially once drivers are familiar with the system. The present study covered only a relatively short period and therefore cannot account for behavioural effects that may emerge in the long term. For example, it is quite possible that the generic auditory feedback was not sufficiently self-explanatory, but that drivers with more experience of the CAD function would learn to interpret its initially abstract meaning. This argument is supported by the fact that we found no increased monitoring ratio in the “generic” condition once drivers were already familiar with the automated driving manoeuvres. On the other hand, the introductory period of automated vehicles may well be of crucial importance to the acceptance of automated driving features, and the success of the technology may depend on whether inexperienced users initially judge it to be useful.

Lastly, the relatively small sample size may have limited the validity of the results by increasing the risk of a Type II error. Specifically, the absence of a significant reduction in the frequency of task interruptions with additional speech output may be due to the low statistical power of the study. Assuming a medium effect size, more observations would have been necessary to detect an effect with sufficient confidence (according to [60], 143 observations are needed to achieve sufficient power for a χ²-test with 5 degrees of freedom). Consequently, the results of the current experiment should be treated with caution, and replications with larger samples are needed before definitive conclusions can be drawn. Furthermore, the procedure used to rate the level of interference with the NDRT could be improved. In this study, the experimenter rated the interference level during the drive; a more reliable rating might be obtained from video recordings.
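To make the power argument concrete, the following is a minimal sketch of an a priori power analysis for a χ² test, written in Python using only the standard library. It assumes Cohen's conventions of a medium effect size w = 0.3, α = .05 and a target power of .80; these values, and all function names, are illustrative assumptions rather than details reported by the study or taken from [60]. Power is computed from the noncentral χ² distribution with noncentrality parameter λ = N·w², evaluated at the central χ² critical value.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Central chi-square CDF, computed as the regularised lower incomplete
    gamma function P(df/2, x/2) via its power series."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    # Series: sum over n of z^n / ((a+1)(a+2)...(a+n)), stopped once terms are negligible.
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total


def ncx2_cdf(x: float, df: int, nc: float) -> float:
    """Noncentral chi-square CDF as a Poisson(nc/2)-weighted mixture of
    central chi-square CDFs with df + 2j degrees of freedom."""
    half = nc / 2.0
    weight, total, j = math.exp(-half), 0.0, 0
    while True:
        total += weight * chi2_cdf(x, df + 2 * j)
        j += 1
        weight *= half / j
        # Stop once past the Poisson mode and the remaining weights are negligible.
        if j > half and weight < 1e-16:
            return total


def chi2_critical(alpha: float, df: int) -> float:
    """Upper-tail critical value, found by bisection on the central CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi2_power(n: int, w: float, df: int, alpha: float = 0.05) -> float:
    """Power of a chi-square test with n observations and effect size w."""
    crit = chi2_critical(alpha, df)
    return 1.0 - ncx2_cdf(crit, df, n * w * w)


def required_n(w: float, df: int, alpha: float = 0.05, target: float = 0.80) -> int:
    """Smallest number of observations whose power reaches the target."""
    crit = chi2_critical(alpha, df)
    n = 1
    while 1.0 - ncx2_cdf(crit, df, n * w * w) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative assumptions: medium effect w = 0.3, alpha = .05, power = .80, df = 5.
    n = required_n(0.3, 5)
    print(n, round(chi2_power(n, 0.3, 5), 3))
```

Under these assumptions the search should land on roughly the 143 observations quoted above. Changing `w`, `alpha` or `target` shows how quickly the required sample grows for smaller effects. The Poisson-mixture formulation was chosen only to keep the sketch dependency-free; in practice, `scipy.stats.ncx2` and `scipy.stats.chi2` provide the same distributions.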


This paper is based on a previously published workshop contribution [61].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work results from the joint project Ko-HAF-Cooperative Highly Automated Driving and has been funded by the Federal Ministry for Economic Affairs and Energy based on a resolution of the German Bundestag.