Motivated by the differences between human and robot teams, we investigated the role of verbal communication between human teammates as they work together to move a large object to a series of target locations. Only one member of the group was told the target sequence by the experimenters, while the second teammate had no target knowledge. The two experimental conditions we compared were haptic-verbal (teammates are allowed to talk) and haptic only (no talking allowed). The team’s trajectory was recorded and evaluated. In addition, participants completed a NASA TLX-style postexperimental survey which gauges workload along 6 different dimensions. In our initial experiment we found no significant difference in performance when verbal communication was added. In a follow-up experiment, using a different manipulation task, we did find that the addition of verbal communication significantly improved performance and reduced the perceived workload. In both experiments, for the haptic-only condition, we found that a remarkable number of groups independently improvised common haptic communication protocols (CHIPs). We speculate that such protocols can be substituted for verbal communication and that the performance difference between verbal and nonverbal communication may be related to how easy it is to distinguish the CHIPs from motions required for task completion.

1. Introduction

Human teams generally employ a variety of communication mechanisms to effectively complete cooperative manipulation tasks. Consider a scenario where two people are moving a table across a room. Intuition suggests people are capable of completing such a task using only haptic information; yet they often naturally employ voice commands to share their intentions, resolve conflicts, and gauge task progress. These two types of communication, haptic and verbal, differ in many respects. Sensed forces can be considered quantitative, high-bandwidth communication signals while verbal communications are sporadic and more qualitative—often contextual—in nature (e.g., “Move a little to your left”).

Cooperative manipulation is also a popular application for multirobot teams; yet, in contrast to human teams, engineered systems rarely employ sporadic, contextual, nonhaptic communication (verbal or otherwise). In one class of multirobot coordination work, communication between the robotic partners is strictly implicit, using only the forces transmitted through the object itself [1, 2]. A similar approach is often used for human-robot teams: the human moves the object in a desired direction, and the robot uses only haptic information, derived from a force sensor, to assist the human operator [3–5].

A second class of approaches to multirobot manipulation [6] requires a lead robot to constantly stream a high-bandwidth feed-forward signal to the followers to ensure stability during the acceleration and deceleration phases. However, it is apparent that humans do not rely on quantitative feed-forward acceleration signals when working together. Instead, human partners seem to rely on qualitative (verbal) information in addition to any available haptic information.

The sharp differences between natural and engineered cooperative manipulation strategies in their use of nonhaptic communication motivate the human behavioral study described in this paper. In particular we seek to quantify the impact of verbal communication on both performance and task loading of human-human teams as they engage in manipulation tasks. While several previous studies have investigated such tasks, none to our knowledge explicitly treats verbal communication as an experimental factor. Understanding the role of verbal communication may help robotics researchers develop more natural human-robot interaction protocols for such tasks.

The remainder of the paper is organized as follows. Section 2 reviews other efforts at understanding both haptic and nonhaptic communication in human-human teams. At the end of the section, specific research questions are posed. Section 3 describes our experimental design and procedures. Section 4 describes how the performance of the team was quantified and presents a statistical analysis of the team’s performance and workload assessment. Sections 5 and 6 describe a follow-up experiment. Finally, Section 7 provides some concluding discussion.

2. Related Work

In this section we review work done on human-human cooperative manipulation establishing haptic communication as being beneficial (Section 2.1). We note the paucity of controlled manipulation studies involving the role of verbal communication (Section 2.2). The experimental design introduced in Section 3 was informed by the apparent ubiquity of the leader-follower architecture in both human-human and human-robot collaborative tasks (Section 2.3).

2.1. Haptic Communication in Human-Human Teams

If both individuals are physically connected to an object, the obvious modality of communication is haptic [7]. There have been numerous studies on the role of haptic communication in virtual tasks [8–11]. While the different studies analyzed various performance measures, the conclusions indicate that there is improved performance when haptic feedback is provided. Additionally, [12] provides further details and many references on haptic cooperation between two individuals. Groten et al. [11] also show in their work that intention integration benefits from haptic communication. As a result, we begin with the premise that haptic feedback is beneficial and, as such, is available in all of our experiments.

In [13], the authors estimated the impedance and passivity of two humans moving an object. They noted that there are significant differences between the gross transport phase and the fine positioning phase of the motion. The follower actively assisted in the gross motion phase but provided higher damping forces in the fine positioning phase. They speculated that an abrupt deceleration by the leader, prior to the fine positioning phase, might have been an attempt to signal the desired change in impedance. In our experiments, we also analyze abrupt changes in speed and direction. Our paper extends the notion that a teammate can provide haptic cues or gestures that represent an attempt to explicitly communicate with the follower. In our work we refer to them as common haptic improvised protocols, or CHIPs.

2.2. Verbal Communication in Human-Human Teams

The benefits of verbal communication have been illustrated for other types of tasks in which the partners do not have haptic information. For example, in tasks that do not involve manipulation [14, 15] or in virtual manipulation tasks lacking force feedback [16–18], verbal communication can be used to signal transitions.

However, few works have specifically investigated the role of verbal communication in physical manipulation tasks. In [8], it is concluded that individuals use the haptic channel successfully to communicate with each other and integrate their intentions as they move a virtual ball on a path. In these experiments, individuals wore noise canceling headsets and were not allowed to speak with each other. They only had visual and haptic information available to them. In many studies either the subjects are not permitted to speak at all [11, 19] or verbal communication is not consistently controlled or recorded [20, 21]. There has been work done by Huang et al. [22] that specifically focuses on auditory (not verbal) cues. The auditory cues that were tested were different sounds that were coupled with specific actions. The experiment involves two individuals collaborating in a virtual environment. Half of the groups had audio, visual, and haptic information available while the other groups only had visual and haptic information. The study uses time for completion of the task as a performance measure and concludes that adding auditory cues makes task completion faster. However, in both conditions of the experiment, individuals were able to freely talk to each other. Thus, they asked clarifying questions and provided verbal guidance.

Thus, our experiments involve physical manipulation of an object by two people where haptic feedback is always available and verbal communication is isolated via different experimental conditions. Additionally, we record every verbal communication that occurs between the partners.

2.3. Leader-Follower Architectures for Cooperative Manipulation

In addition to the studies mentioned above, there is considerable research in the psychology literature on dominance in human-human teams suggesting that a leader-follower paradigm arises quite naturally in dyads. For example, the social relations model (SRM) [23] proposes a three-factor model in which dominance is determined both by the inherent tendencies of the individuals and by components specific to the particular dyad and task. The model has been studied in the context of a manipulation task [24], and it was found that humans naturally prefer to work in a team where one partner is slightly more dominant. Similar models have been implemented in shared control architectures as well [21, 25, 26].

When designing our experiment, we use an information structure analogous to the most common human-robot cooperative manipulation paradigm. Frequently, the human plays the role of the “leader” who dictates the direction of motion and possesses knowledge of the target position. The robot plays the role of the follower and usually has no a priori knowledge of the target location. In many of the cases, the robot is responsible for maintaining grasp and contact forces and helping support the mass of the object (see, e.g., [27–29]). Based on the ubiquity of the leader-follower architecture in human-robot and human-human teams, we chose to design our experiments in such a way that only one member of the team was given the target information.

Perhaps the most closely related work is [30] in which the researchers use an experimental task very similar to our second experiment (Section 5), requiring the two participants to manipulate a rod in the vertical plane. Here, too, one participant is designated as the leader. These researchers recorded force data and concluded that, during vertical motions, the dyad more heavily relies on visual feedback rather than force feedback. For this reason we did not include force measurements in our experiments. One important difference between that work and our Experiment 2 is the role of orientation. In our experiment the goal is to control the object’s height and pitch angle. This renders the task uncontrollable by a single individual. We feel that this is one of the key differences and motivates the use of common haptic improvised protocol (CHIP), one of the key findings of this paper.

2.4. Research Questions

Based on this review of previous work, we choose to explicitly study the role of verbal communication in a two-person manipulation task in order to answer the following questions.

Q1: How does the addition of verbal communication between the leader and follower change the performance of the dyad?
Q2: How does the addition of verbal communication between the leader and follower change the perceived workload of each member of the dyad?

3. Experiment I: Description

We chose to study a task where two individuals moved a large object to a sequence of target locations. The task was designed to require close collaboration between the partners since the table was too large for a single person to manipulate.

3.1. Task Description

Figure 1 shows an overhead view of the experimental setup. The experiments were done in a large open room, approximately 8 meters by 5 meters. Laser pointers were mounted on the ceiling and aimed straight down to create four different illuminated spots, representing target locations, on the floor. The targets were equidistant from each other (2.557 meters, with the exception of the path from 1 to 3 which was not used). A team of two people was asked to carry the table to a prescribed sequence of targets, as quickly as possible. The table was approximately 0.75 m by 1.2 m and weighed 18 kg. The target was considered “reached” once the projected laser spot appeared inside the black circle mounted on the table top (0.3 m in diameter).

3.2. Data Acquisition

The table was tagged with 14 retroreflective, 1 cm, spherical fiducials so that its position and orientation could be recorded via an 8-camera Vicon motion capture system. The cameras used infrared filters and strobes to robustly localize the markers at 20 Hertz and the system calibration file reported a precision of 8 mm (95% confidence). In addition, a time synchronized video recording was made of the experiment, with the audio track providing a record of the verbal exchanges between the team members.

3.3. Information Structure and Communication

One of the two individuals was randomly designated as the leader for the duration of the experiment. The experimenter gave the leader a sequence of 8 targets. The other individual was considered the follower. The experimenter did not provide the target locations to the follower; in addition, the follower wore a blindfold during the trials to prevent them from seeing the lasers, their partner’s facial expressions, or body language. We tested two different experimental conditions.

(1) Haptic only: neither partner was permitted to speak.
(2) Haptic-verbal: both partners were permitted to speak.

Under both of these conditions, the follower experiences similar sensing and informational constraints as a robotic partner in a traditional human-robot manipulation architecture, and the leader plays the role of the human partner. Haptic only mimics the traditional setup where the human does not issue verbal commands to the robot, while haptic-verbal represents what we feel would be a technologically plausible enhancement to the traditional human-robot manipulation architecture.

3.4. Participants, Procedure, and Methodology

24 individuals (5 females, 3 left-handed), ages 18–22, were recruited for this study. Human subject testing approval was obtained from the institutional review board.

Upon arrival, the two participants were seated in separate areas. They were instructed not to speak to their partner prior to the beginning of the experiment or during breaks. They were given a consent form, which was used to exclude individuals with injuries preventing them from safely lifting an 18 kg object. They also completed a preexperimental questionnaire which was used to prevent the pairing of roommates or close friends (self-described).

Participants then entered the room shown in Figure 1, and the investigator read instructions aloud from a script. They were told that their objective was to move the table to a series of targets as quickly as possible and that the specific locations would be provided later to one of the partners.

The participant that was randomly preselected to be the follower was then blindfolded; the lasers were turned on; the investigator showed the leader an annotated map of the room with a written sequence of targets (e.g., “Target 3, then Target 2, then Target 4, etc.”). Once the leader indicated she understood the instructions, the experiment began.

After each trial, the lasers were turned off and the follower was permitted to remove his blindfold and rest while completing a postexperimental survey that assessed the individual’s perception of teamwork, physical demand, frustration, stress, and their own performance (discussed in Section 4.2).

In this counterbalanced, within-subjects experimental design, every dyad completed a block consisting of an 8-target sequence, twice, with each of the two experimental conditions. In total, each group participated for approximately 30 minutes.

Several steps were taken to mitigate learning effects. First, a dyad was never asked to execute a particular target sequence more than once. Second, counterbalancing was used to control for order effects.
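The paper does not describe the exact counterbalancing scheme. As a minimal sketch, assuming the simplest option of alternating the order of the two conditions across dyads, the assignment could look like the following. The function name `counterbalanced_orders` is hypothetical.

```python
# Sketch only: the alternating scheme is an assumption, not the authors'
# documented procedure.
CONDITIONS = ("haptic only", "haptic-verbal")


def counterbalanced_orders(n_dyads, conditions=CONDITIONS):
    """Return one condition order per dyad, alternating which comes first."""
    orders = []
    for dyad in range(n_dyads):
        if dyad % 2 == 0:
            orders.append(tuple(conditions))
        else:
            orders.append(tuple(reversed(conditions)))
    return orders


orders = counterbalanced_orders(12)  # 12 dyads, as in Experiment I
```

With an even number of dyads, each condition is presented first equally often, so any order effect is balanced across the two conditions.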

4. Experiment I: Analysis

4.1. Performance

While there are a number of performance metrics that could be studied, we used time for completion. Recall that subjects were instructed to move as quickly as possible between the target locations. The time-stamped motion capture data made it easy to compute this measure.
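As a rough illustration of how such a measure could be extracted from time-stamped position data, the sketch below walks through the samples and advances to the next target whenever the tracked circle center comes within 0.15 m of the current target (half the 0.3 m circle diameter from Section 3.1). The data layout and the function name `completion_time` are assumptions for illustration; the authors' actual processing is not described.

```python
import math

CIRCLE_RADIUS_M = 0.15  # half of the 0.3 m circle diameter (Section 3.1)


def completion_time(samples, targets, radius=CIRCLE_RADIUS_M):
    """Elapsed time until the last target in the sequence is reached.

    samples: list of (t, x, y) positions of the circle center (assumed layout).
    targets: ordered list of (x, y) target locations on the floor.
    Returns None if the sequence is never completed.
    """
    if not samples or not targets:
        return None
    t_start = samples[0][0]
    next_target = 0
    for t, x, y in samples:
        tx, ty = targets[next_target]
        if math.hypot(x - tx, y - ty) <= radius:
            next_target += 1
            if next_target == len(targets):
                return t - t_start
    return None
```

For example, a table moving at a constant 0.8 m/s along the x axis, sampled at 20 Hz, first comes within 0.15 m of a target at x = 2.0 m at the sample with x = 1.88 m.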

The mean (and standard deviation) time for completion for each experimental treatment was as follows: haptic only 88.94 s (19.34 s) and haptic-verbal 102.28 s (40.91 s). On average, teams were faster in the haptic only condition than in the haptic-verbal condition. Note also that haptic-verbal has a larger coefficient of variation (standard deviation divided by mean), 0.40 versus 0.22, implying that performance varied considerably more under that condition. However, a repeated measures analysis of variance (ANOVA) showed no significant difference between haptic-verbal and haptic only at the 0.05 level.
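The coefficients of variation quoted above follow directly from the reported means and standard deviations. A short check, with the helper name `coefficient_of_variation` chosen only for illustration:

```python
def coefficient_of_variation(mean, std):
    """Standard deviation divided by mean (dimensionless)."""
    return std / mean


# Means and standard deviations reported for Experiment I (seconds).
cv_haptic_only = coefficient_of_variation(88.94, 19.34)
cv_haptic_verbal = coefficient_of_variation(102.28, 40.91)

print(f"haptic only:   {cv_haptic_only:.2f}")
print(f"haptic-verbal: {cv_haptic_verbal:.2f}")
```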

4.2. Subject’s Perception of Performance and Workload

At the conclusion of each block, both participants were separated and asked to complete a modified version of the NASA TLX survey [31]. The survey is used to rate their experiences along the dimensions presented in Table 1. Figure 2 depicts the results graphically. With the exception of the leader’s response to question 7 (overall satisfaction), the responses follow the same pattern. Participants rate haptic-verbal more favorably than the haptic only condition. None of the comparisons are statistically significant according to an ANOVA after applying a Bonferroni correction.
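For readers unfamiliar with the Bonferroni correction, a minimal sketch follows: with m comparisons, each individual p-value is tested against alpha / m rather than alpha. The p-values below are placeholders invented for illustration, not data from this study, and the function name `bonferroni_significant` is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which comparisons remain significant after Bonferroni correction.

    Each p-value is compared against alpha divided by the number of tests.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Placeholder p-values (NOT study data), one per survey dimension.
example_p = [0.001, 0.02, 0.2, 0.004, 0.03, 0.5, 0.006, 0.04]
flags = bonferroni_significant(example_p)
```

With eight comparisons the per-test threshold drops to 0.00625, so several p-values that fall below 0.05 individually no longer count as significant.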

4.3. Additional Experimental Observations

There were several unanticipated phenomena worth discussing. First, upon reviewing and transcribing the audio recording, we realized the communication was very structured across all 12 groups. We were able to partition these exchanges into the following categories:

(i) Direct commands (e.g., “Stop”): 314 occurrences,
(ii) High level task descriptions (e.g., “Targets are in a triangular pattern”): 16 occurrences, and
(iii) Confirmations (e.g., “Am I doing OK?”): 9 occurrences.

Regarding the direct commands, the most common interactions were single word instructions. The most frequent was “Stop,” followed in descending frequency by “back,” “slow,” and “left” or “right.” Only 4 of the 12 groups gave high level instructions like “we are going to move back and forth between these two targets 8 times.” Groups that gave instructions like this usually did so once at the beginning of the trial, while standing stationary. Because participants did not always walk and talk simultaneously, this may help explain why the haptic-verbal condition has a greater time for completion and a higher standard deviation than haptic only.

Finally, since the follower is permitted to speak during the haptic-verbal condition, we expected that some would occasionally request instructions or feedback. Yet only one of the 12 did.

Upon watching the video recording, we noted two interesting and possibly related behaviors. The most obvious was that 9 of the 12 teams developed an impromptu nonverbal communication protocol. Rather than holding the table parallel to the floor, the leader tilted the table downward in the direction that he or she wished to travel. We found it remarkable there was no a priori verbal agreement between the leader and follower about what this meant (recall that once participants signed the disclosure, they were not allowed to speak with each other). Furthermore, it is interesting that all 9 teams developed this concept seemingly independent of one another; the disclosure form also asks them not to discuss the experiment with those who have not yet participated. We have termed this type of behavior as a common haptic improvised protocol (CHIP).

A second observation is that the teams appear to avoid rotating the table about its yaw axis (i.e., they maintain a zero yaw rate) but do not seem to exhibit a consistent preference on yaw angle. Clearly the task does not require controlling the yaw angle, but for some reason 10 groups (all 9 groups discussed above and one additional group) actively stabilized the orientation in this fashion. One might expect groups would prefer certain ergonomic orientations (e.g., so that neither member has to walk backward, or so that the leader always points toward the target). However, there was no obvious pattern of how groups selected the final orientation as seen in Figure 3. An ANOVA showed that there was no significant effect on the absolute heading difference. One possible explanation is that it is difficult for the follower to haptically distinguish lateral motion from rotational motion [2]. Since rotations are not necessary to complete the task, the team may have been artificially reducing the number of degrees of freedom by stabilizing the yaw motion. Indeed, several human-robot cooperative manipulation protocols employ alternate artificial constraints for the same purposes [2]. This seems to be an implicit type of common haptic improvised protocol (CHIP).
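The paper does not define how the absolute heading difference was computed. One common way to compare two yaw angles, shown here as a sketch under that assumption, is to wrap their difference so that headings on either side of the 0/360 degree boundary are treated as close together. The function name `abs_heading_difference` is hypothetical.

```python
def abs_heading_difference(a_deg, b_deg):
    """Smallest absolute angle between two headings, in degrees (0 to 180)."""
    wrapped = (a_deg - b_deg + 180.0) % 360.0 - 180.0
    return abs(wrapped)


print(abs_heading_difference(350.0, 10.0))  # headings straddling 0 degrees
```

Without the wrapping step, headings of 350 and 10 degrees would appear to differ by 340 degrees rather than 20.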

4.4. Discussion

We found it unexpected that verbal communication had no significant effect on completion time or workload. We offer several possible explanations. First, perhaps verbal exchanges are simply a social expectation and truly do not impact performance. Such expectations are not readily captured in the TLX survey. However, there was no indication that this was the case in the exit interviews. Second, perhaps the task is too easy. We performed follow-up experiments with target sizes as small as 2 centimeters in diameter, with tables as light as 5 kg, and varied the table length between 0.8 m and 2.0 m. None of these changes to the task produced a significant effect of verbal communication. Third, perhaps there is something in the nature of this task that obviates verbal exchanges. For example, in follow-up interviews, a few leaders told us that a follower’s active cooperation was only required in the gross motion phase. Once they stopped walking, all they needed the follower to do was to support the weight of the table while the leader engaged in the fine positioning. Fourth, perhaps we were not considering the correct performance measures. Several related performance measures, such as walking speed, smoothness (e.g., fit to a sinusoidal or bell shaped velocity profile), and deviation from a straight line trajectory, were investigated but failed to yield additional significant differences. Finally, perhaps the CHIPs were a satisfactory proxy for verbal communication. However, with 9 or 10 of the 12 groups employing them, there is no way to control for their effect post hoc.

5. Experiment II: Description

We developed a follow-up experiment intended to be more difficult. We eliminated the gross motion phase and the blindfold and made orientation a component of the task to reduce the dyad’s ability to use it to transmit CHIPs.

5.1. Task Description

Figure 4 shows the revised experimental setup. The participants must move a 2.5 kg board, measuring 0.1 m by 0.03 m by 1.2 m. Handles are mounted at either end, to promote uniformity across participants in their choice of grasping locations. Our intention was to reduce the likelihood that roll angle could be used as a CHIP, by making the object thin and forcing a one-handed grip.

The task requires them to move the board in the vertical plane to a “target,” a specified height between 0.5 and 1.5 meters measured from the ground and a specified pitch between plus and minus 20 degrees. A sequence of 10 randomly generated targets was created, with the constraint that consecutive targets be at least 10 cm apart. This same sequence was used for all dyads to facilitate comparison.
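A target sequence with these properties could be produced by simple rejection sampling, as in the sketch below. It assumes that the 10 cm separation applies to the heights of consecutive targets and that heights and pitches are drawn uniformly; neither detail is stated in the paper. The function name `generate_targets` and the seed are hypothetical.

```python
import random


def generate_targets(n=10, height_range=(0.5, 1.5), pitch_range=(-20.0, 20.0),
                     min_separation=0.10, seed=0):
    """Draw n (height_m, pitch_deg) targets by rejection sampling.

    Assumption: consecutive targets must differ in height by at least
    min_separation meters.
    """
    rng = random.Random(seed)  # fixed seed so every dyad sees the same sequence
    targets = []
    while len(targets) < n:
        height = rng.uniform(*height_range)
        pitch = rng.uniform(*pitch_range)
        if targets and abs(height - targets[-1][0]) < min_separation:
            continue  # too close to the previous target, draw again
        targets.append((height, pitch))
    return targets


targets = generate_targets()
```

Fixing the seed reproduces the same sequence on every run, which mirrors the use of a single sequence for all dyads.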

A target is considered “reached” when the board stays within plus or minus 0.005 m and 1 degree for at least 0.2 sec. The duration requirement was added to prevent groups from “swinging” the board through the target. Requiring both a height and an orientation means that it is difficult to use the pitch angle as a CHIP. We also hoped that it would require the follower to take a more active role in the task.
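The dwell requirement can be sketched as a small state machine over the sensor samples: the timer starts when the board enters the tolerance band and resets whenever it leaves. The sample layout `(t, height_m, pitch_deg)` and the function name `first_reach_time` are assumptions for illustration, not the authors' implementation.

```python
def first_reach_time(samples, target_height, target_pitch,
                     height_tol=0.005, pitch_tol=1.0, dwell=0.2):
    """Return the time at which the target counts as reached, or None.

    samples: list of (t, height_m, pitch_deg) in time order (assumed layout).
    The board must stay within both tolerances for at least `dwell` seconds.
    """
    entered_at = None
    for t, height, pitch in samples:
        inside = (abs(height - target_height) <= height_tol
                  and abs(pitch - target_pitch) <= pitch_tol)
        if not inside:
            entered_at = None  # left the band: restart the dwell timer
            continue
        if entered_at is None:
            entered_at = t
        if t - entered_at >= dwell:
            return t
    return None
```

A board that merely swings through the tolerance band resets the timer on exit, so it is not counted as having reached the target.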

5.2. Data Acquisition

The height and orientation of the board were, respectively, determined via a Hokuyo URG scanning laser range finder and a VectorNav inertial measurement unit (IMU) mounted at the center of the board. The laser reports a precision of 2 mm (95% confidence), while the IMU’s orientation solution is precise to plus or minus 0.5 deg. (95% confidence). The sensor package updates at 15 Hz. In addition, a video recording was made of the experiment, with the audio track providing a record of the verbal exchanges between the dyads.

5.3. Information Structure and Communication

The two participants were unable to see each other, thanks to the hanging curtain shown in Figure 4, preventing them from seeing the facial cues or gestures of their partner but still allowing them to have general spatial awareness and see the board’s movements.

Each participant could see one of two computer monitors which displayed a schematic depicting the position and orientation of the board in real time. One of the two individuals was randomly designated as the leader for the duration of the experiment. The leader’s display (shown in Figure 5) depicted the target position and orientation of the board (dashed line). The follower’s display only depicted the current position of the board but did not contain information about the target (no dashed line).

The same two conditions were tested as follows.

(1) Haptic only: neither partner was permitted to speak.
(2) Haptic-verbal: both partners were permitted to speak.

5.4. Participants, Procedure, and Methodology

24 individuals (5 females, 6 left-handed), ages 18–22, were recruited for this study. Human subject testing approval was obtained from the institutional review board. Upon arrival, the two participants were seated in separate areas. They were given a consent form, which was used to exclude individuals with injuries preventing them from safely lifting a 2.5 kg object. A questionnaire was used to prevent the pairing of roommates or self-described close friends. They were instructed not to speak to their partner prior to or during the beginning of the experiment.

Before beginning the experiments, the subjects were asked to stand on different sides of the dividing curtain which hung from the ceiling (Figure 4). Then, the investigator read instructions aloud from a script and performed a demonstration of the experiment. During the demonstration, the subjects were able to observe their respective computer screens and received printed, annotated screen shots which allowed them to understand the display. The subjects were told that their task was to move the object to a series of heights and orientations. They were also told that the specific locations would be provided on the computer screen to only one of the participants and that their goal was to complete the task as quickly as possible. Once both participants indicated that they understood the instructions, the experiment began immediately.

Each block of 10 targets took approximately 5 minutes. After each block, the participants were permitted to rest while completing a postexperimental survey that assessed the individual’s perception of teamwork, physical demand, frustration, stress, and their own performance (matching the one in Section 4.2). In this counterbalanced, within-subjects experimental design, every dyad completed a block consisting of a 10-target sequence with each of the 2 experimental conditions. In total, each group moved the object to 20 targets during the 20-minute experiment.

Several steps were taken to mitigate learning effects. First, the target arrangement was randomly designed so that there was no obvious pattern (e.g., up-down-up-down…). Second, counterbalancing was used to control order effects. Finally, performance on the first 5 targets was considered “training” and is not used to compute a dyad’s completion time.

6. Experiment II: Results and Analysis

6.1. Performance Analysis

The mean completion time (and standard deviation) for each experimental treatment was as follows: haptic-verbal 6.64 s (2.56 s), and haptic only 11.25 s (6.00 s). As expected, the haptic-verbal condition resulted in a faster completion time. A repeated measures analysis of variance (ANOVA) test showed a significant effect () for communication type.

6.2. Subject’s Perception of Performance and Workload

At the conclusion of each block, both participants were separated and asked to complete a modified version of the NASA TLX survey [31]. The survey is used to rate their experiences along the dimensions presented in Table 1. Figure 6 depicts the mean responses graphically. On average, along all dimensions, both participants prefer the haptic-verbal communication type. For the leader, the difference between communication types along the 7L (overall satisfaction) and 8L (overall difficulty) dimensions is significant at the 0.05 level, after applying a Bonferroni correction. For the followers all comparisons are significant at the 0.05 level except for 3F (temporal demand) and 6F (performance appraisal), after applying a Bonferroni correction.

6.3. Additional Experimental Observations

While it was clear that the addition of verbal communication improved performance, the type or number of verbal exchanges has a less clear relationship with performance.

Regarding the types of exchanges, we transcribed the audio track and categorized phrases into one of three categories:

(i) Commands (e.g., “up” and “stop”): 164 occurrences,
(ii) High level (e.g., “the target is at eye level, tilted toward you”): 12 occurrences, and
(iii) Confirmation (e.g., “yes” and “OK”): 22 occurrences.
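To put the counts from the two experiments on a common footing, the share of each category can be computed from the reported occurrences. The dictionary layout and the helper name `shares` are illustrative only; the counts are those reported in Sections 4.3 and 6.3.

```python
# Occurrence counts as reported in Sections 4.3 and 6.3.
counts = {
    "Experiment I": {"command": 314, "high level": 16, "confirmation": 9},
    "Experiment II": {"command": 164, "high level": 12, "confirmation": 22},
}


def shares(category_counts):
    """Fraction of all transcribed phrases that fall in each category."""
    total = sum(category_counts.values())
    return {name: n / total for name, n in category_counts.items()}


for experiment, category_counts in counts.items():
    summary = ", ".join(f"{k}: {v:.0%}" for k, v in shares(category_counts).items())
    print(f"{experiment}: {summary}")
```

In both experiments, short commands account for the large majority of phrases (roughly 93% and 83%, respectively).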

Only three groups employed high level communication, with no apparent correlation to completion time. As Figure 7 shows, there is an overall positive correlation between completion time and the number of phrases exchanged. The Pearson correlation coefficient is 0.56; however, the relationship is not statistically significant (). This suggests that while communication is helpful, more is not necessarily better.
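For reference, the Pearson coefficient and its associated t statistic can be computed as in the sketch below. The sample size used in the check (n = 12, one point per dyad) is an assumption, since the paper does not state how many points underlie the correlation. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Assumption: n = 12 data points (one per dyad).
t_value = t_statistic(0.56, 12)
```

Under that assumption the statistic is about 2.14, just below the two-sided 0.05 critical value of 2.228 for 10 degrees of freedom, which is consistent with the reported lack of significance.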

Upon examining the video recording, we noticed that when verbal communication was prohibited 11 of the 12 groups employed CHIPs. All the observed CHIPs in this experiment were “unnecessary” motions in the vertical plane by the leader such as abrupt changes in direction, overshoot, or shaking the board. No yaw or roll motions were observed. Figure 8 compares the hand motions of the leader and follower during a typical haptic-verbal trial with a haptic only trial where a CHIP was used. Both panels correspond to the same group traveling to the same target. Both partners are required to move down, but the leader must travel further to achieve the pitch specification. In the experiment depicted in the upper panel, verbal communication was permitted. The leader tells the follower: “down, down, slow, stop.” The leader never moved his hand in the opposite direction of his target position.

In the lower panel, no verbal communication was permitted. The leader moves down slightly quicker, but the follower is slower to move. The leader continues downward, past the desired hand position, in an effort to ensure the partner does the same. Once the follower is close to the desired position, the leader quickly jerks the board higher than the desired position. The partner briefly follows, but the leader quickly brings the board back down. We theorize this high velocity jerking motion is an attempt to nonverbally signal the follower to stop. It seems effective but takes them twice as long to reach the target, primarily because it took the follower a while to realize a CHIP was being employed.

In this task, vertical motions are required for task completion. However, for whatever reason, the participants also use motions in this direction for CHIPs. This can result in confusion which dramatically increases completion time in some situations. Consider Figure 9. The same group from Figure 8 attempts to reach a target requiring both leader and follower to hold their hands at the same height (dashed blue line). Initially the leader (blue line) moves up but becomes frustrated by the fact that the follower is not holding the board high enough. The leader moves nearly 20 cm above the target. Finally the follower responds, reaching the correct height around 18 sec. However, when the leader moves down abruptly to reach his own desired hand position, the follower fails to interpret this as a “stop” CHIP, instead mimicking his movement downward. This behavior repeats about 10 more times before the follower figures out that this gesture means “stop moving while I position my own hand.” In all, it took them approximately 10 times longer to reach the target as compared to when verbal communication was permitted. It is worth noting that this trial took place after the one depicted in Figure 8. It is unclear why the CHIP was not successfully interpreted.

7. Conclusions and Relationship to Existing Knowledge

In this paper we studied two-person teams as they cooperatively manipulated a large object. Only one team member knew the location of the target. In one experimental condition the teammates could not talk, while in the other they could. With regard to the research questions posed earlier, in the first experiment we found that verbal communication had a somewhat negative effect on performance and a small positive effect on workload. However, neither result was statistically significant. In the second experiment, we found that verbal communication had a significant positive effect on both performance and some dimensions of workload.

There are several commonalities across the two experiments in how the participants chose to communicate. When they were allowed to speak, the leader did the vast majority of the talking. Most verbal exchanges were brief commands (e.g., “stop” or “up”) rather than higher-level plans. When the participants were not allowed to speak, we observed that many teams developed common haptic improvised protocols, termed CHIPs. The two most common were tipping the table in the desired direction of travel and shaking the board. What was truly remarkable was that over 75% of the groups developed identical protocols without discussing the experiment in advance.

An unanswered question is why verbal communication improved performance in one task but not the other. Here we hypothesize that the nature of this difference and the role of CHIPs can be explained through an analysis of the task as a control system [32]. Consider the schematic in Figure 10 (left) depicting Experiment 1, with the control system written in state space form. The model parameters are the mass of the object, its rotational moment of inertia, and its length. There are only two output variables that need to be controlled to complete the task: the position of the table along its two body-fixed axes. One may consider the inputs to be the leader’s and follower’s applied forces along each of these two directions. While there are four inputs, it is possible to control the two output variables with only two properly chosen inputs.

If the follower employs the implicit CHIP, one can show that the output controllability matrix is now full rank. With that substitution the yaw angle is no longer controllable (though it is stable); however, changes in the yaw angle are not required to complete the task. This may explain why the groups did not rotate the table. It may also explain why groups were able to complete the task without talking even when the target size was dramatically reduced.

On the other hand, consider Figure 10 (right) and the state space model for Experiment 2. There is no single “follower” protocol that will render the second task controllable, since controlling two independent output variables requires at least two independent inputs.
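The two controllability arguments above can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' model: it assumes frictionless rigid-body dynamics linearized about zero rotation, with the leader and follower holding opposite ends of the object; it reads the implicit CHIP as the follower mirroring the leader's forces; and it uses hypothetical parameter values. All function and variable names are illustrative.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fr(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def controllability_rank(A, B, C=None):
    """Rank of [B, AB, ..., A^(n-1)B], or of [CB, CAB, ...] when C is given."""
    blocks, AkB = [], B
    for _ in range(len(A)):
        blocks.append(matmul(C, AkB) if C is not None else AkB)
        AkB = matmul(A, AkB)
    stacked = [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]
    return rank(stacked)

def double_integrator(G):
    """Build (A, B) for q_ddot = G u with state [q, q_dot] (assumed dynamics)."""
    k = len(G)
    A = [[int(j == i + k) for j in range(2 * k)] for i in range(k)]
    A += [[0] * (2 * k) for _ in range(k)]
    B = [[0] * len(G[0]) for _ in range(k)] + [list(row) for row in G]
    return A, B

def position_output(k):
    """C matrix selecting the k position states out of 2k states."""
    return [[int(j == i) for j in range(2 * k)] for i in range(k)]

# Hypothetical parameters: mass, rotational inertia, object length.
m, I, L = Fr(10), Fr(2), Fr(2)
a, b = 1 / m, L / (2 * I)

# Experiment 1 (assumed): q = [x, y, yaw], u = [fLx, fLy, fFx, fFy].
G1 = [[a, 0, a, 0], [0, a, 0, a], [0, b, 0, -b]]
A1, B1 = double_integrator(G1)
C1 = position_output(3)[:2]               # only x and y must be controlled
mirror = [[1, 0], [0, 1], [1, 0], [0, 1]]  # assumed CHIP: follower copies leader
B1_chip = matmul(B1, mirror)
exp1_output_rank = controllability_rank(A1, B1_chip, C1)
exp1_state_rank = controllability_rank(A1, B1_chip)

# Experiment 2 (assumed): q = [height, pitch], u = [fL, fF] (vertical forces).
G2 = [[a, a], [b, -b]]
A2, B2 = double_integrator(G2)
C2 = position_output(2)
exp2_free_rank = controllability_rank(A2, B2, C2)
# Any fixed linear protocol fF = k * fL leaves a single independent input.
exp2_protocol_ranks = {
    k: controllability_rank(A2, matmul(B2, [[1], [k]]), C2)
    for k in (-2, -1, 0, 1, 3)
}

print("Exp 1, mirrored follower: output rank", exp1_output_rank,
      "| state rank", exp1_state_rank, "of", len(A1))
print("Exp 2, independent inputs: output rank", exp2_free_rank)
print("Exp 2, fixed follower protocols:", exp2_protocol_ranks)
```

In this sketch the mirrored-force substitution keeps both position outputs controllable (output rank 2) while the state rank is 4 of 6, reflecting the loss of authority over yaw. In the two-output task, every fixed protocol tested yields an output rank of 1, so the two outputs cannot be steered independently.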

Regarding the explicit CHIPs, going back to Experiment 1, the leader can exert forces in the vertical direction, resulting in small pitch or roll motions that have almost no impact on task completion. We speculate that a CHIP transmitted on an “unused” haptic channel may be as effective as explicit communication transmitted over an auditory channel because it is not confused with task motions. In contrast, for whatever reason, most groups felt the most intuitive way to transmit CHIPs in Experiment 2 was to use the vertical force channel. This is an unfortunate choice, since forces along this direction are required for task completion. In the best cases, this results in inefficient motions (small overshoot), and in the worst cases it causes a cycle of confusion, as illustrated in Figure 9.

Regarding lessons learned for human-robot interaction, there are a few noteworthy points. First, humans do appear to exhibit a preference for explicit communication (though not necessarily verbal) during cooperative manipulation tasks. Second, when the communication is verbal, a surprisingly small vocabulary of one-word commands is sufficient for the tasks described here. Lastly, we recognize that an individual’s perception of task difficulty is just as important as the actual performance when working with human-human and human-robot systems. Evidence shows that if an individual perceives a task to be easier, he or she is more likely to engage in that task.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported in part by the US Office of Naval Research under Grant N0001405WRY20391.