Abstract

One of the more debated issues regarding training simulators is their validity for transfer of skills to sensory environments that differ from the simulator. In two experiments, the advantages of three-dimensional (3D) and collocated (Col) visual displays were evaluated in a realistic and complex visuomotor task. The two factors were evaluated independently, comparing Col-2D with dislocated-2D (experiment 1) and with Col-3D (experiment 2). As expected, in both cases the more immersive presentation condition facilitated better performance. Furthermore, improvement following training in the more immersive condition carried over to the subsequent less immersive condition, but there was no carryover in the opposite order of presentation. This is taken as an indication of the differential development of skills, conditioned by the level of immersiveness of the training environment. It further suggests that learning of complex realistic tasks is not carried over from a less immersive simulator to the complex sensory environment of reality, owing to the large gap in sensory patterns.

1. Introduction

Virtual environments of various levels of immersiveness are widely used for training. One of the more debated issues regarding training simulators is their validity for transfer of skills to sensory environments that differ from the simulator. The question of validity is composed of two underlying questions: what is the quality of learning using training simulators, and what is the quality of transfer of manual skills from a training simulator to real-life tasks. The first relates to the characteristics of the simulator: whether it provides the sensory cues needed for optimal learning. The second is especially crucial: are the newly acquired skills exported to new, less or more immersive sensory environments? For instance, does training on surgical incisions in a “Flatland” type of world [1], a two-dimensional screen without touch, carry over to Spaceland (ibid.), a three-dimensional virtual world with haptics? The first is evolutionarily alien, while the second is friendly and well experienced. The skills for successful performance in each do not overlap. An additional level of learning and transfer difficulty is added when haptics is dislocated from the visual cues, for instance, if, when playing virtual tennis, you feel the ball hitting your arm but see the hit on your iPhone.

Dislocation and flatness drastically change the sensory input in the active exploration cycle [2]. The active exploration cycle refers to the continuous cyclic process that involves sensory scanning, while interpretations and intentions are adapted and motor actions are produced. In line with the active exploration cycle, manual task improvement can result from more efficient sensory processing (sensory skills), more efficient motor processing (motor skills), and/or more efficient cognitive processing (situation interpretation and intentions management). Here we focus on performance enhancement based on improvement in sensory skills. Using different kinds of virtual environments (VEs) for training, we argue, and provide supporting evidence, for differential development of skills, dependent on the sensory and sensorimotor schemes provided.

Beyond identifying whether and what is transferred, we address the question of why transfer of skills is evident in some cases and not in others. We attempt to identify the conditions that allow transferability of learning.

Virtual reality is ideal for studying motor learning because it allows full control of the sensory cues and high resolution of both motor and spatial performance. This is hard to achieve in the physical world. For instance, sensory integration has recently been studied using VEs [3–6], because virtual environments provide a fully controllable setting for manipulation of sensory cues and fine-resolution measures of changes in human responses. VEs provide the ability to control and manipulate the data in each sensory channel individually and independently of the other channels. VEs are also common in working environments that require teleoperation, telemanipulation, and telepresence. Thus, it is a practical, as well as theoretical, question whether there are differences between skills developed under different kinds of VEs that provide different schemes of sensory presentation and integration. Hence, transferability of learning is studied across virtual worlds, from low-immersive to highly immersive ones, as an indication of transferability to the physical setup.

In a previous related study, results indicated a carryover of skills gained during a block of high immersiveness to the block of low immersiveness. In the opposite case, moving from low to high immersiveness, no carryover was measured [7]. The present study aims at investigating the contribution to learning of visual manipulation, haptic manipulation, and visuo-haptic integration manipulation.

Using different levels of immersiveness in haptically enabled virtual environments (VEs) for training, we test the question of learnability and transfer by studying the interaction between characteristics of sensory patterns (display) and collocation. For instance, we ask whether a flat dislocated environment is superior to a flat collocated one and to a 3D dislocated one. We provide supporting evidence for a greater carryover of learning of motor tasks from a stereoscopic environment to “flatland” than from flatland to a stereoscopic environment, suggesting that learning in “flatland” simulators is limited in its power of generalization and transfer to real-life situations. Hence, we argue for patterns of differential development of skills, dependent on the sensory and sensorimotor schemes provided. We used concurrent visual and haptic displays across conditions, and contrasted collocated versus dislocated visuo-haptic presentation (experiment 1) and 2D versus 3D visual display (experiment 2). We kept the haptic feedback and motor requirements fixed across conditions.

1.1. Previous Studies: Performance in VE

Laboratory studies employing artificial tasks focused on the evaluation of the performance enhancement due to improved display. The logic of artificial task studies followed the experimental logic of perceptual adaptation experiments [8, 9]. Performance in two (or more) VEs is compared, the VEs differing with respect to one (or more) of their key aspects (what modalities are displayed, the dimensionality of each modality display, and sensory-motor integration). Past research results indicate that performance with the more realistic 3D vision is better than performance adapted to 2D vision [10] and that collocated display can introduce significant performance benefits over performance adapted to dislocated displays [11]. Still, the question remains regarding the underlying mental mechanisms that benefit from improved displays.

With simple visuomotor tasks, there seems to be no difference in performance whichever display is used. The simple tasks most commonly used are variations on Fitts' paradigm [12]. In a Fitts task, the participant is required to move from a start point to a target point in a straight line as quickly as possible. The effect of collocated versus dislocated VE on performance in a Fitts-like task was found to be insignificant [13]. Interestingly, though, the gap between the regression lines of the collocated and dislocated conditions increases as the index of difficulty (ID) increases (see Figure 5 in [13]), indicating that the benefit from collocated display seems to grow as the difficulty of the task increases.
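To make the role of the index of difficulty concrete, the following minimal sketch computes ID in Fitts' original formulation, ID = log2(2D/W), and shows how two regression lines with different slopes diverge as ID grows. The regression coefficients are hypothetical values chosen for illustration only; they are not taken from [13].

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' index of difficulty in bits: ID = log2(2D / W)."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def movement_time(intercept: float, slope: float, task_id: float) -> float:
    """Linear Fitts regression: MT = a + b * ID."""
    return intercept + slope * task_id


# Hypothetical (intercept, slope) pairs in seconds and seconds per bit.
# These are illustrative assumptions, NOT values reported in [13].
COLLOCATED = (0.20, 0.10)
DISLOCATED = (0.20, 0.13)


def display_gap(task_id: float) -> float:
    """Predicted extra movement time under the dislocated display."""
    return movement_time(*DISLOCATED, task_id) - movement_time(*COLLOCATED, task_id)


if __name__ == "__main__":
    for distance, width in [(4.0, 2.0), (8.0, 2.0), (16.0, 1.0)]:
        task_id = index_of_difficulty(distance, width)
        print(f"ID = {task_id:.1f} bits, gap = {display_gap(task_id):.3f} s")
```

With a steeper slope for the dislocated condition, the predicted gap is small for easy targets and grows with ID, which is the qualitative pattern described above.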

Another study [14] contrasted a simple, Fitts-like task [15] with a more complicated task, albeit focusing on visuomotor mapping (VMM, compatible or reversed) rather than on visuomotor integration (collocated or dislocated). Even in the simple task, it was found that VMM has an effect on task performance, showing a benefit for the compatible condition. However, there was only a rudimentary effect of learning, and the authors summarize that “… none of the conditions in this task seemed to have been particularly difficult or required significant coping activity.” [14, page 264]. Thus, we devised a complex and difficult task of moving from start to target in a nontrivial, three-dimensional path, modeled after a laparoscopic surgery task.

Many studies oriented toward laparoscopic surgery research have employed virtual reality techniques [14, 16–20]. Most of these studies were explicitly concerned with the effect of visual display dimensionality (2D versus 3D) and visual display location on operator (surgeon) performance [18–20] or task difficulty [16, 17]. In one survey, expert surgeons indicated depth perception as an important technological challenge for improving laparoscopy systems, although opinion is not conclusive regarding the importance of 3D display [17]. In a recent literature review, it was reported that for the inexperienced surgeon “loss of three-dimensional visual depth cues forms a major obstacle for the effective use of instruments, creating fundamental psychomotor problems of hand-eye coordination” [18, page 210]. On the other hand, an empirical test of the benefit of 3D visualization showed a significant decrease in performance, apparently due to technological insufficiency [19]. Zheng and colleagues, who conducted an experimental study manipulating collocation of presentation [20], conclude that collocated display could improve surgeon performance relative to performance with dislocated presentation. Thus, looking at studies of laparoscopic surgery, it seems that both visual display dimensionality and visuo-haptic integration have an effect on task performance.

1.2. The Current Study

A real-world task was used, in an immersive virtual environment, to study manual learning. Specifically, this task replicated a laparoscopic surgical procedure with a complex model of a twisted tube that resembles the gastroenterological system, describing a specific intestine. The virtual intestine was embedded in a 3D spatial background that mimicked the internals of the abdomen (see Figure 1). The environment included a standard 3D force-feedback haptic device (PHANTOM Omni) that could be presented either collocated with or dislocated from a 2D or a 3D visual display.

In an initial study [7], we compared performance in the motor task with a 2D visual presentation, dislocated from haptics (Dis-2D), with performance in a highly immersive and engaging 3D visual display, collocated with 3D haptics (Col-3D). Performance under Col-3D was found to be better than performance under Dis-2D. Focusing on learning, we found that performance improved in all conditions, albeit differentially. Moving from the most immersive condition (Col-3D) to the least immersive condition (Dis-2D) facilitated the initial performance in the poorer condition, indicating a carryover of skills gained during a block of high immersiveness to the block of low immersiveness. In the opposite case, moving from low to high immersiveness, no carryover was measured [7].

The difference between conditions was in the visual presentation and its relation to the haptic presentation (visuo-haptic integration).

In all experiments and across display conditions, the haptic aspect is fixed. It follows that the motor requirements of the task remain the same in all conditions. Therefore, from a purely haptic aspect of task execution, it should not matter which condition was practiced first and which was practiced second. At the motor level, the participants were asked to repeat the same hand movement sixty-two times, thirty-one times in each of two conditions. The fact that, in the initial study, experience with one presentation condition (Col-3D) induced improvement in performance with the other presentation condition (Dis-2D), but not so in the opposite order, indicates the differential development of some ability. This ability, differentially developed in the two display conditions, must depend on either visual dimensionality (2D or 3D), or visuo-haptic integration (collocated or dislocated), or both. Thus, one aim of the current study is to validate the phenomenon of an immersiveness-induced skill (IIS) by replicating the interaction results of the initial study. Another aim is to investigate the sources of the IIS, being either visual dimensionality manipulation, or visuo-haptic integration manipulation, or both.

1.2.1. The Experimental Conditions

The comparison of Col-3D with Dis-2D in the initial study [7] confounds two types of display contrast. One is a difference in visual dimensionality (3D versus 2D) and the other is a difference in spatial congruence between the haptic and visual feedback (collocated versus dislocated). In the current study, the two were disentangled into two independent contrasts, one for each property of the display manipulated in the initial study. To this end, three visuo-haptic combinations were used: the two extreme conditions and an in-between presentation condition, a 2D visual display collocated with the 3D haptic presentation (Col-2D). In the first experiment, Col-2D was compared with Dis-2D, matching the two systems for a contrast of spatial-congruence effects (collocated versus dislocated). In the second experiment, Col-2D was compared with Col-3D, contrasting for the effect of visual feedback dimensionality (3D versus 2D). In the first experiment, the new Col-2D condition is of higher real-world fidelity than its match, and in the second experiment, Col-2D is of lower real-world fidelity than its match.

1.2.2. Hypotheses

To validate the new task and settings, we make sure that previous findings are replicated in the current VEs and tasks. First, it is expected that more immersive presentations (with 3D vision or visuo-haptic collocation) would foster better performance than presentations with lower real-world fidelity (2D vision or visuo-haptic dislocation). A second expected replication regards the independence of the performance measures. Two measures are used to evaluate performance: the total time to complete the task (the Total time measure) and the amount of time the operator was occupied with erroneous, potentially harmful, conduct (the Error time measure). In the initial study, there were no indications of a tradeoff between the two measures. The same pattern of effects was replicated across the measures, in some cases showing significance in both measures but sometimes only in one of them (see also the upper part of Table 1). No reverse patterns were found, and the same is expected in the two experiments reported here.

Given the known superiority of high-immersiveness VEs in facilitating visuomotor performance, we hypothesize that the more immersive environments (with 3D vision or visuo-haptic collocation) should give rise to the development of an immersiveness-induced skill (IIS), at least at a higher level than environments of lower immersiveness. The superiority of the more immersive conditions in skill development will be manifested in a pattern of interaction between presentation condition and presentation order group. In experiment 1, it is expected that performance in the less immersive Dis-2D condition, following training in the Col-2D condition, will be enhanced relative to initial performance in the Dis-2D condition, and that this improvement will be higher than the improvement (if any) in performance in the Col-2D condition following training in the Dis-2D condition. In experiment 2, a similar interaction is expected, this time with higher improvement in the less immersive Col-2D condition following training in the Col-3D condition than any improvement (if at all) in performance in the Col-3D condition following training in the Col-2D condition.

2. Method

The two experiments reported below were identical, save for the VE aspects being manipulated (visual dimensionality and visuo-haptic integration). To avoid unnecessary repetition, the details described below apply to both experiments, unless stated otherwise.

2.1. Participants

There were fourteen participants (with an average age of 29, s.d. 3.7) in the first experiment and ten (4 men with an average age of 27, s.d. 2.2; and 6 women with an average age of 28, s.d. 2.1) in the second experiment. Participants were paid for participation.

2.2. Equipment and Setup

The virtual 3D intestine consisted of a trivial torus knot with parameters q = 1.5 and P = 1.0. It is a complex tube with three 3D twists. The virtual intestine was rendered using proprietary application programming interfaces (APIs) developed and validated in the Virtual Reality and Neurocognition lab (VRNL) at the Technion. The haptics API is based on OpenHaptics AE 3.0. The visual API is based on OpenGL 2.1 (Figure 1 shows the 2D visual rendering). 3D haptics was captured and rendered, in all conditions, by a Phantom Omni (“SensAble”) force-feedback device. Dislocated 2D vision was provided by a 19” screen mounted on the wall 2 m above the floor and 2.4 m away from the manual effector, similar to the display in standard operating rooms. Collocated 2D vision was provided by a 17” screen projecting on a half-reflecting mirror. The Phantom and the arm were placed under the mirror, making them invisible to the operator. For collocated 3D vision, participants used shutter glasses for stereographic perception, providing a 3D visual image virtually placed in the space under the half-reflecting mirror.

2.3. Procedure

Participants sat in front of the system and were given a demonstration task of 3–5 minutes in order to familiarize themselves with the system. The experimental phase started with an explanation of the task and continued with two blocks of 31 trials each. Half of the participants started with the more immersive display condition (experiment 1: Col-2D, experiment 2: Col-3D) in the first block and continued with the less immersive condition (experiment 1: Dis-2D, experiment 2: Col-2D) in the second block of trials. The other half of the participants were presented with the opposite order of blocks.
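The counterbalanced block order described above can be sketched as follows. This is a minimal illustration only: the text states that half of the participants received each order but does not say how they were allocated, so the seeded random shuffle, the function name `assign_block_orders`, and the participant labels are all assumptions.

```python
import random


def assign_block_orders(participant_ids, orders, seed=0):
    """Split participants evenly between two block orders.

    `orders` is a pair of labels such as ("C2D2", "D2C2"). Random allocation
    is an assumption; the source only states that each order was given to
    half of the participants.
    """
    if len(orders) != 2:
        raise ValueError("exactly two block orders are expected")
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible allocation
    half = len(ids) // 2
    return {pid: orders[0] if i < half else orders[1] for i, pid in enumerate(ids)}


if __name__ == "__main__":
    # Experiment 1 had fourteen participants; the labels P01..P14 are hypothetical.
    allocation = assign_block_orders([f"P{n:02d}" for n in range(1, 15)], ("C2D2", "D2C2"))
    for pid in sorted(allocation):
        print(pid, allocation[pid])
```

Each participant then completes 31 trials in the first condition of the assigned order, followed by 31 trials in the second.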

Participants were presented with the virtual intestine and were asked to use the Phantom Omni to move a small ball, a virtual cutting tool, through the virtual intestine and use it to dissect a lump at the other end of the intestine. The instructions were to do so as quickly as possible, but without touching the intestine’s wall, to avoid the risk of hurting it with the cutting tool.

2.4. Measures

Two performance measures were recorded and analyzed. (1) The Total time to complete the task: moving the cutting tool from the beginning of the intestine to its end, where the lump was, and touching the lump. (2) Error time: the aggregate of time periods the cutting tool touched the intestine wall during task completion.
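As an illustration of how the two measures relate, the sketch below derives both from a single trial log. The log format, a time-ordered list of (timestamp in seconds, tool-touching-wall flag) samples ending when the lump is touched, is a hypothetical one chosen for the example; the actual recording format of the system is not described here.

```python
def trial_measures(samples):
    """Return (total_time, error_time) in seconds for one trial.

    `samples` is a time-ordered list of (timestamp_sec, touching_wall) pairs.
    The first sample marks the start of the trial and the last one the moment
    the lump is touched. A contact flag is taken to hold until the next sample.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are needed")
    total_time = samples[-1][0] - samples[0][0]
    error_time = 0.0
    for (t0, touching), (t1, _) in zip(samples, samples[1:]):
        if touching:
            error_time += t1 - t0  # accumulate every wall-contact interval
    return total_time, error_time


if __name__ == "__main__":
    # Hypothetical trial: two wall contacts lasting 1.5 s and 0.5 s.
    log = [(0.0, False), (1.0, True), (2.5, False), (4.0, True), (4.5, False), (6.0, False)]
    total, error = trial_measures(log)
    print(f"Total time = {total:.1f} s, Error time = {error:.1f} s")
```

By construction, Error time can never exceed Total time, so the two measures are bounded but not otherwise forced to move together.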

3. Results

Because of theoretical unity and procedural identity, the results of the initial experiment [7] will be considered together with the results of the current two experiments.

3.1. Independence between the Performance Measures

To verify that participants did not trade off between being quicker (optimizing total time) and being accurate (optimizing error time), a series of correlations was computed. For each participant in each block, we computed the correlation between the total time and error time measures. Across the three experiments considered here (the initial experiment and experiments 1 and 2), only one correlation was found to be negative; that is, for only one participant (out of 34) did we find an indication of a speed-accuracy trade-off, and only in one block of trials. That correlation was very small in magnitude (|r| = 0.12), and the same participant, in the other block of trials, showed a large positive correlation (r = 0.67). Thus, we conclude that there was no indication that the participants in the experiments discussed below applied a speed-accuracy trade-off strategy.
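The trade-off check can be sketched as a Pearson correlation per participant and block, flagging the blocks in which the correlation is negative. The data layout (a dictionary keyed by participant and block) and the function names are assumptions made for illustration, and the numbers in the usage example are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def tradeoff_candidates(blocks):
    """Return the (participant, block) keys whose correlation is negative.

    `blocks` maps (participant, block) to (total_times, error_times), one
    value per trial. A negative r means faster trials had more error time.
    """
    return [key for key, (total, error) in blocks.items() if pearson_r(total, error) < 0]


if __name__ == "__main__":
    # Invented per-trial values for two blocks of one hypothetical participant.
    data = {
        ("P01", 1): ([30.0, 26.0, 22.0, 20.0], [6.0, 5.0, 3.5, 3.0]),
        ("P01", 2): ([20.0, 18.0, 17.0, 15.0], [2.0, 2.5, 3.0, 3.5]),
    }
    print(tradeoff_candidates(data))
```

A positive correlation means that slower trials also accumulated more error time, which is the opposite of a speed-accuracy trade-off.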

3.2. Analysis Design

For each experiment, two two-way mixed ANOVAs were computed, one with the total time measure as the dependent variable and another with the error time measure as the dependent variable. In all analyses, the display condition (within participants) and the order of presentation (between participants) were used as the independent factors.

The display condition included the two VE aspects manipulated in each experiment (Dis-2D versus Col-2D in experiment 1 and Col-2D versus Col-3D in experiment 2). The order of presentation contrasts the two groups of participants. In experiment 1, participants who trained first with a Dis-2D VE and then with a Col-2D VE (order D2C2) were compared with participants who trained first in a Col-2D VE and then in a Dis-2D VE (order C2D2). In experiment 2, participants who trained first with a Col-3D VE and then with a Col-2D VE (order C3C2) were compared with participants who trained first in a Col-2D VE and then in a Col-3D VE (order C2C3).

Thus, the two independent variables have four combinations: each display condition was first in one group of order of presentation and second in the other group of order of presentation. In this design, learning is indicated by an influence of training in the first block on performance in the second block. For example (with the design of experiment 1), if something is learned following training in a block of Dis-2D VE, then performance in display condition Col-2D in the second block (following a block of Dis-2D, that is, in order of presentation group D2C2) should be better than performance in Col-2D in the first block (i.e., in order of presentation group C2D2). Hence the critical importance of the interaction effects and their pattern in this study.
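One way to see what the interaction captures is through difference scores. In a design with one two-level within-participants factor and one two-level between-participants factor, the interaction test is equivalent to an independent-samples t-test on each participant's difference between the two display conditions, with F = t² on the same error degrees of freedom. The sketch below illustrates that equivalence; it is not the analysis code used in the study, and the data in the usage example are invented.

```python
import math
from statistics import mean, variance


def interaction_t(diffs_order_a, diffs_order_b):
    """Pooled-variance t statistic comparing difference scores between groups.

    Each entry is one participant's mean performance in the less immersive
    condition minus the mean in the more immersive condition. The two lists
    hold the participants of the two presentation orders. Returns (t, df).
    """
    n_a, n_b = len(diffs_order_a), len(diffs_order_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two participants")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(diffs_order_a)
                  + (n_b - 1) * variance(diffs_order_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(diffs_order_a) - mean(diffs_order_b)) / std_err, df


if __name__ == "__main__":
    # Invented difference scores in seconds, for illustration only.
    less_immersive_first = [4.0, 5.0, 6.0]
    more_immersive_first = [1.0, 2.0, 3.0]
    t, df = interaction_t(less_immersive_first, more_immersive_first)
    print(f"t({df}) = {t:.3f}, equivalent F(1, {df}) = {t * t:.3f}")
```

A small difference score in the group that trained first in the more immersive condition, relative to the other group, is the pattern predicted by the carryover hypothesis. Obtaining a P value from t would additionally require the t distribution, which is omitted here to keep the sketch free of external dependencies.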

Table 1 presents the pattern of effects in the two experiments (experiment 1 in the middle section, experiment 2 in the bottom section) and in the initial study (top section).

3.3. Experiment 1
3.3.1. Total Time

Main Effect for Order of Presentation
Starting with a Dis-2D presentation and moving to Col-2D resulted in average performance (26.4 sec, SD = 10) not significantly different (P = 0.53) from starting with Col-2D and then moving to Dis-2D (23.8 sec, SD = 9).

Main Effect for Display
The average time it took to complete the task under Col-2D (24.3 sec, SD = 9) did not differ significantly (P = 0.39) from the average time that was needed to complete the task under Dis-2D (25.9 sec, SD = 10).

Interaction of Order and Display
The interaction was significant (P = 0.001), indicating a larger improvement in Dis-2D (following training in Col-2D) than in Col-2D (following training in Dis-2D).

3.3.2. Error Time

Main Effect for Order of Presentation
The order of block experience did not have a significant effect (P = 0.71) on the average level of performance (Dis-2D-Col-2D: 5.7 sec, SD = 3.7; Col-2D-Dis-2D: 5.3 sec, SD = 2).

Main Effect for Display
The average error time was significantly (P = 0.004) shorter in the Col-2D (4.6 sec, SD = 2.2) than in the Dis-2D (6.4 sec, SD = 3.3).

Interaction of Order and Display
The interaction was significant (P < 0.0001), indicating a larger improvement in Dis-2D (following training in Col-2D) than in Col-2D (following training in Dis-2D).

3.4. Experiment 2
3.4.1. Total Time

Main Effect for Order of Presentation
Starting with a Col-2D presentation and moving to Col-3D resulted in average performance (31 sec, SD = 13.6) significantly different (P = 0.006) from starting with Col-3D and then moving to Col-2D (16 sec, SD = 6.4).

Main Effect for Display
The average total time was significantly (P = 0.001) shorter in the Col-3D (18.2 sec, SD = 6.1) than in the Col-2D (29.2 sec, SD = 15.6).

Interaction of Order and Display
The interaction was significant (P = 0.001), indicating a larger improvement in Col-2D (following training in Col-3D) than in Col-3D (following training in Col-2D).

3.4.2. Error Time

Main Effect for Order of Presentation
Starting with a Col-2D presentation and moving to Col-3D resulted in average error performance (3.7 sec, SD = 3) not significantly different (P = 0.20) from starting with Col-3D and then moving to Col-2D (2.4 sec, SD = 1.4).

Main Effect for Display
The average error time was significantly (P = 0.0008) shorter in the Col-3D (1.6 sec, SD = 1.2) than in the Col-2D (4.6 sec, SD = 2.4).

Interaction of Order and Display
The interaction was marginally significant (P = 0.07), indicating a larger improvement in Col-2D (following training in Col-3D) than in Col-3D (following training in Col-2D).

4. Discussion

The realistic new task, developed in this study for studying learning and transfer between virtual environments (VEs) of different levels of immersiveness, replicated in all three experiments the classic finding that a more immersive VE yields better performance (e.g., [10, 11, 14]), thus substantiating its validity.

The novel results of this study suggest transferability from any more immersive environment to a less immersive environment, not necessarily only from a stereoscopic collocated environment. Learning in a 2D collocated environment transfers to performance in a 2D dislocated environment, suggesting a nesting effect: sensory patterns learned in the 3D collocated environment include patterns that are needed for performance in a 2D collocated environment, and sensory patterns learned in a 2D collocated environment are applicable to performance in 2D dislocated environments. This nesting effect suggests a model of differential learning across sensory environments. The hypothesis that more immersive VEs will induce some benefit to learning was corroborated by the results. The critical test for an immersiveness-induced skill (IIS) was the pattern of interaction between display condition and order of presentation. Across experiments, this interaction was found to be significant, or marginally so, and in all cases the pattern was the same: an improvement in performance in the less immersive condition, following training in the more immersive condition, which is higher than any change in performance in the more immersive condition following training in the less immersive condition.

The fact that training in one condition, the highly immersive VE, is more beneficial to transfer than training in the other condition, the low-immersiveness VE, means either that the ability underlying performance benefits more from high immersiveness than from low immersiveness or that training under high immersiveness facilitates an additional ability that is not provided by training under low immersiveness. There is one kind of ability that was common to both the high-immersiveness and low-immersiveness display conditions: the haptic, motor, family of skills, as the motor presentation and requirements were the same in all display conditions, across all experiments. The displays differed in either visual dimensionality (2D versus 3D), visuo-haptic spatial congruence (dislocated versus collocated), or both. An effect of visual dimensionality calls primarily for a visual IIS, while an effect of visuo-haptic spatial congruence calls for a more integrative kind of IIS. In effect, both conditions yielded the same pattern of results, and so did the combined condition. This leaves open for future research the question of what kind of skill is differentially induced by different immersiveness conditions.

In only two cases was there a significant effect of the order of presentation (in the initial experiment for the error time measure and in experiment 2 for the total time measure). The results showed that it is better to start with the more immersive VE and then move to the less immersive VE. Presumably, the advantage of this order arises because the IIS is gained in the first block and carries over to the second block (of low immersiveness), while in the opposite order the IIS is introduced only in the second block and thus plays a role in far fewer trials.

The finding of an IIS development also has practical implications. For the design of training systems, it implies that developers should verify that any compromise made in the system’s real-world fidelity does not affect the development of a critical skill, as we have shown that training can suffer from relatively minor compromises (e.g., having a force-feedback device but keeping it dislocated from the visual display).

Acknowledgment

This work was supported by EC Fund Immersence 027141.