Abstract

As a basic study of 3-D audio display systems, this paper reports the conditions of moving sound image velocity and time-step under which a discrete moving sound image is perceived as continuous motion. In this study, the discrete moving sound image was presented through headphones and moved along the ear-axis. The experiments tested the continuity of a discrete moving sound image under various conditions of velocity (0.25, 0.5, 0.75, 1, 2, 3, and 4 m/s) and time-step (0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, and 0.14 s). The results indicate that the following are required in order to present a discrete moving sound image as continuous movement. (1) To present sound image movement at all velocities, the 3-D audio display system must complete the sound image presentation process, including head tracking and HRTF simulation, in less than 0.02 s. (2) A processing time longer than 0.1 s is not acceptable. (3) If the 3-D audio display system presents only very slow movement (less than about 0.5 m/s), processing times of 0.04 s to 0.06 s are still acceptable.

1. Introduction

3-D audio display technology is important for virtual reality technologies. Simulation of the head-related transfer function (HRTF) using digital signal processing is the key technology for a 3-D audio display. In principle, sound signals are processed digitally for HRTF simulation and are presented to the listener through headphones. “Head tracking” technologies, which control the virtual sound field according to the listener’s head position and orientation, are well-known techniques for enhancing the sound localization of a virtual sound image [1, 2] and are frequently used with HRTF simulation. Head tracking is based on position and orientation sensing technologies, and various sensors, such as magnetic six-degree-of-freedom (6DOF) sensors [3, 4], a gyro [5], a global positioning system (GPS) [6], and a camera [7], have been used for head tracking with a 3-D audio display.

However, these sensors require a certain processing time to obtain the position and orientation. For example, the sampling frequency of GPS is about 5–20 Hz (i.e., the processing time for position and orientation sampling is 0.05–0.2 s). A long processing time makes the 3-D audio display “discrete,” meaning that the sound image cannot move continuously and instead alternates between staying still and jumping. While the system is processing, the sound image cannot move and must remain static; only after processing can it change its location. Even if the 3-D audio display processing itself is fast enough to produce a continuously moving sound image, the sound image must be displayed discretely at each sampling period of the sensor. Furthermore, many other factors, such as the virtual field control process and the system user interface process, add to the latency of the 3-D audio system and cause the rendered positions to be updated less frequently.
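The link between a sensor's sampling frequency and the resulting time-step is simply the reciprocal. The minimal sketch below reproduces the GPS figures quoted above; the helper name `update_interval_s` is hypothetical.

```python
def update_interval_s(sampling_frequency_hz: float) -> float:
    """Time-step between successive sensor updates, in seconds."""
    if sampling_frequency_hz <= 0:
        raise ValueError("sampling frequency must be positive")
    return 1.0 / sampling_frequency_hz


# GPS at about 5-20 Hz, as quoted above: 0.2 s and 0.05 s between updates.
for f_hz in (5.0, 20.0):
    print(f"{f_hz:4.0f} Hz -> {update_interval_s(f_hz):.2f} s")
```

Any further processing stages would lengthen the effective time-step beyond this sensor-limited value.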

Under certain conditions, a listener will not perceive a moving sound image as jumping between discrete positions. Whether the listener perceives discrete motion as continuous depends on conditions such as the spatial separation of the sounds, the stimulus duration, the moving velocity, and the time-step (time period of the discrete motion) of the discrete process. Strybel et al. [8] investigated the effects of stimulus duration and spatial separation of two sounds that were presented alternately through two loudspeakers in a free field. Lakatos [9] reported a direct relationship between the horizontal separation of two sounds and the critical stimulus onset asynchrony (SOA). Mizushima et al. [10] investigated the continuity of a moving sound image produced by successive signals from two discretely located loudspeakers. Strybel and Menges [11] reported the effect of a frequency difference between the first and second sounds on apparent auditory motion.

As part of the research into the perception of moving sound images, the minimum audible moving angle (MAMA) has been investigated in many studies. Perrott and Musicant [12] reported that the MAMA increased as the velocity increased. Grantham [13] investigated the effect of prior exposure to motion on motion detectability. Perrott and Marlborough [14] studied the ability to discriminate the direction of motion and the order of events. Chandler and Grantham [15] investigated the MAMA as a function of stimulus frequency and bandwidth, source azimuth, and velocity. Perrott and Tucker [16] also measured the MAMA as a function of signal frequency and the velocity of the source. Strybel et al. [17] measured the MAMA as a function of the azimuth and the elevation of the source. Grantham et al. [18] measured the MAMA and the minimum audible angle (MAA) in horizontal, vertical, and diagonal planes. Carlile and Best [19] and Agaeva [20] measured the discrimination of sound source velocity in horizontal and vertical planes, respectively. Getzmann [21] investigated the effects of velocity and motion-onset delay on detection and discrimination of sound motion.

Some studies have investigated the moving sound image presented through headphones. Altman and Viskov [22] investigated the discrimination of perceived movement velocity for fused auditory images in dichotic stimulation. Perrott et al. [23] investigated the discrimination of moving events that accelerate or decelerate over the listening interval.

Although the many previous studies described above have investigated various aspects of moving sound images, they have not revealed the effect of velocity and time-step on the continuity of a discrete moving sound image in conditions using headphones. Under some conditions of velocity and time-step, the listener does not perceive discrete moving sound images as discrete motion. This paper reports the conditions of velocity and time-step where a discrete moving sound image is perceived as continuous motion. In the experiments described in this paper, the discrete moving sound image was presented through headphones and ran along the ear-axis. The experiments tested the continuity of the discrete moving sound images using various conditions of velocity and time-step.

2. Experiment A: Discrete Moving Sound Image

2.1. Purpose

The first experiment (titled “Experiment A”) tested the continuity of a discrete moving sound image using various conditions of velocity and time-step.

2.2. Method
2.2.1. Subjects

The subjects were 10 listeners with normal hearing (3 males and 7 females; mean age 20.8 years, SD = 1.32).

2.2.2. Experimental Setup

Figure 1 shows the experimental setup. The experiment was conducted in a soundproof room. The stimulus sounds were generated by a computer (Apple iBook G4) and presented to the listener through an amplifier and headphones (STAX SRS-4040). The sound signals (see Section 2.2.3) were computed in advance and stored as 16-bit stereo WAVE files with a 44.1 kHz sampling rate. The listener answered whether the presented sound image movement was continuous or not by operating switches on the graphical user interface (GUI) of the experimental application software (developed with REAL Software’s REALbasic) using a mouse (the details of the trials are described in Section 2.2.6).

2.2.3. Sound Signals

The sound images were presented by dichotic sound signals through headphones. The moving sound image ran along the ear-axis. The location of the sound image was controlled by interaural level difference (ILD).

Generally, 3-D audio display uses the HRTF simulation as described in Section 1. However, it is well known that there are individual differences in HRTFs across listeners, and the HRTF simulation cannot always provide the precise moving sound image for every listener. Thus, HRTF simulation was not used in this experiment.

Another way to control the sound image location on the ear-axis is to manipulate the interaural time difference (ITD). However, if the moving sound image is controlled by ITD, timbre changes can occur under some conditions due to phenomena such as the Doppler effect, and these changes could serve as a cue for detecting whether the sound image movement is continuous. Thus, ITD manipulation was not used in this experiment.

The following explanation describes control of the sound image location on the ear-axis by ILD in this experiment.

Figure 2(a) shows the $x$–$y$ coordinates of the horizontal plane. The origin is the center of the listener’s head, and the $x$-axis is the ear-axis. The left ear is at $x = -a$ and the right ear at $x = a$ ($a > 0$). The sound image moves along the $x$-axis over a range that depends on its velocity (Section 2.2.5).

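When the sound image is presented at location $x$, the sound pressures $p_L$ and $p_R$ at the left and right ears are set by (1)–(4) for three cases: (i) $x < -a$ (outside the head on the left), (ii) $-a \le x \le a$ (inside the head), and (iii) $x > a$ (outside the head on the right). Here $L_{\max}$ is the ILD in dB at which the sound image is located at the left ear position, and $P_0$ is a coefficient that adjusts the absolute values of the sound pressure. Figure 2(b) shows the sound pressures at the left and right ears as a function of the sound image location.

The ILD (left minus right) is obtained from (2)–(4) through the definition

$$\mathrm{ILD} = 20\log_{10}\frac{p_L}{p_R} \qquad (5)$$

which gives (6)–(8) for the three cases:

$$\mathrm{ILD} = \begin{cases} L_{\max}, & x < -a \qquad (6)\\ -L_{\max}\,\dfrac{x}{a}, & -a \le x \le a \qquad (7)\\ -L_{\max}, & x > a \qquad (8)\end{cases}$$

Figure 2(c) shows the ILD as a function of the sound image location.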

When the sound image is inside the listener’s head (i.e., $-a \le x \le a$), the ILD is directly proportional to $x$. In this case, $x$ can be regarded as the lateralization of the sound image. If $x = 0$, $x = -a$, or $x = a$, then ILD = 0, $L_{\max}$, or $-L_{\max}$, respectively. These relations simulate the linear relationship [24, 25] between ILD and sound image lateralization.
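The piecewise ILD of (6)–(8) can be written as a short function. This is a minimal sketch assuming the sign convention above (left ear at $x = -a$); the function name `ild_db` and the value $a = 0.1$ m in the usage loop are hypothetical.

```python
def ild_db(x: float, a: float, l_max_db: float) -> float:
    """ILD (left minus right, in dB) for a sound image at location x.

    Assumed convention: left ear at x = -a, right ear at x = +a (a > 0).
    """
    if a <= 0:
        raise ValueError("a must be positive")
    if x < -a:                      # outside the head, left side: (6)
        return l_max_db
    if x > a:                       # outside the head, right side: (8)
        return -l_max_db
    return -l_max_db * (x / a)      # inside the head, linear lateralization: (7)


# Hypothetical a = 0.1 m with L_max = 20 dB.
for x in (-0.3, -0.1, 0.0, 0.05, 0.1, 0.3):
    print(f"x = {x:+.2f} m -> ILD = {ild_db(x, 0.1, 20.0):+.1f} dB")
```

The function is continuous at $x = \pm a$, so the ILD saturates smoothly at $\pm L_{\max}$ as the image leaves the head.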

When the sound image is outside the listener’s head (i.e., $x < -a$ or $x > a$), the ILD maintains a constant value of $L_{\max}$ or $-L_{\max}$, respectively. This means that the sound image is to the left or the right side of the head. The sound pressures $p_L$ and $p_R$ are inversely proportional to $|x|$, and in this case $|x|$ can be regarded as the distance from the head center. These relations simulate the attenuation of sound with distance (i.e., the inverse square law, under which intensity falls as $1/r^2$ and therefore sound pressure falls as $1/r$).
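One gain law that satisfies every property stated above (equal pressures $P_0$ at the head center, ILD linear in $x$ inside the head, constant ILD and $1/|x|$ pressure outside) is sketched below. It is a hypothetical reconstruction, not necessarily the exact form of (1)–(4): it assumes each ear carries half of the ILD in dB around the common level. The names `ear_pressures` and `ild_from_pressures` are illustrative.

```python
import math


def ear_pressures(x: float, a: float, l_max_db: float, p0: float = 1.0) -> tuple:
    """Return (p_L, p_R) for a sound image at x under an ASSUMED gain split.

    Hypothetical: inside the head each ear takes half of the ILD (in dB)
    around the common level p0; outside the head both pressures fall off
    as 1/|x| from their values at the nearer ear.
    """
    if a <= 0:
        raise ValueError("a must be positive")

    def split(xi: float) -> tuple:
        half_ild_db = -l_max_db * (xi / a) / 2.0
        gain = 10.0 ** (half_ild_db / 20.0)
        return p0 * gain, p0 / gain

    if -a <= x <= a:
        return split(x)
    edge = -a if x < 0 else a          # position of the nearer ear
    p_l, p_r = split(edge)
    scale = a / abs(x)                 # pressure proportional to 1/|x|
    return p_l * scale, p_r * scale


def ild_from_pressures(p_l: float, p_r: float) -> float:
    """Definition (5): ILD = 20 log10(p_L / p_R)."""
    return 20.0 * math.log10(p_l / p_r)
```

With this split, the ILD recomputed through (5) matches (6)–(8), and doubling $|x|$ outside the head halves both pressures (about −6 dB).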

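The sound signals $s_L(t)$ and $s_R(t)$ provided to the left and right ears, respectively, were obtained by

$$s_L(t) = p_L\bigl(x(t)\bigr)\,c(t), \qquad s_R(t) = p_R\bigl(x(t)\bigr)\,c(t) \qquad (9)$$

where $t$ is time, $x(t)$ is the location of the sound image as a function of $t$, and $c(t)$ is the carrier signal of the sound image as a function of $t$.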
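Equation (9) amounts to scaling one shared carrier by a location-dependent gain per ear. The sketch below is a minimal stdlib-only illustration. It assumes Gaussian white noise for $c(t)$ and applies no 20 Hz–20 kHz band-limiting. The function `synthesize` and the lambda stand-ins for $x(t)$ and the ear pressures are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def synthesize(
    x_of_t: Callable[[float], float],
    gains: Callable[[float], Tuple[float, float]],
    duration_s: float,
    fs_hz: int = 44100,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Return sample lists (s_L, s_R) with s_L = p_L(x(t)) c(t), s_R = p_R(x(t)) c(t).

    Assumption: c(t) is Gaussian white noise from Python's random module.
    """
    rng = random.Random(seed)
    left: List[float] = []
    right: List[float] = []
    for k in range(int(round(duration_s * fs_hz))):
        t = k / fs_hz
        c = rng.gauss(0.0, 1.0)            # one carrier sample shared by both ears
        p_l, p_r = gains(x_of_t(t))        # ear pressures at the current location
        left.append(p_l * c)
        right.append(p_r * c)
    return left, right


# Toy usage with hypothetical stand-ins for x(t) and the ear pressures.
s_l, s_r = synthesize(lambda t: 0.0, lambda x: (1.0, 0.5), duration_s=0.01)
```

Because both ears share the same carrier samples, only the level ratio between the channels varies with location, which is the intended ILD-only cue.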

2.2.4. Sound Image Movements

Figure 3 shows models of the sound image movement as a function of time . In these examples, the sound image moves from right to left.

Figure 3(a) shows the sound image moving continuously at a constant velocity $v$. The duration of movement $T$ and the location $x(t)$ as a function of time are related by (10).
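The sketch below gives a minimal model of continuous movement. It assumes, and does not take from (10), that the path is centered on the head and runs from right to left, so the image passes the head center halfway through. The helper name `continuous_position` is hypothetical, and $T = 5$ s is the movement duration stated in Section 2.2.6.

```python
def continuous_position(t: float, v: float, duration_s: float) -> float:
    """x(t) for right-to-left movement at constant velocity v.

    Assumption: the path runs from x = +vT/2 to x = -vT/2, so the image
    crosses the head center at t = T/2.
    """
    if not 0.0 <= t <= duration_s:
        raise ValueError("t is outside the movement interval [0, T]")
    return v * (duration_s / 2.0 - t)


# With T = 5 s the path length v*T grows with v: 1.25 m at 0.25 m/s, 20 m at 4 m/s.
start, end = continuous_position(0.0, 4.0, 5.0), continuous_position(5.0, 4.0, 5.0)
```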

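Figure 3(b) shows the sound image moving discretely at an average velocity equal to the constant velocity $v$ of Figure 3(a). The number of steps $N$, the stepping distance $\Delta x$, the duration $T_e$ of the start and end steps, and the location $x(t)$ as a function of time (with step index $n$) are given by (11), where $v$ is the average velocity and $\Delta t$ is the time-step of the discrete movement. The sound image jumped by a distance $\Delta x$ every $\Delta t$ seconds; to keep the average velocity equal to $v$, the stepping distance is therefore $\Delta x = v\,\Delta t$.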

In this experiment, the number of steps $N$ was an odd number so that the sound image would appear at the head center exactly once (if $N$ were even, the sound image would never appear at the head center).
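The discrete model can be sketched as a sample-and-hold version of the continuous path. This is a hypothetical construction, not (11) itself. It assumes the middle hold sits at $x = 0$ and is centered at $t = T/2$, that interior holds last $\Delta t$, and that the first and last holds share the leftover time equally. With these assumptions $N$ is always odd and $\Delta x = v\,\Delta t$. The names `discrete_schedule` and `discrete_position` are illustrative.

```python
import math


def discrete_schedule(v: float, duration_s: float, dt: float):
    """Return (N, dx, Te) for a hypothetical version of the discrete model.

    Assumed: middle hold at x = 0 centered at t = T/2, interior holds of
    length dt, and first/last holds of equal length Te with 0 < Te <= dt.
    """
    if not 0.0 < dt < duration_s:
        raise ValueError("need 0 < dt < T; use the continuous model for dt = 0")
    m = max(1, math.ceil(duration_s / (2.0 * dt) - 0.5 - 1e-9))  # holds on each side
    n_steps = 2 * m + 1                      # odd, so x = 0 occurs exactly once
    dx = v * dt                              # jump size that keeps the mean speed v
    te = duration_s / 2.0 - (m - 0.5) * dt   # duration of the first and last holds
    return n_steps, dx, te


def discrete_position(t: float, v: float, duration_s: float, dt: float) -> float:
    """Sample-and-hold version of x = v(T/2 - t), held constant within each step."""
    n_steps, dx, _ = discrete_schedule(v, duration_s, dt)
    m = (n_steps - 1) // 2
    k = math.floor((duration_s / 2.0 - t) / dt + 0.5)  # index of the current hold
    k = max(-m, min(m, k))                              # clamp into the first/last holds
    return k * dx


n, dx, te = discrete_schedule(1.0, 5.0, 0.1)   # (51, 0.1, ~0.05)
```

The hold durations add up to $2T_e + (N-2)\Delta t = T$, so the discrete movement lasts exactly as long as the continuous one.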

2.2.5. Conditions

The following are the conditions of the present experiments.

Distance between the head center and the ear $a$ = … m.

Duration of movement $T$ = 5 s.

The range of the sound image movement varied with $v$ according to (10); at a constant velocity $v$ over the fixed duration $T$, the sound image traveled a distance $vT$.

The maximum value of ILD $L_{\max}$ = 20 dB. According to previous studies [24, 25], the sound image is located at the ear position when the ILD is in a range of about 12 to 20 dB. $L_{\max}$ was set to the largest value of this range so that the sound image would clearly be located on the left or right side of the head.

The sound pressure coefficient $P_0$ = 60 dBSL. Each listener adjusted the coefficient using the volume control of the headphone amplifier so that the sensation level was 60 dB when the sound image was at the head center (i.e., $x = 0$): first, the listener set the volume to the minimum audible level; then, the volume was raised by 60 dB from that setting.

The carrier signal $c(t)$ was white noise with a frequency range of 20 Hz to 20 kHz.

Velocity $v$ = 0.25, 0.5, 0.75, 1, 2, 3, and 4 m/s.

Time-step $\Delta t$ = 0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, and 0.14 s. The condition $\Delta t$ = 0 s means that the sound image movement was continuous (in practice, $\Delta t$ was the sampling period of the WAVE files, 1/44100 s).
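The condition list above describes a full factorial design. The sketch below tabulates it under the stated values, with $T = 5$ s taken from Section 2.2.6. The constant and function names are hypothetical.

```python
from itertools import product

VELOCITIES_M_S = [0.25, 0.5, 0.75, 1, 2, 3, 4]
TIME_STEPS_S = [0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14]
T_S = 5.0        # duration of movement T
FS_HZ = 44100    # sampling rate of the WAVE files

# Full factorial design: 7 velocities x 8 time-steps = 56 conditions.
conditions = list(product(VELOCITIES_M_S, TIME_STEPS_S))


def effective_time_step_s(dt: float) -> float:
    """The nominal dt = 0 (continuous) is realized as one sample period."""
    return 1.0 / FS_HZ if dt == 0 else dt


# With T fixed, the distance traveled is v*T, so the range grows with v.
path_length_m = {v: v * T_S for v in VELOCITIES_M_S}
```

Because $T$ is held constant, faster conditions cover much longer paths (1.25 m at 0.25 m/s versus 20 m at 4 m/s) rather than shorter durations.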

Figure 4 shows examples of the sound waveform when the sound image moved from right to left. Figure 4(a) shows an example of continuous movement ($v$ = … m/s, $\Delta t$ = 0 s); this waveform was obtained from (1)–(4), (9), and (10). From the start ($t = 0$) to the peak of the right-channel waveform envelope, the sound image was outside the head on the right and approaching the right ear. At the peak of the right-channel waveform, the sound image was approximately at the right ear. Between the right and left peaks, the sound image ran from the right ear to the left ear. After the peak of the left-channel waveform, the sound image moved away from the left ear toward the left of the head. Figure 4(b) shows an example of discrete movement ($v$ = … m/s, $\Delta t$ = … s); this waveform was obtained from (1)–(4), (9), and (11). The envelope of the discrete movement waveform (Figure 4(b)) consisted of “steps,” whereas that of the continuous movement waveform (Figure 4(a)) was smooth. This shows that the discrete movement alternated between staying and jumping, while the continuous movement kept moving.

2.2.6. Trials

One session consisted of 56 trials (= 7 velocities × 8 time-steps) presented in random order. Each listener completed 6 sessions, with the direction of movement (left to right or right to left) alternating between sessions; the direction of movement was constant within a session. In total, 60 (= 10 listeners × 6 sessions) responses were obtained for each combination of velocity and time-step.

In each trial, after the sound image movement was presented, the listener answered whether the movement was continuous or not by operating switches on the GUI of the experimental application software with a mouse (Figure 1). Listeners were instructed to answer “not continuous” if the movement was discontinuous even for a very short time.

The duration of the sound image movement was 5 s, and the listener needed about 3 s to respond. Thus, one session required 448 s (= 56 trials × (5 + 3) s). The listener rested for 5 minutes between sessions. The total time of the experiment for one listener was about 70 minutes (= 6 sessions × 448 s + 5 intervals × 5 minutes).
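The session structure and the timing arithmetic above can be checked with a short sketch. The function `make_sessions` and the choice of right-to-left for the first session are assumptions, since the text states only that the direction alternated.

```python
import random

VELOCITIES = [0.25, 0.5, 0.75, 1, 2, 3, 4]
TIME_STEPS = [0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14]
N_SESSIONS = 6
STIMULUS_S, RESPONSE_S, REST_MIN = 5, 3, 5


def make_sessions(seed: int = 0):
    """Return [(direction, trials), ...] with 56 shuffled trials per session."""
    rng = random.Random(seed)
    sessions = []
    for s in range(N_SESSIONS):
        # Starting direction is an assumption; the text only says it alternated.
        direction = "right-to-left" if s % 2 == 0 else "left-to-right"
        trials = [(v, dt) for v in VELOCITIES for dt in TIME_STEPS]
        rng.shuffle(trials)                   # random order within the session
        sessions.append((direction, trials))
    return sessions


session_s = len(VELOCITIES) * len(TIME_STEPS) * (STIMULUS_S + RESPONSE_S)   # 448 s
total_min = N_SESSIONS * session_s / 60 + (N_SESSIONS - 1) * REST_MIN     # about 69.8 min
```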

2.2.7. Pretest of the Presentation Method for Sound Image Movement

As described in Section 2.2.3, the presentation method for the sound image movement used neither HRTF nor ITD. Therefore, it was necessary to confirm the appropriateness of our method in advance.

Prior to the experiment, all listeners confirmed, as a subjective impression, that with the present presentation method they perceived the sound image moving from outside the head to the opposite side via the inside of the head.

The listeners tested several samples of the sound image movement conditions. As a result, all listeners reported the perception that the sound image approached from outside of the head, ran through the head, and went away to the opposite side.

The reasons why the listeners could perceive the sound image outside of the head without HRTF are discussed in Section 4.

2.3. Results

Figure 5 shows the results of Experiment A. The horizontal axis is the velocity (m/s), and the vertical axis is the probability of perceived continuity (%).

The effects of velocity (ANOVA, F = …, p < …) and time-step (ANOVA, F = …, p < …) on the probability of perceived continuity were statistically significant.

The results showed that the probability of perceived continuity was greater than 70% for all velocities when the time-step was less than 0.02 s.

On the other hand, the probability of perceived continuity was less than 20% at any velocity when the time-step was greater than 0.1 s.

When the time-step was between 0.04 and 0.08 s, the probability of perceived continuity was greater than 70% at a velocity of 0.25 m/s. However, it decreased as the velocity increased to 1 or 2 m/s and then increased slightly again as the velocity increased further to 4 m/s.
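The probability of perceived continuity plotted in Figure 5 is the percentage of "continuous" answers among the 60 responses per condition. The sketch below shows that aggregation. The function name `continuity_percent` is hypothetical, and the demo responses are made up for illustration, not the paper's data.

```python
from collections import defaultdict


def continuity_percent(responses):
    """Percentage of 'continuous' judgements for each (velocity, time-step).

    `responses` yields (velocity, time_step, judged_continuous) tuples; in
    Experiment A there are 60 per condition (10 listeners x 6 sessions).
    """
    tally = defaultdict(lambda: [0, 0])          # condition -> [continuous, total]
    for v, dt, judged_continuous in responses:
        counts = tally[(v, dt)]
        counts[0] += 1 if judged_continuous else 0
        counts[1] += 1
    return {cond: 100.0 * k / n for cond, (k, n) in tally.items()}


# Made-up responses for illustration only (not the paper's data).
demo = [(1, 0.02, True)] * 45 + [(1, 0.02, False)] * 15
print(continuity_percent(demo))   # {(1, 0.02): 75.0}
```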

2.4. Discussion

The results suggested that the time-step of the discrete movement must be less than 0.02 s in order to keep the probability of perceived continuity over 70% at any velocity. When the time-step was over 0.1 s, the discrete movement could hardly be perceived as continuous movement at any velocity.

When the time-step was 0 s (completely continuous), the probability of perceived continuity should ideally be 100%, but it was not always 100%, especially under high-velocity conditions. A likely reason is that, at high velocities, the sound image appeared and disappeared within a very short time, so the listeners mistook the “very fast movement” for “jumping from left (right) to right (left)” and reported it as discrete.
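One way to quantify "a very short duration" is the time the image needs to cross from one ear to the other, $2a/v$. This quantity is our illustration, not one analyzed in the paper. The sketch below uses a hypothetical helper `between_ears_s`. The ratio between velocities does not depend on $a$, so a placeholder $a$ suffices.

```python
def between_ears_s(a: float, v: float) -> float:
    """Time for the image to travel from one ear (x = a) to the other (x = -a)."""
    if a <= 0 or v <= 0:
        raise ValueError("a and v must be positive")
    return 2.0 * a / v


# The ratio between the slowest and fastest conditions does not depend on a.
ratio = between_ears_s(1.0, 0.25) / between_ears_s(1.0, 4.0)   # 16.0
```

At 4 m/s the image spends 16 times less time between the ears than at 0.25 m/s, which is consistent with the explanation above.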

When the time-step was between 0.04 and 0.08 s, the results showed a “U-shape.” The slowest velocity, 0.25 m/s, showed a high probability of perceived continuity (over 70%). This suggests that it was very difficult for listeners to perceive the discreteness of movement at a very slow velocity, presumably because the stepping distance was too small to be distinguished from a slow continuous movement. On the other hand, at velocities greater than 1 or 2 m/s, the probability of perceived continuity increased slightly again. This suggests that it was also difficult for listeners to perceive the discreteness of movement at these velocities, presumably because the stepping distance was large enough for the discrete movement to be confused with a fast continuous movement.
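The stepping distances behind this argument follow from $\Delta x = v\,\Delta t$, since the image jumps once per time-step at average velocity $v$. The sketch below prints them for the middle time-steps. The helper name `stepping_distance_m` is hypothetical.

```python
VELOCITIES_M_S = [0.25, 0.5, 0.75, 1, 2, 3, 4]
MID_TIME_STEPS_S = [0.04, 0.06, 0.08]


def stepping_distance_m(v: float, dt: float) -> float:
    """Jump size of the discrete movement: average velocity times time-step."""
    return v * dt


for dt in MID_TIME_STEPS_S:
    row = "  ".join(f"{100 * stepping_distance_m(v, dt):5.1f}" for v in VELOCITIES_M_S)
    print(f"dt = {dt:.2f} s | dx (cm): {row}")
```

At 0.25 m/s the jumps are only 1–2 cm, whereas at 2 m/s they are 8–16 cm and at 4 m/s up to 32 cm.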

The next question was how the listeners decided whether the sound image movement was continuous or discrete. It must be taken into account that the present presentation method changed the “loudness” of the sound image as well as its location. For instance, when the sound image was at the left ear position, the loudness perceived by the left ear was at its maximum. When the sound image approached or moved away from the left ear, the loudness perceived by the left ear increased or decreased, respectively. Thus, the sound image movement was generated partly by varying its loudness, and it was possible that the listeners judged not the continuity of the movement but the continuity of the “loudness variation.” The next experiment tested this possibility.

3. Experiment B: Discrete Loudness Variation

3.1. Purpose

The second experiment (titled “Experiment B”) tested the continuity of discrete loudness variation under the same conditions as Experiment A. In Experiment B, the sound image was located at the head center by “diotic” presentation, and only the same loudness variations as in Experiment A were presented.

3.2. Method
3.2.1. Subjects

The subjects were the same 10 listeners as in Experiment A.

3.2.2. Experimental Setup

The experimental setup was also the same as in Experiment A.

3.2.3. Sound Signals

The dichotic sound signals of Experiment A were summed and divided by two. The diotic sound signal $s_D(t)$ for Experiment B was obtained by

$$s_D(t) = \frac{s_L(t) + s_R(t)}{2} \qquad (12)$$

where $s_L(t)$ and $s_R(t)$ are the dichotic sound signals of Experiment A for the left and right ears, respectively, defined in (9).
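The diotic mix can be sketched as a per-sample average, assuming the sum-and-halve reading of (12) given above. The function name `diotic` is hypothetical.

```python
def diotic(s_left, s_right):
    """Diotic signal s_D = (s_L + s_R) / 2, computed sample by sample."""
    if len(s_left) != len(s_right):
        raise ValueError("both channels must have the same length")
    return [(sl + sr) / 2.0 for sl, sr in zip(s_left, s_right)]


# If s_L = p_L c and s_R = p_R c, then s_D = ((p_L + p_R) / 2) c: the level
# still follows x(t), but both ears receive the same signal (ILD = 0).
print(diotic([1.0, 2.0], [3.0, 0.0]))   # [2.0, 1.0]
```

Substituting (9) into (12) shows why Experiment B keeps the loudness variation while removing the lateralization cue.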

The diotic sound signal was presented to both the left and the right ears through the headphones. The sound image was located at the head center, and only the same loudness variations as in Experiment A could be presented.

3.2.4. Loudness Variation

In Experiment B, the variable $x$ as a function of time was the same as in Experiment A. The sound pressure of $s_D(t)$ as a function of time is therefore obtained from (1)–(12).

3.2.5. Conditions

The conditions were also the same as in Experiment A.

3.2.6. Trials

The trials and total time were also the same as in Experiment A.

In each trial, after the loudness variation was presented, the listener answered as to whether the presented loudness variation was continuous or not in the same manner as in Experiment A.

3.3. Results

Figure 6 shows the results of Experiment B. The horizontal axis is the velocity (m/s), and the vertical axis is the probability of perceived continuity (%). Note that in Experiment B the actual velocity was zero, because the sound image stayed at the head center. The velocity in Experiment B was only a parameter used to compute $x(t)$ by (10) and (11).

The effects of velocity (ANOVA, F = …, p < …) and time-step (ANOVA, F = …, p < …) on the probability of perceived continuity were statistically significant.

The results showed that when the time-step was short (less than 0.04 s), the probability of perceived continuity was high (greater than 50%) at all velocities. When the time-step was long (greater than 0.08 s), the probability of perceived continuity increased as the time-step or the velocity decreased. When the time-step was in the middle of the range (0.06 s), the results showed a “U-shape.” These qualitative tendencies were similar to the results of Experiment A.

However, the results of Experiments A and B differed quantitatively. Figure 7 compares the two. The filled circles show the probability of perceived continuity of the moving sound image, and the open circles show the probability of perceived continuity of the loudness variation; the diameter of each circle indicates the probability. A test for independence (χ² test) was conducted to compare the two sets of results. Gray squares indicate significant differences: dark gray means p < …, and light gray means p < …. The results show that almost half of the conditions differed significantly in the probability of perceived continuity, although the qualitative tendencies were similar.
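For one condition, this comparison amounts to a 2×2 table (experiment × judgement) with 60 responses per experiment. The sketch below is a minimal Pearson χ² test of independence. It assumes no continuity correction, which the text does not specify. The function name `chi2_2x2` and the example counts are hypothetical. For 1 degree of freedom, the p-value equals erfc(√(χ²/2)).

```python
import math


def chi2_2x2(cont_a: int, n_a: int, cont_b: int, n_b: int):
    """Pearson chi-square test of independence on a 2x2 table (no continuity correction).

    Rows: Experiment A, Experiment B; columns: judged continuous, judged discrete.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    table = [[cont_a, n_a - cont_a], [cont_b, n_b - cont_b]]
    rows = [n_a, n_b]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if expected == 0:
                raise ValueError("an expected count is zero; the test is undefined")
            stat += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2.0))   # survival function of chi-square(1)
    return stat, p_value


# Hypothetical counts: 30/60 "continuous" in A versus 45/60 in B.
stat, p = chi2_2x2(30, 60, 45, 60)   # stat = 8.0, p is about 0.0047
```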

3.4. Discussion

The results shown in Figure 7 suggest that the probability of perceived continuity of the loudness variation tended to be greater than that of the moving sound image, especially when the time-step was greater than 0.06 s. In about half of the conditions, these differences were significant. This means that the listeners perceived the discreteness more strongly when they listened to the moving sound image than when they listened only to the loudness variation. This answers the question raised in Section 2.4: did the listeners judge the continuity of the “loudness variation” rather than the continuity of the movement? The answer is “no,” because the listeners could perceive the discreteness of the moving sound image even when they could not perceive the discreteness of the loudness variation.

As in Experiment A, when the time-step was 0 s (completely continuous), the probability of perceived continuity should ideally be 100%, but the results shown in Figure 6 were not always 100%, especially under high-velocity conditions. A likely explanation, parallel to that for Experiment A, is that at high velocities the sound appeared and disappeared within a very short time, so the listeners mistook “very rapid variation” for “discrete variation” and reported the sound as discrete. In Figure 7, at a time-step of 0 s, some conditions (velocity = 0.5, 2, 3, and 4 m/s) showed a significant difference. This was because, at a time-step of 0 s, the results of Experiment A ranged from 85% to 100%, whereas those of Experiment B ranged from 75% to 90%.

4. Conclusion

As a basic study into 3-D audio display systems that need long processing times (e.g., a system with a very slow head tracking sensor), this paper reports the conditions of moving sound image velocity and time-step where a discrete moving sound image is perceived as continuous motion. In the present experiments, the discrete moving sound image was presented through headphones and ran along the ear-axis. The experiments tested the continuity of a discrete moving sound image using various conditions of velocity (0.25, 0.5, 0.75, 1, 2, 3, and 4 m/s) and time-step (0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12, and 0.14 s).

As a result, the following are required in order to present a discrete moving sound image as continuous movement: (1) to present sound image movements at all velocities, the 3-D audio display system must complete the sound image presentation process, including head tracking and HRTF simulation, in less than 0.02 s; (2) a processing time longer than 0.1 s is not acceptable; and (3) if the 3-D audio display system presents only very slow movement (less than about 0.5 m/s), processing times of 0.04 s to 0.06 s are acceptable.
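A system designer might encode these three conclusions as a rough check on a worst-case processing time. The sketch below is one such reading, with stated assumptions. The thresholds are taken literally. Latencies below 0.04 s are assumed to be at least as acceptable as 0.04–0.06 s for slow movement. Cases the conclusions do not address are reported as such. The function name `processing_time_verdict` is hypothetical.

```python
def processing_time_verdict(processing_time_s: float, max_velocity_m_s: float) -> str:
    """Rough reading of conclusions (1)-(3) for a given worst-case latency.

    Assumptions: 'less than 0.02 s' and 'longer than 0.1 s' are taken
    literally; for slow movement (< 0.5 m/s), any latency up to 0.06 s is
    treated as acceptable; everything else is outside the conclusions.
    """
    if processing_time_s < 0.02:
        return "acceptable for all tested velocities"
    if processing_time_s > 0.1:
        return "not acceptable"
    if max_velocity_m_s < 0.5 and processing_time_s <= 0.06:
        return "acceptable for very slow movement only"
    return "not covered by the conclusions"


# A 5 Hz GPS head tracker (0.2 s per update) falls in the "not acceptable" range.
print(processing_time_verdict(0.2, 0.25))
```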

In addition, a comparison of the results of a moving sound image and loudness variation showed that the listeners did not decide whether the movement was discrete or continuous only by the loudness variation. The listeners could perceive the discreteness of the moving sound image, even if they could not perceive the discreteness of loudness variation.

Ethical Approval

The experiments included in this paper were conducted with the understanding and the consent of the human subjects, and the responsible Ethical Committee approved the experiments.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This study was partially funded by Research Grants of the Okawa Foundation for Information and Telecommunications, 2008, and the Cooperative Research Project Program of the Research Institute of Electrical Communication, Tohoku University, 2007–2009.