Abstract
This paper proposes a new algorithm for a directional aid with hearing defenders. Users of existing hearing defenders experience distorted information, or in worst cases, directional information may not be perceived at all. The users of these hearing defenders may therefore be exposed to serious safety risks. The proposed algorithm improves the directional information for the users of hearing defenders by enhancing impulsive sounds using interaural level difference (ILD). This ILD enhancement is achieved by incorporating a new gain function. Illustrative examples and performance measures are presented to highlight the promising results. By improving the directional information for active hearing defenders, the new method is found to serve as an advanced directional aid.
1. Introduction
In many cases, individuals are forced to use hearing
defenders for their protection against harmful levels of sound. Hearing
defenders are used to enforce a passive attenuation of the external sounds
which enter our ears. The use of existing hearing defenders
affects natural sound perception. This, in turn, results in a reduction of direction-of-arrival
(DOA) capabilities [1, 2]. This impairment of DOA estimation accuracy has been
reported as a potential safety risk associated with existing hearing defenders
[3].
This paper presents a new method for enhancing the
perceived directionality of impulsive sounds, since such sounds may contain
useful information for a user. The proposed scheme introduces a directional aid
that provides enhanced impulsive external sounds to a user, improving the
DOA estimation capability of the user for those sounds. Exaggerating this
directional information for impulsive sounds will not generally produce a
psychoacoustically valid cue. Instead, this method is expected to enhance the user's
ability to approximate the direction of an impulsive sound source, and thereby
speed up the localization of this source. With the exception of enhanced
directionality of impulsive sounds, the proposed method should not alter other
classes of sounds (e.g., human speech sounds). Safety is likely to be increased
by using our new approach for impulsive sounds.
The spatial information is enhanced without increasing
the sound levels (i.e., signals are only attenuated and not amplified). The
risk of damaging the user's hearing by the increased sound levels is thereby
avoided. However, the proposed directional aid passes the enhanced external
sounds directly to the user without any restrictions. It is therefore
recommended, in a real implementation, that a postprocessing stage be
incorporated after the proposed directional aid for limiting the sound levels
passed to the user. Active hearing defenders with such limiting features are
commercially available today.
A suitable application of our directional aid is for
the active hearing defenders used in hunting, police, or military applications,
in which impulsive sounds such as gun or rifle shots are omnipresent. In these
applications, the impulsive sounds are likely to accompany danger, and
therefore fast localization of impulsive sound sources is vital. A similar idea
for enhancing the directional information can be found in [4], wherein the hearing
defender is physically redesigned using passive means in order to compensate
for the loss in directional information.
A brief introduction to the theory of human
directional hearing is provided next, followed by our proposed scheme for a
directional aid. An initial performance evaluation of the proposed method is
then given, followed by a summary and conclusions.
2. Theory of Human Directional Hearing
The human
estimation of direction of arrival can be modeled by two important binaural
auditory cues [5]:
interaural time difference (ITD) and interaural level difference (ILD). There
are other cues which are also involved in the discrimination of direction of
arrival in the elevation angle. For example, the reflections of the impinging
signals by the torso and pinna are some important features for the estimation
of elevation angle. These reflections are commonly modeled by head related
transfer functions (HRTFs) [6, 7]. The focus of this paper is on the use of the binaural
cue ILD and estimation of direction of arrival on the horizontal plane.
The description of the underlying concept of these two cues, ITD and
ILD, focuses on the spatial characteristics of human hearing.
It is assumed that the sound is emitted from a monochromatic point source
(i.e., a propagating sinusoid specified by its frequency, amplitude, and
phase). In direction-of-arrival estimation, the intersensor distance is very
important to avoid spatial aliasing, which introduces direction-of-arrival
estimation errors. The distance between the two ears of a human individual
corresponds roughly to one period (the wavelength) of a sinusoid with
fundamental frequency
.
(For an adult person, this fundamental frequency is
kHz.) A signal whose frequency exceeds
is represented by more than one period for
this particular distance. Those signals with frequencies below this threshold,
,
are represented by a fraction of a period. Consequently, for a signal whose
frequency falls below
,
the phase information is utilized for direction-of-arrival estimation and this
corresponds to the ITD model. However, for a signal with frequencies above
,
the phase information is ambiguous, and the level information of the signal is
more reliable for direction-of-arrival estimation; this corresponds to the ILD
model. The use of this level information stems from the fact that a signal that
travels a further distance has, in general, lower intensity, and this feature
is more accentuated at higher frequencies. Consequently, the ear closer to the
source would have higher intensity sound than the opposite ear. Also, the human
head itself obstructs signals passing from one ear to the other ear [8, 9].
This discussion (above) gives only a general overview
and is a simplification of many of the processes involved in human
direction-of-arrival estimation. However, this background provides us with the
basis for a simplified human direction-of-arrival estimation model, as
considered in this paper.
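The frequency threshold discussed above can be illustrated with a short calculation. This is only a sketch: the speed of sound (343 m/s) and the ear-to-ear distance passed in by the caller are assumed values that are not given in the text, and the function names are hypothetical.

```python
# Sketch: frequency whose wavelength equals the ear-to-ear distance.
# Assumptions (not from the paper): speed of sound of 343 m/s and a
# caller-supplied ear distance.

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees Celsius


def crossover_frequency_hz(ear_distance_m: float,
                           speed_of_sound_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Return the frequency whose wavelength equals ear_distance_m."""
    if ear_distance_m <= 0:
        raise ValueError("ear_distance_m must be positive")
    return speed_of_sound_m_s / ear_distance_m


def dominant_cue(frequency_hz: float, ear_distance_m: float) -> str:
    """Return 'ITD' below the crossover frequency and 'ILD' otherwise."""
    if frequency_hz < crossover_frequency_hz(ear_distance_m):
        return "ITD"
    return "ILD"
```

Under these assumptions, a larger ear distance lowers the threshold, so more of the spectrum falls into the band where the level cue dominates in this simplified model.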
3. Proposed Scheme for a Directional Aid
In our scheme,
two external omnidirectional microphones are mounted in the forward direction
on each of the two cups of the hearing defender; see Figure 1. Also, two
loudspeakers are placed in the interior of each cup. These loudspeakers are
employed for the realization of a directional aid.
Figure 1: A hearing defender with directional aid, where the external microphone signals are used to impose internal sounds through loudspeakers in order to realize the directional aid.
An overview of the scheme proposed for a directional
aid is shown in Figure 2. Note that in this scheme, the low-frequency signal
components are simply passed without any processing.
Figure 2: Directional aid for enhancing human
direction-of-arrival estimation.
3.1. Signal Model
The microphones
spatially sample the acoustical field, providing temporal signals
and
,
where L and R represent the left and right sides of the
hearing defender, respectively. An orthogonal two-band filter bank is used for
each microphone. The low-frequency (LF) band of this filter bank, denoted by
,
consists of a low pass filter having a cut-off frequency around the fundamental
frequency,
,
corresponding to the ITD spectral band. Similarly, the high-frequency (HF) band
of the filter bank is denoted by
and corresponds to the ILD spectral band.
Since only the ILD localization cue has been employed in our approach, the LF
signals (corresponding to the ITD cues) are simply passed through the proposed
system, unaltered.
The left microphone signal,
,
is decomposed by the two-band filter bank into an LF signal,
,
and an HF signal,
.
Similarly the right microphone signal,
,
is decomposed into LF and HF components,
and
.
The HF components are the inputs to the ILD enhancement block, see Figure 3,
providing enhanced outputs of
and
. The left- and right-side output signals,
and
,
are the sum of LF input signal components and enhanced HF output signal
components according to
and
,
respectively.
Figure 3: A block scheme for the enhancement of ILD cue
for human direction-of-arrival estimation.
These filters,
and
,
are, for the sake of simplicity, 128-tap finite impulse response (FIR)
filters, and they have been designed by the window method using a Hamming window.
It should be noted that, in a real implementation, it is of utmost importance
to match the passive path to the active (digital) path with respect to signal
delay in order to avoid a possibly destructive signal skew. The impulse
response function of the passive path between the external microphone of a
hearing defender to a reference microphone placed close to the ear canal of a
user is presented in Figure 4. This estimated impulse response has a low-pass
characteristic and a dominant peak at a delay of 7 samples at a sampling
frequency of 8 kHz. Thus, the active path should match this 7-sample delay of the
passive path. This can be achieved in a real implementation by selecting
low-delay (1-sample delay) analog-to-digital and digital-to-analog converters. In
addition, the digital filter bank should be selected (or designed) with a
pronounced focus on group delay in order to satisfy the matching of the passive
and active paths (e.g., by using infinite impulse response (IIR) filter banks).
The Haas effect (also known as the precedence effect) [10] underlines the importance
of minimizing the temporal skew between the active and passive paths. An overly
long delay in combination with a low passive path attenuation causes our
directional aid to go unperceived. These practical details are
considered outside the scope of this paper, but they
should be subject to further investigation in a later real-time implementation
and evaluation of the proposed method.
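A window-method design of the kind described above can be sketched as follows. This is a minimal illustration rather than the filter bank used in the paper: the 1500 Hz cut-off in the usage note is an assumed value, the tap count is odd (129 instead of 128) so that the high-pass branch can be formed as a delayed impulse minus the low-pass filter, and all function names are hypothetical.

```python
import math


def hamming_lowpass(num_taps: int, cutoff_hz: float, fs_hz: float) -> list:
    """Windowed-sinc low-pass FIR taps (Hamming window), unity gain at DC."""
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3 in this sketch")
    fc = cutoff_hz / fs_hz            # normalized cut-off in cycles per sample
    mid = (num_taps - 1) // 2         # integer group delay in samples
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response (sinc), with the k == 0 limit.
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def complementary_highpass(lowpass: list) -> list:
    """High-pass taps formed as a delayed unit impulse minus the low-pass taps."""
    mid = (len(lowpass) - 1) // 2
    highpass = [-t for t in lowpass]
    highpass[mid] += 1.0
    return highpass


def fir_filter(taps: list, signal: list) -> list:
    """Direct-form FIR filtering with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out
```

For example, `hamming_lowpass(129, 1500.0, 8000.0)` and its complement split a signal into LF and HF parts whose sum equals the input delayed by 64 samples. That delay is far longer than the 7-sample passive path, which illustrates why the text recommends low-delay structures for a real implementation.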
Figure 4: The estimated impulse response
function of the passive path of a hearing defender with a dominant peak after 7
samples and sampling frequency 8 kHz.
3.2. The Proposed ILD Enhancement Scheme
One fundamental
consideration regarding our proposed method involves first distinguishing
whether a signal onset occurs. (A tutorial on onset detection in music
processing can be found in [11], and a method for onset detection for source localization
can be found in [12].)
Once a signal onset has occurred, any other new onsets are disregarded within a
certain time interval, unless a very distinct onset appears. This time interval
is used to avoid undesired false onsets which may occur due to a highly reverberant
environment or acoustical noise. When an onset is detected, the method
determines which of the sides (i.e., left or right) has the current
attention. For instance, for a signal that arrives at the left microphone
before the right microphone, attention will be focused on the left side, and
vice versa. Based on the information about the onset and the side which
provides the attention, the “unattended” side will be attenuated accordingly.
Hence, the directionality of the sound can be improved automatically.
A detailed description of the important stages of the
proposed method, involving onset detection, formation of side attention, and
the gain function computation for the desired directionality enhancement,
follows here.
3.2.1. Onset Detection
The envelopes
of each HF input signal are employed in the onset detection. The envelopes are
denoted by
and
.
To avoid mismatch due to uneven amplification between the two microphone signals,
a floor function is computed for each side. These floor functions, denoted by
and
,
are computed as
(1) Here,
represents a factor associated with the
integration time of the floor functions. This integration time should be in the
order of seconds such that the floor functions track slow changes in the
envelopes. The function
takes the minimum value of the two real
parameters
and
.
The normalized envelopes,
and
,
are now computed according to
(2)The envelope difference function
is defined as
(3)A ceiling function,
,
of the envelope difference function is computed according to
(4)Here,
is a real valued parameter that controls the
release time of the ceiling function. This release time influences the
resetting of some attention functions in (7), and this release time should
correspond to the reverberation time of the environment. The function
returns the maximum value of the real parameters
and
.
Now, an onset is detected if the ceiling function
exactly equals the envelope difference function, that is
.
This occurs only when the
function in (4) selects the second parameter,
,
which corresponds to an onset.
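The onset detection steps above can be sketched in code. Because the exact expressions (1) to (4) are not legible in this text, every recursion below is an assumed form that merely matches the verbal description: a min-based floor tracker, normalization by division, an absolute envelope difference, and a max-based ceiling with release factor. The names `alpha` and `beta`, their default values, and the guard against a zero difference are hypothetical.

```python
def detect_onsets(env_left, env_right, alpha=0.999, beta=0.99):
    """Return sample indices at which an onset is flagged.

    env_left, env_right: strictly positive envelope samples of the HF signals.
    alpha: assumed smoothing factor of the floor functions (slow, seconds).
    beta:  assumed release factor of the ceiling function.
    """
    if len(env_left) != len(env_right):
        raise ValueError("envelopes must have equal length")
    if not env_left:
        return []
    floor_l, floor_r = env_left[0], env_right[0]
    ceiling = 0.0
    onsets = []
    for n, (e_l, e_r) in enumerate(zip(env_left, env_right)):
        # Assumed floor recursion: slow average, never above the envelope.
        floor_l = min(alpha * floor_l + (1.0 - alpha) * e_l, e_l)
        floor_r = min(alpha * floor_r + (1.0 - alpha) * e_r, e_r)
        # Assumed normalization: envelope relative to its own floor.
        norm_l = e_l / floor_l
        norm_r = e_r / floor_r
        diff = abs(norm_l - norm_r)
        # Ceiling decays with beta unless the difference exceeds it.
        ceiling = max(beta * ceiling, diff)
        # Onset: the max selected its second argument (guard is an assumption).
        if ceiling == diff and diff > 0.0:
            onsets.append(n)
    return onsets
```

In this sketch, a burst that is stronger on one side makes the difference jump above the decaying ceiling, which flags a single onset; while the difference decays faster than the ceiling, no further onsets are reported.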
3.2.2. Side Attention Decision
In the case of
a detected onset, the values of the normalized envelopes determine the current
attention. If
,
the attention is to the left side and the corresponding attention function
is updated. If, on the other hand,
,
the attention will be on the right side, and the attention function for the
right side is updated. This attention function mechanism is formulated as two
cases:
(5)where the cases
and
are
(6)and
represents a forgetting factor for the
attention functions and its integration time should be close to the expected
interarrival time between two impulses.
3.2.3. Directional Gain Function
To avoid any
false decisions due to a highly reverberant environment or acoustical noise, a
long-term floor function,
,
is applied to the ceiling function according to
(7)where the parameter
controls the integration time of this
long-term average, and this integration time should be in the order of seconds
in order to track slow changes in the ceiling function. In order to avoid drift
in the attention functions, they are set to
if the
function of (7) selects the second parameter,
.
This condition will trigger some time after a recent onset has occurred (this time
is determined mainly by
and partly by
). Thereafter, the recent impulse is
considered absent.
Depending upon the values of attention functions of
and
and the ceiling and floor functions of
and
,
the two directional gain functions,
and
,
can be calculated. If
,
the attention will shift towards the left side and consequently the right side
will be suppressed. If, on the other hand, the attention is shifted towards the
right side, that is,
,
then the left side is suppressed. The directional gain functions are computed
according to
(8)where the cases
and
are
(9)Here,
is a mapping function that controls the
directional gain, and should be able to discriminate certain types of sounds.
The mapping function used in this paper is inspired by the unipolar sigmoid
function that is common in neural network literature [13]; it is defined here
as
(10)where the parameter
controls the maximum directional gain imposed
by the proposed algorithm. The parameter
corresponds to a center-point that lies
between the pass-through region (
) and attenuation region (
) of the mapping function. The parameter
corresponds to the transition rate of the
mapping function from the pass-through region to the attenuation region. The
reason for using the quotient of the two parameters,
and
in (10), is to make the mapping function
invariant to scales of the input signal. The various parameters in the present
mapping function have been selected empirically such that impulsive sounds
(which are identified as target sounds) are differentiated from speech
(nontarget sounds). A set of parameter values that appears to be suitable in the
tested scenarios is
,
,
and
.
The mapping function in (10) is presented in Figure 5. It is
stressed that these parameters are found empirically through manual calibration
of the algorithm. Optimal parameter values can be found by using some form of
neural training.
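A sigmoid-shaped mapping of the kind described above can be sketched as follows. The exact form of (10) and its parameter values are not legible in this text, so the expression, the parameter names (`min_gain`, `center`, `rate`), and the default values are all assumptions chosen only to reproduce the described behavior: a gain near one in the pass-through region, a bounded attenuation in the attenuation region, and dependence on the quotient of ceiling and floor only.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable unipolar sigmoid."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mapping_function(ceiling: float, floor: float,
                     min_gain: float = 0.1,
                     center: float = 5.0,
                     rate: float = 2.0) -> float:
    """Assumed directional gain for the suppressed side.

    ceiling / floor is the scale-invariant input; min_gain bounds the
    attenuation, center is the transition midpoint, rate its steepness.
    """
    if floor <= 0.0:
        raise ValueError("floor must be positive")
    quotient = ceiling / floor
    return 1.0 - (1.0 - min_gain) * _sigmoid(rate * (quotient - center))
```

Because only the quotient enters the sigmoid, scaling both the ceiling and the floor by the same factor leaves the gain unchanged, which is the invariance property the text attributes to (10).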
Figure 5: Mapping function (10) employed in this paper.
Now, the output signals of the ILD enhancement block
can be expressed as
and
.
Consequently, the total output of the directional aid can be obtained as
and
.
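The recombination just described can be sketched as below: each output is the unprocessed LF component plus the HF component scaled by its directional gain. The function and argument names are hypothetical, and the restriction of gains to the range from 0 to 1 reflects the statement in the introduction that signals are only attenuated, never amplified.

```python
def directional_aid_output(lf_left, hf_left, lf_right, hf_right,
                           gain_left, gain_right):
    """Combine LF pass-through and gain-weighted HF components per side.

    All arguments are equal-length sequences of samples; gains are
    per-sample values in [0, 1] (attenuation only).
    """
    sequences = (lf_left, hf_left, lf_right, hf_right, gain_left, gain_right)
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all inputs must have the same length")
    for g in list(gain_left) + list(gain_right):
        if not 0.0 <= g <= 1.0:
            raise ValueError("gains must lie in [0, 1]")
    out_left = [lf + g * hf for lf, hf, g in zip(lf_left, hf_left, gain_left)]
    out_right = [lf + g * hf for lf, hf, g in zip(lf_right, hf_right, gain_right)]
    return out_left, out_right
```

With unit gains on both sides the outputs reduce to the sum of the LF and HF components, so the block is transparent whenever no side is suppressed.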
3.3. Illustration of Performance
This section
illustrates important output signals with the proposed algorithm. An impulsive
sound signal (gun shots) and a speech signal are used as input for the
algorithm. To aid the illustration, all signals have the peak magnitude 1. The
sampling frequency and the algorithm's parameter values follow those outlined
in Section 4. Four impulses are present; the first two impulses originate from
the left side of the hearing defender, the last two impulses from the right
side of the hearing defender. After 3.5 seconds, only speech is active. Figure 6 illustrates the input with its corresponding directional aid outputs and
other relevant intermediary signals. This illustration highlights the operation
of the algorithm and also demonstrates that the directional information for the
two test signals is in fact enhanced (according to the magnitude of the outputs for
the two test impulses).
Figure 6: Input signals and corresponding enhanced
output signals of the directional aid with important intermediary signals. The
first two pulses of the test signal originate from the left, the last two
pulses from the right, and after 3.5 seconds only speech is active.
4. Performance Evaluation
In the
following, the performance and characteristics of the proposed algorithm are
demonstrated. Two cases are investigated. First, the directional aid's
ability to enhance the directionality of impulsive sounds (gun shots) relative
to speech sounds is evaluated. Speech is a type of signal that should be
transparent to the algorithm, that is, it should pass through the algorithm
unaltered, since the focus of our algorithm is the enhancement of impulsive
sounds. Second, the directional aid's sensitivity to interfering white noise is
evaluated at various levels of impulsive sound peak energy to interfering noise
ratio (ENR). The signals used in this evaluation are delivered through a
loudspeaker in an office room (reverberation time
milliseconds) and recorded using the
microphones on an active hearing defender; see Figure 1. The sampling frequency
is
kHz, and the parameter values used in the
evaluation are selected as
seconds, and
second, where the actual value of every
parameter
is computed using
,
where
is the time constant (in seconds) associated
with every parameter
.
This approximation is valid for
.
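The conversion from a time constant to a recursion parameter can be illustrated as follows. The formula used in the paper is not legible in this text, so the sketch assumes the common first-order relation in which the exact factor exp(-1/(T Fs)) is approximated by 1 - 1/(T Fs) when T Fs is much larger than one. The function names are hypothetical.

```python
import math


def smoothing_factor_exact(time_constant_s: float, fs_hz: float) -> float:
    """Exact first-order factor for a given time constant (assumed relation)."""
    if time_constant_s <= 0.0 or fs_hz <= 0.0:
        raise ValueError("time constant and sampling frequency must be positive")
    return math.exp(-1.0 / (time_constant_s * fs_hz))


def smoothing_factor_approx(time_constant_s: float, fs_hz: float) -> float:
    """First-order Taylor approximation, reasonable when T * fs >> 1."""
    if time_constant_s <= 0.0 or fs_hz <= 0.0:
        raise ValueError("time constant and sampling frequency must be positive")
    return 1.0 - 1.0 / (time_constant_s * fs_hz)
```

For a time constant of one second at a sampling frequency of 8 kHz, the two expressions differ by less than one part in ten million, which suggests why an approximation of this kind would be adequate for time constants on the order of seconds.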
4.1. Performance Measures
The maximal
spectral deviation (MSD) is used as an evaluation measure. The MSD assesses the
maximal deviation (in log-scale) of the processed output signal relative to the
unprocessed input signal, and is defined as
(11)where the spectral deviation
is
(12)Here,
and
represent power spectral density estimates of
the processed output signal
and the corresponding input signal
,
where
represents the channel index and
corresponds to the frequency bin index. In
other words, MSD assesses the maximal spectral deviation of the output signal
with respect to the input signal over all channels and all frequencies. In
general, the MSD is high if the process alters the output signal with respect
to the input signal, and MSD is low if the output signal is spectrally close to
the input signal.
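The MSD measure described above can be sketched as follows. The exact expressions (11) and (12) are not legible in this text, so the sketch assumes that the spectral deviation is the ratio of output to input power spectral density in decibels and that the MSD is the largest absolute deviation over all channels and frequency bins. The PSD estimates are taken as given inputs, and the function names are hypothetical.

```python
import math


def spectral_deviation_db(psd_out, psd_in):
    """Per-bin deviation 10*log10(P_out / P_in) for one channel (assumed form)."""
    if len(psd_out) != len(psd_in):
        raise ValueError("PSD estimates must have the same number of bins")
    return [10.0 * math.log10(po / pi) for po, pi in zip(psd_out, psd_in)]


def maximal_spectral_deviation_db(psd_out_channels, psd_in_channels):
    """Largest absolute spectral deviation over all channels and bins."""
    if len(psd_out_channels) != len(psd_in_channels):
        raise ValueError("channel counts must match")
    deviations = []
    for psd_out, psd_in in zip(psd_out_channels, psd_in_channels):
        deviations.extend(abs(d) for d in spectral_deviation_db(psd_out, psd_in))
    return max(deviations)
```

Under this assumed definition, identical input and output spectra give an MSD of 0 dB, and a single bin attenuated by a factor of ten in power gives 10 dB regardless of how well the remaining bins match.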
For the evaluation of the directional aid's
sensitivity to interfering noise, a directional gain deviation (DGD) measure is
used. This measure compares the directional gains of each channel in an ideal
case when no noise is present (ENR =
), denoted by
and
,
with the case when interfering noise is present at a specific ENR level, where
the directional gains are denoted by
and
.
The DGD measures for each channel are defined as
(13) Consequently, the desired
behavior is obtained if the directional gains at a specific ENR level
exactly follow the directional gains in the ideal case, in which case the DGD
measures are zero. Any deviation from this behavior is considered
nonideal.
4.2. An Impulsive Test Signal
In this first
test, an impulsive type of test signal (gun shots) is used to show the
objective performance. The MSD for this impulsive test signal is 4.3 dB, which
implies that the algorithm spectrally alters this test signal. This is the
expected behavior of the algorithm.
4.3. A Nonimpulsive Test Signal
In this second
test, a nonimpulsive test signal (a speech signal) is used to demonstrate the
performance. It is expected that such a signal should be transparent to the
algorithm. The MSD for this speech test signal is
0 dB, which indicates that the algorithm is able
to let such nonimpulsive signals remain spectrally undistorted.
4.4. Sensitivity to Interfering Noise
A mixture of
white Gaussian noise and impulsive sounds acts as an input to the directional
aid. The impulsive sounds are set to have a maximal amplitude of 1. The level
of the interfering noise is then set according to a desired ENR level. The DGD
measures for each channel are presented in Figure 7. This figure indicates that
the directional aid fails to operate for ENR levels below 20 dB.
Figure 7: Directional gain deviation (DGD) measures for
the left channel (solid line) and the right channel (dashed line).
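The construction of the noisy test input can be sketched as below. The sketch assumes that ENR is the ratio, in decibels, of the squared peak amplitude of the impulsive signal to the variance of the white Gaussian noise; the paper's exact definition is not spelled out in this text. The function names and the fixed random seed are hypothetical.

```python
import random


def noise_std_for_enr(peak_amplitude: float, enr_db: float) -> float:
    """Noise standard deviation giving the requested ENR (assumed definition)."""
    if peak_amplitude <= 0.0:
        raise ValueError("peak_amplitude must be positive")
    return peak_amplitude / (10.0 ** (enr_db / 20.0))


def add_white_noise(signal, enr_db: float, seed: int = 0):
    """Return signal plus white Gaussian noise scaled to the requested ENR."""
    if not signal:
        raise ValueError("signal must not be empty")
    peak = max(abs(s) for s in signal)
    sigma = noise_std_for_enr(peak, enr_db)
    rng = random.Random(seed)  # seeded for repeatable experiments
    return [s + rng.gauss(0.0, sigma) for s in signal]
```

With a peak amplitude of 1, as used in this test, an ENR of 20 dB corresponds under this assumption to a noise standard deviation of 0.1.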
5. Summary and Conclusions
This paper
presents a novel algorithm that serves as a directional aid for hearing
defenders. Moreover, this algorithm is intended to provide a protection scheme for
the users of active hearing defenders. The users of existing hearing
defenders experience distorted directional information, or none at all. This is
identified as a serious safety flaw. Therefore, this paper introduces a new
algorithm, and an initial analysis has been carried out. The algorithm passes
nonimpulsive signals unaltered, and the directional information of impulsive
signals is enhanced by means of a directional gain. According to
the objective measures used here, the algorithm performs well. A more detailed
analysis, including a psychoacoustic study on real listeners, will be conducted
in future research. Furthermore, the psychoacoustic study should be carried out
on a real-time system, where the impact of various design parameter values is
evaluated with respect to the psychoacoustic performance in an intended live
application.
The work presented herein is an initial work
introducing a strategy for a directional aid in hearing defenders, with a focus
on impulsive sounds. Future research may include enhancing the directional
information of sound classes other than impulsive sounds, for example,
the directionality of tonal alarm signals from a reversing truck.
Future research may also involve modifications of this
proposed algorithm such as reduction of the sensitivity to interfering noise.
The directional aid may be further enhanced with the addition of a control
structure that restrains enhancement of the repetitive impulsive sounds, such
as those from a pneumatic drill. This would extend the possible application
areas of our directional aid.
References
- B. D. Simpson, R. S. Bolia, R. L. McKinley, and D. S. Brungart, “The impact of hearing protection on sound localization and orienting behavior,” Human Factors, vol. 47, no. 1, pp. 188–198, 2005.
- D. S. Brungart, A. J. Kordik, C. S. Eades, and B. D. Simpson, “The effect of microphone placement on localization accuracy with electronic pass-through earplugs,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA '03), pp. 149–152, New Paltz, NY, USA, October 2003.
- L. D. Hager, “Hearing protection. Didn't hear it coming: noise and hearing in industrial accidents,” Occupational Health & Safety, vol. 71, no. 9, pp. 196–200, 2002.
- P. Rubak and L. G. Johansen, “Active hearing protector with improved localization performance,” in Proceedings of the International Congress and Exposition on Noise Control Engineering (Internoise '99), pp. 627–632, Fort Lauderdale, Fla, USA, December 1999.
- J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, MIT Press, Cambridge, Mass, USA, 1983.
- D. R. Begault, 3-D Sound for Virtual Reality and Multimedia, Academic Press, San Diego, Calif, USA, 1994.
- R. O. Duda, “Modeling head related transfer functions,” in Proceedings of the 27th Asilomar Conference on Signals, Systems and Computers (ACSSC '93), vol. 2, pp. 996–1000, Pacific Grove, Calif, USA, November 1993.
- B. C. J. Moore, An Introduction to the Psychology of Hearing, Academic Press, San Diego, Calif, USA, 4th edition, 1997.
- C. I. Cheng and G. H. Wakefield, “Introduction to head-related transfer functions (HRTFs): representations of HRTFs in time, frequency, and space,” Journal of the Audio Engineering Society, vol. 49, no. 4, pp. 231–249, 2001.
- M. B. Gardner, “Historical background of the Haas and/or precedence effect,” The Journal of the Acoustical Society of America, vol. 43, no. 6, pp. 1243–1248, 1968.
- J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. B. Sandler, “A tutorial on onset detection in music signals,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 1035–1047, 2005.
- B. Supper, T. Brookes, and F. Rumsey, “An auditory onset detection algorithm for improved automatic source localization,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 3, pp. 1008–1017, 2006.
- S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, Upper Saddle River, NJ, USA, 1998.