Philips Research, High Tech Campus 36, 5656AE Eindhoven, The Netherlands
Abstract
Improving the intelligibility of speech in different environments is one of the main objectives of hearing aid signal processing algorithms. Hearing aids typically employ beamforming techniques using multiple microphones for this task. In this paper, we discuss a binaural beamforming scheme that uses signals from the hearing aids worn on both the left and right ears. Specifically, we analyze the effect of a low bit rate wireless communication link between the left and right hearing aids on the performance of the beamformer. The scheme comprises a generalized sidelobe canceller (GSC) with two inputs, namely observations from one ear and quantized observations from the other ear, and its output is an estimate of the desired signal. We analyze the performance of this scheme in the presence of a localized interferer as a function of the communication bit rate, using the resultant mean-squared error as the signal distortion measure.
1. Introduction
Modern digital hearing aids perform a variety of
signal processing tasks aimed at improving the quality and intelligibility of
the received sound signals. These tasks include frequency-dependent
amplification, feedback cancellation, background noise reduction, and
environmental sound classification. Among these, improving speech
intelligibility in the presence of interfering sound sources remains one of the
most sought-after features among hearing aid users [1]. Hearing aids attempt to achieve this goal through
beamforming using two or more microphones, and exploit the spatial diversity
resulting from the different spatial positions of the desired and interfering
sound sources [2].
The distance between the microphones on a single
hearing aid is typically less than 1 cm due to the small size of such devices
for aesthetic reasons. This small spacing limits the gain that can be obtained
from microphone array speech enhancement algorithms. Binaural beamforming,
which uses signals from both the left and right hearing aids, offers greater
potential due to the larger inter-microphone distances corresponding to the
distance between the two ears (16–20 cm). In addition, such a scheme also
provides the possibility to exploit the natural attenuation provided by the
head. Depending on the location of the interfering source, the
signal-to-interference ratio (SIR) can be significantly higher at one ear
compared to the other, and a binaural system can exploit this aspect.
A high-speed wireless link
between the hearing aids worn on the left and right ears has been recently
introduced [3]. This
allows binaural beamforming without the necessity of having a wired connection
between the hearing aids, which is impractical again due to aesthetic reasons.
The two hearing aids form a body area network, and can provide significant
performance gains by collaborating with one another. The performance of
binaural noise reduction systems has been previously studied in, for example,
[4–8]. However, these systems
implicitly assume the availability of the error-free left and right microphone
signals for processing. In practice, the amount of information that can be
shared between the left and right hearing aids is limited by constraints on
power consumption imposed by the limited capacity of hearing aid batteries. It
is known [9] that quantizing a signal with one additional bit increases the
power dissipation in an analog-to-digital converter (ADC) by 3 dB. Hence, to
conserve battery power in a hearing aid, it is critical to compress the signal
with as few bits as possible before wireless transmission. One
in five users was reported to be dissatisfied with hearing aid battery life
[10], and battery life is thus an important consideration in hearing aid design.
In this paper, we analytically study how the performance of a GSC beamformer
varies with the number of quantization bits.
Different configurations are possible for a binaural
beamforming system. For instance, both hearing aids could transmit their
received microphone signals to a central device where the beamforming is
performed, and the result could then be transmitted back to the hearing aids.
Alternatively, the hearing aids could exchange their signals and beamforming
may be performed on each hearing aid. In this paper, to analyze the effect of
quantization errors on beamforming, without loss of generality we assume that
each hearing aid has one microphone and that the right hearing aid quantizes
and transmits its signal to the left hearing aid, where the two signals are
combined using a beamformer. This paper extends our earlier work
[11] by incorporating the
effect of head shadow and presenting a more detailed experimental analysis.
If the power spectral density (PSD) of the desired
source is known a priori, the two-microphone Wiener filter provides the optimal
(in the mean squared error sense) estimate of the desired source. The effect of
quantization errors in such a framework has been investigated in [12]. However, in practice the
PSD is unknown. In this paper, we consider a particular beamformer, the
generalized sidelobe canceller (GSC) [13], which does not require prior knowledge of the source
PSD.
The GSC requires knowledge of the location of the
desired source, which is available since the desired source is commonly assumed
to be located at 0° (in front of the microphone array) in hearing aid applications
[2]. The motivation
behind this assumption is that in most real-life situations, for instance, a
conversation, the user is facing the desired sound source. In a free field, the
two-microphone GSC can cancel out an interfering sound source without
distorting the desired signal, which is a desirable feature in hearing aids.
Thus, the GSC is well suited for hearing aid applications, and we study the
impact of quantization errors on the GSC in this paper.
The performance of the GSC may be affected by other
sources of error such as microphone mismatch, errors in the assumed model (the
desired source may not be located exactly at 0°), reverberation, and so forth. Variations of
the GSC that are robust to such imperfections are discussed in [14–16]. In this paper, we exclude such errors from our
analysis to isolate the effect of the errors introduced by quantization on the
performance of the GSC.
The remainder of this paper is organized as follows.
We introduce the signal model and the head shadow model we use in
Section 2.
The binaural GSC and its behavior in the presence of quantization errors are
discussed in Section 3.
The performance of the GSC at different bit rates is
analyzed in Section 4. Finally,
concluding remarks and suggestions for future
work are presented in Section 5.
2. Signal Model
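Consider a desired source $s(n)$ in the presence of an interferer $i(n)$,
where $n$ represents the time index. A block of $N$ samples of the desired and
interfering signals can be transformed into the frequency domain using the
discrete Fourier transform (DFT) as
$$S(k) = \sum_{n=0}^{N-1} s(n)\, e^{-j 2\pi n k / N}, \qquad I(k) = \sum_{n=0}^{N-1} i(n)\, e^{-j 2\pi n k / N}, \tag{1}$$
where $k$ is the frequency index. Let $\Phi_s(k) = E\{S(k) S^{\dagger}(k)\}$,
and $\Phi_i(k) = E\{I(k) I^{\dagger}(k)\}$, where $\dagger$ indicates complex
conjugation. We assume that the left and right hearing aids each have one
microphone. The signal observed at the microphone in the left hearing aid can
be written as
$$X_l(k) = H_l(k) S(k) + G_l(k) I(k) + U_l(k), \tag{2}$$
where $H_l(k)$ and $G_l(k)$ are the transfer functions between the microphone
on the left hearing aid and the desired and interfering sources, respectively,
and $U_l(k)$ corresponds to uncorrelated (e.g., sensor) noise with
$E\{U_l(k) U_l^{\dagger}(k)\} = \Phi_u(k)$. The transfer functions $H_l(k)$ and
$G_l(k)$ include the effect of head shadow. For each $k$, we model $S(k)$,
$I(k)$, and $U_l(k)$ as memoryless zero mean complex Gaussian sources, with
variances $\Phi_s(k)$, $\Phi_i(k)$, and $\Phi_u(k)$, respectively. Their real
and imaginary parts are assumed to be independent with variances
$\Phi_s(k)/2$, $\Phi_i(k)/2$, and $\Phi_u(k)/2$, respectively.
The signal observed at the right ear can be written as
$$X_r(k) = H_r(k) S(k) + G_r(k) I(k) + U_r(k), \tag{3}$$
where the relevant terms are defined analogously to the left ear. We assume
that $E\{U_r(k) U_r^{\dagger}(k)\} = \Phi_u(k)$, and that $S(k)$, $I(k)$, and
the noise terms are pairwise independent.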
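We use the spherical head shadow model described in [17] to obtain the
head-related transfer functions (HRTFs) in (2) and (3).
Define the origin to be the center of the sphere. Let $a$ be the radius of the
sphere, $r$ be the distance between the origin and the sound source, and
define $\rho = r/a$. Let $\theta$ denote the angle between a ray from the
origin to the sound source and a ray from the origin to the point of
observation (left or right ear) on the surface of the sphere as shown in
Figure 1. The HRTF corresponding to the angle of incidence $\theta$ is then
given by [17]
$$H(\rho, \mu, \theta) = -\frac{\rho}{\mu}\, e^{-j \mu \rho}\, \Psi(\rho, \mu, \theta), \tag{4}$$
with
$$\Psi(\rho, \mu, \theta) = \sum_{m=0}^{\infty} (2m+1)\, P_m(\cos\theta)\, \frac{h_m(\mu\rho)}{h'_m(\mu)}, \tag{5}$$
where $\mu = 2\pi f a / c$ is the normalized frequency for frequency $f$ and
speed of sound $c$, $P_m$ is the Legendre polynomial of degree $m$, $h_m$ is
the spherical Hankel function of order $m$, and $h'_m$ is the derivative of
$h_m$ with respect to its argument.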
Figure 1: The head shadow model. The left and right hearing aids each have one microphone and are located on the surface of a sphere of radius $a$.
Let $\theta_s$ denote the angle between the vertical axis
and a ray from the origin to the desired source. Let $\theta_i$
be defined similarly for the interfering
source. The microphones on the left and right hearing aids are assumed to be
located at
and
respectively, on the surface of the sphere.
For example, if in Figure 1,
,
then the location of the source relative to the left ear is
.
We have
(6)Similarly, the transfer
functions corresponding to the interferer are given by
(7)
We consider the case where the PSDs $\Phi_s(k)$, $\Phi_i(k)$, and
$\Phi_u(k)$ of the desired source, the interferer, and the noise, and the
interferer location $\theta_i$ are all unknown. As is typical in hearing aid
applications [2], we
assume the desired source to be located in front of the user, that is,
$\theta_s = 0°$.
Thus, due to symmetry, the HRTFs between the desired source and the left and
right microphones are equal (this is valid in anechoic environments, and only
approximately satisfied in reverberant rooms). Let
$H(k) = H_l(k) = H_r(k)$.
The GSC structure [13]
depicted in Figure 2 can then be applied in this situation. The fixed
beamformer simply averages its two inputs, as the desired source component is
identical in the two signals. The blocking matrix subtracts the input signals,
resulting in a reference signal that is devoid of the desired signal and forms
the input to the adaptive interference canceller.
Figure 2: Frequency-domain implementation of the GSC.
We assume that the hearing aid at the right ear
quantizes and transmits its signal to the hearing aid at the left ear where the
two are combined. Let $\hat{X}_r(k)$ represent the reconstructed signal
obtained after encoding and decoding the right-ear signal $X_r(k)$ at a rate
$R$ bits per sample resulting in a distortion $D(k)$, where
$D(k) = E\{|X_r(k) - \hat{X}_r(k)|^2\}$.
The forward channel with respect to the squared error criterion can be
written as [18, pages 100-101],
$$\hat{X}_r(k) = \alpha(k) X_r(k) + V(k), \tag{8}$$
where $\alpha(k) = (\Phi_x(k) - D(k)) / \Phi_x(k)$,
$\Phi_x(k) = E\{X_r(k) X_r^{\dagger}(k)\}$, and $V(k)$ is zero mean complex
Gaussian with variance $\alpha(k) D(k)$.
Recall that we model the desired source, the interferer, and the noise at the
two ears as memoryless zero mean complex Gaussian random sources for each $k$,
with independent real and imaginary parts. The rate-distortion relation for the
complex Gaussian source follows from the rate-distortion function for a real
Gaussian source [18, Chapter 4],
$$R = \log_2 \frac{\Phi_x(k)}{D(k)}, \tag{9}$$
so that the distortion $D(k)$ is obtained as $D(k) = \Phi_x(k)\, 2^{-R}$.
The signals $X_l(k)$ and $\hat{X}_r(k)$ form the two inputs to the GSC.
If the relevant PSDs are known, more efficient quantization schemes
may be designed. For example, one could first estimate the desired signal
(using a Wiener filter) from the noisy observation $X_r(k)$
at the right ear, and then quantize the
estimate as in [12].
However, as the PSDs are unknown in our model, we quantize the noisy
observation itself.
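The forward channel and the rate-distortion relation above can be checked numerically. The following is a minimal sketch, assuming a single frequency bin whose right-ear signal is a complex Gaussian with total variance `phi_x` and is coded at `rate_bits` bits per sample; the function and parameter names are illustrative and do not come from the paper.

```python
def forward_channel(phi_x, rate_bits):
    """Return (distortion, alpha, noise_variance) of the forward test channel.

    phi_x     : variance of the complex Gaussian signal being quantized
    rate_bits : bits per (complex) sample spent on this frequency bin
    """
    # Rate-distortion relation for a complex Gaussian source: D = phi_x * 2^(-R)
    distortion = phi_x * 2.0 ** (-rate_bits)
    # Gain applied to the unquantized signal in the forward channel
    alpha = (phi_x - distortion) / phi_x
    # Variance of the additive Gaussian term of the forward channel
    noise_variance = alpha * distortion
    return distortion, alpha, noise_variance


def reconstruction_variance(phi_x, rate_bits):
    """Variance of the reconstructed signal alpha * X + V (V independent of X)."""
    _, alpha, noise_variance = forward_channel(phi_x, rate_bits)
    return alpha ** 2 * phi_x + noise_variance
```

Under this model the reconstructed signal has variance `phi_x - distortion`, so every additional bit halves the distortion and moves `alpha` closer to one.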
3. The Binaural GSC
We first look
at the case when there is no quantization and the left hearing aid receives an
error-free description of the right-ear signal $X_r(k)$.
This corresponds to an upper bound in our performance analysis. We then
consider the case when $X_r(k)$ is quantized at a rate $R$ bits per sample.
3.1. No Quantization
The GSC has
three basic building blocks. The first is a fixed beamformer that is steered
towards the direction of the desired source. The second is a blocking matrix
that produces a so-called noise reference signal that is devoid of the desired
source signal. Finally, the third is an adaptive interference canceller that
uses the reference signal generated by the blocking matrix to cancel out the
interference present in the beamformer output.
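The output of the fixed delay-and-sum beamformer is
given by
$$Y(k) = F(k) X(k), \tag{10}$$
where $F(k) = \tfrac{1}{2} [1 \;\; 1]$, $X(k) = [X_l(k) \;\; X_r(k)]^T$.
We can rewrite $Y(k)$ as
$$Y(k) = H(k) S(k) + \tfrac{1}{2}\big(G_l(k) + G_r(k)\big) I(k) + \tfrac{1}{2}\big(U_l(k) + U_r(k)\big). \tag{11}$$
The blocking matrix is given by $B(k) = [1 \;\; -1]$,
so that the input to the adaptive interference canceller
is obtained as
$$Z(k) = B(k) X(k) = \big(G_l(k) - G_r(k)\big) I(k) + U_l(k) - U_r(k). \tag{12}$$
The adaptive filter $W(k)$ is updated such that the expected energy of
the residual given by $E\{|Y(k) - W(k) Z(k)|^2\}$ is minimized, for example,
using the normalized least mean square algorithm [19, Chapter 9]. Since $Z(k)$
does not contain the desired signal, minimizing $E\{|Y(k) - W(k) Z(k)|^2\}$
corresponds to minimizing the energy of the
interferer in the residual. Note that none of the above steps require knowledge
of the PSD of the desired or interfering sources.
For our analysis, we require the optimal steady state
(Wiener) solution for $W(k)$,
which is given by
$$W_{\mathrm{opt}}(k) = \frac{\Phi_{yz}(k)}{\Phi_{zz}(k)}, \tag{13}$$
where
$$\Phi_{yz}(k) = E\{Y(k) Z^{\dagger}(k)\} = \tfrac{1}{2}\big(G_l(k) + G_r(k)\big)\big(G_l(k) - G_r(k)\big)^{\dagger} \Phi_i(k), \qquad \Phi_{zz}(k) = E\{Z(k) Z^{\dagger}(k)\} = |G_l(k) - G_r(k)|^2\, \Phi_i(k) + 2 \Phi_u(k). \tag{14}$$
The GSC output can be written as
$$\hat{S}(k) = Y(k) - W_{\mathrm{opt}}(k) Z(k), \tag{15}$$
and the resulting estimation error with respect to the desired component
$H(k) S(k)$ is
$$E\{|H(k) S(k) - \hat{S}(k)|^2\} = |C_i(k)|^2\, \Phi_i(k) + C_u(k)\, \Phi_u(k), \tag{16}$$
where
$$C_i(k) = \tfrac{1}{2}\big(G_l(k) + G_r(k)\big) - W_{\mathrm{opt}}(k)\big(G_l(k) - G_r(k)\big), \qquad C_u(k) = \big|\tfrac{1}{2} - W_{\mathrm{opt}}(k)\big|^2 + \big|\tfrac{1}{2} + W_{\mathrm{opt}}(k)\big|^2. \tag{17}$$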
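The three building blocks described in this subsection can be evaluated in closed form for a single frequency bin. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the fixed beamformer averages the two microphone signals, the blocking matrix takes their difference, the desired source has the same transfer function `h` at both ears, and the noise at the two ears is uncorrelated with equal variance `phi_u`. All names are hypothetical.

```python
def gsc_unquantized(h, g_l, g_r, phi_s, phi_i, phi_u):
    """Return (wiener_weight, output_sinr) for one frequency bin.

    h        : transfer function of the desired source (equal at both ears)
    g_l, g_r : transfer functions of the interferer at the left and right ears
    phi_s, phi_i, phi_u : PSDs of the desired source, interferer, and noise
    """
    # Cross-spectrum between the beamformer output Y = (X_l + X_r) / 2
    # and the reference Z = X_l - X_r; the noise terms cancel in expectation.
    phi_yz = 0.5 * (g_l + g_r) * (g_l - g_r).conjugate() * phi_i
    # Power of the reference signal (interferer plus noise from both ears).
    phi_zz = abs(g_l - g_r) ** 2 * phi_i + 2.0 * phi_u
    w = phi_yz / phi_zz  # steady-state Wiener solution

    # Powers of the components that remain in the output Y - w * Z.
    signal = abs(h) ** 2 * phi_s
    interference = abs(0.5 * (g_l + g_r) - w * (g_l - g_r)) ** 2 * phi_i
    noise = (abs(0.5 - w) ** 2 + abs(0.5 + w) ** 2) * phi_u
    return w, signal / (interference + noise)
```

With an input SIR of 0 dB and a noise floor 30 dB below the source, for example `gsc_unquantized(1 + 0j, 1 + 0j, 1j, 1.0, 1.0, 1e-3)`, the interferer is almost completely removed and the output SINR is limited mainly by the uncorrelated noise.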
3.2. Quantization at a Rate R
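The beamformer output in this case is given as
$$Y(k) = \tfrac{1}{2}\big(X_l(k) + \hat{X}_r(k)\big) = \tfrac{1}{2}\big(1 + \alpha(k)\big) H(k) S(k) + \tfrac{1}{2}\big(G_l(k) + \alpha(k) G_r(k)\big) I(k) + \tfrac{1}{2}\big(U_l(k) + \alpha(k) U_r(k) + V(k)\big). \tag{18}$$
Comparing (18) with (11), since $0 \le \alpha(k) < 1$,
it can be seen that while the fixed beamformer preserves the desired source in
the unquantized case, there is attenuation of the desired source in the
quantized case. The blocking matrix produces
$$Z(k) = X_l(k) - \hat{X}_r(k) = \big(1 - \alpha(k)\big) H(k) S(k) + \big(G_l(k) - \alpha(k) G_r(k)\big) I(k) + U_l(k) - \alpha(k) U_r(k) - V(k). \tag{19}$$
It is evident from (19) that due
to the quantization, the reference signal $Z(k)$
is not completely free of the desired signal $S(k)$,
which will result in some cancellation of the desired source in the
interference cancellation stage. The adaptive interference canceller is given
by
$$W_{\mathrm{opt}}(k) = \frac{\Phi_{yz}(k)}{\Phi_{zz}(k)}, \tag{20}$$
where
$$\Phi_{yz}(k) = \tfrac{1}{2}\Big[\big(1 - \alpha^2(k)\big) |H(k)|^2 \Phi_s(k) + \big(G_l(k) + \alpha(k) G_r(k)\big)\big(G_l(k) - \alpha(k) G_r(k)\big)^{\dagger} \Phi_i(k) + \big(1 - \alpha^2(k)\big) \Phi_u(k) - \Phi_v(k)\Big],$$
$$\Phi_{zz}(k) = \big(1 - \alpha(k)\big)^2 |H(k)|^2 \Phi_s(k) + |G_l(k) - \alpha(k) G_r(k)|^2\, \Phi_i(k) + \big(1 + \alpha^2(k)\big) \Phi_u(k) + \Phi_v(k), \tag{21}$$
where $\Phi_v(k) = \alpha(k) D(k)$ is the variance of $V(k)$.
The GSC output in this case is
$$\hat{S}(k) = Y(k) - W_{\mathrm{opt}}(k) Z(k). \tag{22}$$
The corresponding estimation error with respect to $H(k) S(k)$ is
$$E\{|H(k) S(k) - \hat{S}(k)|^2\} = |C_s(k)|^2 |H(k)|^2 \Phi_s(k) + |C_i(k)|^2 \Phi_i(k) + C_u(k) \Phi_u(k) + C_v(k) \Phi_v(k), \tag{23}$$
where
$$C_s = (1 - \alpha)\big(\tfrac{1}{2} + W_{\mathrm{opt}}\big), \quad C_i = \tfrac{1}{2}(G_l + \alpha G_r) - W_{\mathrm{opt}} (G_l - \alpha G_r), \quad C_u = \big|\tfrac{1}{2} - W_{\mathrm{opt}}\big|^2 + \alpha^2 \big|\tfrac{1}{2} + W_{\mathrm{opt}}\big|^2, \quad C_v = \big|\tfrac{1}{2} + W_{\mathrm{opt}}\big|^2, \tag{24}$$
with the frequency index $k$ omitted for brevity.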
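The effect of the bit rate on a single frequency bin can be explored with a small numerical sketch. It assumes an averaging fixed beamformer, a differencing blocking matrix, equal desired-source transfer functions `h` at both ears, and a right-ear signal that passes through a Gaussian forward channel (gain `alpha`, additive noise of variance `alpha * D`, with `D = phi_x * 2**(-rate_bits)`) that is independent of all other signals. The names and the choice of `rate_bits=None` for the unquantized case are illustrative assumptions.

```python
def gsc_output_sinr(h, g_l, g_r, phi_s, phi_i, phi_u, rate_bits=None):
    """Output SINR of a two-input GSC for one frequency bin.

    rate_bits=None means the right-ear signal is available without quantization.
    """
    # Variance of the right-ear signal and forward-channel parameters.
    phi_x = abs(h) ** 2 * phi_s + abs(g_r) ** 2 * phi_i + phi_u
    if rate_bits is None:
        alpha, phi_v = 1.0, 0.0
    else:
        distortion = phi_x * 2.0 ** (-rate_bits)
        alpha = (phi_x - distortion) / phi_x
        phi_v = alpha * distortion

    # Source and interferer coefficients in Y = (X_l + Xhat_r) / 2 and Z = X_l - Xhat_r.
    y_s, y_i = 0.5 * (1.0 + alpha) * h, 0.5 * (g_l + alpha * g_r)
    z_s, z_i = (1.0 - alpha) * h, g_l - alpha * g_r

    # Second-order statistics; the noise and quantization terms follow from
    # Y containing (U_l + alpha U_r + V) / 2 and Z containing U_l - alpha U_r - V.
    phi_yz = (y_s * z_s.conjugate() * phi_s + y_i * z_i.conjugate() * phi_i
              + 0.5 * (1.0 - alpha ** 2) * phi_u - 0.5 * phi_v)
    phi_zz = (abs(z_s) ** 2 * phi_s + abs(z_i) ** 2 * phi_i
              + (1.0 + alpha ** 2) * phi_u + phi_v)
    w = phi_yz / phi_zz  # steady-state Wiener solution

    # Component powers in the output Y - w * Z.
    signal = abs(y_s - w * z_s) ** 2 * phi_s
    interference = abs(y_i - w * z_i) ** 2 * phi_i
    noise = ((abs(0.5 - w) ** 2 + alpha ** 2 * abs(0.5 + w) ** 2) * phi_u
             + abs(0.5 + w) ** 2 * phi_v)
    return signal / (interference + noise)
```

For the illustrative bin `h = 1`, `g_l = 1`, `g_r = 1j` with unit source and interferer PSDs and `phi_u = 1e-3`, lowering `rate_bits` from 8 to 2 reduces the output SINR, because the quantization noise enters the output and the reference signal starts to leak the desired source.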
4. GSC Performance at Different Bit Rates
Using (23)-(24), the behavior of the GSC can be
studied at different bit rates, and for different locations of the interferer.
The solid curves in Figure 3 plot the output signal-to-interference-plus-noise
ratio (SINR) obtained from the binaural GSC at different bit rates for an
interferer located at 40°.
The output SINR per frequency bin is obtained as
(25)
For comparison, we also plot
the output SINR obtained using a monaural two-microphone GSC (dotted line).
This would be the result obtained if there were only a single hearing aid on the
left ear with the two microphones separated by 8 mm in an end-fire
configuration. In the monaural case, we consider an infinite rate ($R = \infty$),
as both microphone signals are available at
the same hearing aid. To obtain Figure 3, the relevant parameter settings were
,
m,
m,
m, and
m/s. The mean input SIR and signal-to-noise
ratio (SNR) were set to 0 dB and 30 dB, respectively, where
(26)
Figure 3: SINR after processing for input SIR 0 dB, input SNR 30 dB,
and interferer located at 40°. Solid curves correspond to
binaural GSC at the specified bit rates (bits per sample),
and the dotted curve corresponds to the monaural case.
It can be seen from Figure 3 that at a rate of 5 bits
per sample, the binaural system outperforms the monaural system. Note that by
bits per sample we mean bits allocated to each sample per frequency bin. Figure
4 shows the performance of the binaural GSC without considering the effect of
head shadow, that is, assuming that the microphones are mounted in free space.
In this case, the transfer functions of the desired and interfering sources
reduce to the appropriate relative delays.
The sharp nulls in Figure 4 correspond to those frequencies where it is
impossible to distinguish between the locations of the desired and interfering
sources due to spatial aliasing, and thus the GSC does not provide any SINR
improvement. It is interesting to note that the differences introduced by head
shadow help in this respect, as indicated by the better performance at these
frequencies in Figure 3.
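The location of these nulls can be illustrated with a simple free-field calculation. The sketch below assumes two microphones on the interaural axis, a far-field interferer at an angle measured from the front, and a desired source at 0°. The interferer then becomes indistinguishable from the desired source whenever its inter-microphone delay is a whole number of periods. The spacing of 0.175 m and the speed of sound of 343 m/s are assumed values chosen for illustration, not the settings used for Figure 4.

```python
import math


def free_field_null_frequencies(spacing_m, interferer_deg, f_max_hz,
                                speed_of_sound=343.0):
    """Frequencies (Hz) up to f_max_hz where a free-field two-microphone array
    cannot separate a frontal source from an interferer at interferer_deg."""
    # Far-field delay between the two microphones for the interferer.
    delay = spacing_m * math.sin(math.radians(interferer_deg)) / speed_of_sound
    if delay == 0.0:
        return []  # interferer is also frontal; no isolated null frequencies
    # The two interferer transfer functions coincide when f * delay is an integer.
    first_null = 1.0 / abs(delay)
    count = int(f_max_hz // first_null)
    return [m * first_null for m in range(1, count + 1)]
```

Calling `free_field_null_frequencies(0.175, 40.0, 8000.0)` lists the aliasing frequencies below 8 kHz for these assumed values; a larger spacing or a more lateral interferer moves the first null to a lower frequency.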
Figure 4: SINR after processing for input SIR 0 dB, input SNR 30 dB, and interferer located at 40°, ignoring the effect of head
shadow (microphone array mounted in free space). Solid curves correspond to binaural GSC at the specified bit rates (bits per
sample), and the dotted curve corresponds to the monaural case.
The performance of the monaural system varies
significantly based on the interferer location. When the desired source and
interferer are located close together as in the case of Figure 3, the small
end-fire microphone array cannot perform well due to the broad main lobe of the
beamformer. When the interferer is located in the rear half plane, the monaural
system offers good performance, especially at high frequencies.
Figure 5 plots
the output SINR under the same conditions as in Figure 3 except that the
interferer is now located at 120°,
and thus there is a larger separation between the desired (located at 0°)
and interfering sources. The monaural system
(dotted line) performs better than when the interferer was located at 40°.
In this case, the binaural system needs to operate at a significantly higher
bit rate to outperform the monaural system, and the benefits are mainly in the
low-frequency range up to 4 kHz.
Figure 5: SINR after processing for input SIR 0 dB, input SNR 30 dB,
and interferer located at 120°. Solid curves correspond
to binaural GSC at the specified bit rates (bits per sample),
and the dotted curve corresponds to the monaural case.
For an interferer located at 40°,
Figure 6 depicts the improvement in SINR averaged over all frequencies after
processing by the GSC, for different values of the SIR and SNR. The improvement
was calculated as
(27)
The largest improvements are obtained at low SIRs and high SNRs, where the adaptive
interference canceller is able to perform well as the level of the interferer
is high compared to the uncorrelated noise in the reference signal.
At high SIR and low SNR values, the improvement reduces to the 3 dB gain
resulting from the reduction of the uncorrelated noise due to the doubling of
microphones. For low SNR values, the improvement due to the interference
canceller is limited across the entire range of SIR values. However, as the SNR
increases, the interference canceller provides a significant improvement in
performance as can be seen in the right rear part of Figures 6
and 7. At high SNR and SIR values, a low bit rate (e.g., 4 bits
per sample) results in degradation of performance as the loss due to
quantization more than offsets the gain due to beamforming. At low bit rates, the reference signal,
which forms the input to the adaptive interference canceller, is no longer
devoid of the desired signal. This is one of the reasons for the poor
performance of the binaural GSC at low bit rates, as the adaptive filter cancels
some of the desired signal. In fact, as observed in [20], in the absence of uncorrelated noise, the SIR at the
output of the adaptive interference canceller is the negative (on a log scale)
of the SIR in the reference signal.
At high input SIRs and SNRs, even a small amount of desired signal leakage
results in a high SIR in the reference signal,
which in turn results in a low SIR at the output as seen in
Figure 6. One
approach to avoid cancellation of the desired signal is to adapt the filter
only when the desired signal is not active [21]. The detection may be performed, for example, using
the method of [22].
Figure 6: Improvement in SINR after processing at 4 bits per sample for
interferer located at 40°, and for different values of
SIR and SNR.
Figure 7: Improvement in SINR after processing at 8 bits per sample
for interferer located at 40°, and for different values of
SIR and SNR.
So far, we have looked at the effect of quantization
at a bit rate $R$ independently with respect to each frequency
bin. In practice, the available
bits need to be optimally allocated across the
frequency bands.
The rate allocation problem can be formulated as
(28)
A uniform rate allocation across
the different frequency bins cannot exploit the dependence of the output SINR
on frequency as seen in Figures 3
and 5, and thus a
nonuniform scheme is necessary. The distortion function
does not lend itself to a closed-form solution
for the rate allocation, and suboptimal approaches such as a greedy allocation
algorithm need to be employed. In a greedy rate allocation scheme, at each
iteration, one bit is allocated to the band
where the additional bit results in the
largest decrease in distortion. The iterations terminate when all the available
bits are exhausted. Figure 8 shows the output SINR (averaged across all
frequencies) at different bit rates for both uniform and greedy rate
allocation. Here, the desired and interfering signals were assumed to be
speech. The signals, sampled at 16 kHz, were processed in blocks of
samples, and the results were averaged over
all blocks. Figure 9 shows the PSD of a segment of the signal. It can be seen from Figure 8 that the greedy allocation (dotted)
scheme results in better performance compared to the uniform rate allocation
(solid) scheme. However, we note that the greedy algorithm requires knowledge
of the source PSDs
and the location of the interferer.
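The greedy scheme described above can be sketched in a few lines. The example is a generic illustration, not the authors' code: each band supplies a hypothetical callable that returns its distortion at a given number of bits, and as a stand-in for the GSC output distortion the usage example uses the simple curve `phi * 2**(-r)` of a Gaussian source.

```python
def greedy_rate_allocation(distortion_fns, total_bits):
    """Allocate total_bits across bands, one bit at a time.

    distortion_fns : list of callables; distortion_fns[k](r) is the distortion
                     of band k when r bits are assigned to it
    Returns the list of bits assigned to each band.
    """
    rates = [0] * len(distortion_fns)
    for _ in range(total_bits):
        # Decrease in distortion each band would gain from one additional bit.
        gains = [fn(r) - fn(r + 1) for fn, r in zip(distortion_fns, rates)]
        # Give the bit to the band with the largest decrease.
        best_band = max(range(len(gains)), key=gains.__getitem__)
        rates[best_band] += 1
    return rates


def gaussian_distortion(phi):
    """Stand-in distortion curve phi * 2^(-r) for a band of variance phi."""
    return lambda r: phi * 2.0 ** (-r)
```

With three bands of variance 8, 3, and 1 and four bits in total, `greedy_rate_allocation([gaussian_distortion(p) for p in (8.0, 3.0, 1.0)], 4)` assigns most bits to the strongest band and none to the weakest, which is the nonuniform behavior that a uniform allocation cannot reproduce.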
Figure 8: Improvement in SINR after processing averaged across all frequencies at different bit rates (kbps) for uniform rate
allocation (solid) and greedy rate allocation (dotted).
Figure 9: The PSD of a segment of the signal used to obtain the results in Figure 8.
5. Conclusions
A wireless data link between the left and right hearing
aids enables binaural beamforming. Such a binaural system with one microphone
on each hearing aid offers improved noise reduction compared to a
two-microphone monaural hearing aid system. The performance gain arises from
the larger microphone spacing and the ability to exploit the head shadow
effect. The binaural benefit (improvement compared to the monaural solution) is
largest when an interfering source is located close to the desired source, for
instance, in the front half plane. For interferers located in the rear half
plane, the binaural benefit is restricted to the low-frequency region where the
monaural system has poor spatial resolution. Unlike the monaural solution, the
binaural GSC is able to provide a uniform performance improvement regardless of
whether the interferer is in the front or rear half plane.
Wireless transmission is power intensive and battery
life is an important factor in hearing aids. Exchange of microphone signals at
low bit rates is thus of interest to conserve battery. In this paper, the
performance of the binaural system has been studied as a function of the
communication bit rate. The generalized sidelobe canceller (GSC) has been
considered in this paper as it requires neither knowledge of the source PSDs
nor of the location of the interfering sources. Both the monaural and binaural
systems perform best when the level of uncorrelated noise is low, that is, at
high SNRs, when the adaptive interference canceller is able to fully exploit
the availability of the second signal. At an SNR of 30 dB and an SIR of 0 dB, the
binaural system offers significant gains (15 dB SINR improvement for an interferer
at 40°) even at a low bit rate of 4 bits per sample.
At higher input SIRs, a higher bit rate is required to achieve a similar gain.
In practice, the total number of available bits needs
to be optimally allocated to different frequency bands. An optimal allocation
would be nonuniform across the different bands. Such an allocation, however,
requires knowledge of the source PSD and the location of the interferer.
Alternatively, a suboptimal but practically realizable uniform rate allocation
may be employed. It has been seen that such a uniform rate allocation results
in a performance degradation of around 5 dB in terms of SINR compared to a
nonuniform allocation obtained using a greedy optimization approach.
The main goal of this paper has been to investigate
the effect of quantization errors on the binaural GSC. Several extensions to
the basic theme can be followed. Topics for future work include studying the
effect of reverberation and ambient diffuse noise on the performance of the
beamformer. Binaural localization cues such as interaural time and level
differences have been shown to contribute towards speech intelligibility.
Future work could analyze the effect of quantization errors on these binaural
cues.
References
- S. Kochkin, “MarkeTrak V: ‘Why my hearing aids are in the drawer’: the consumers' perspective,” The Hearing Journal, vol. 53, no. 2, pp. 34–42, 2000.
- V. Hamacher, J. Chalupper, J. Eggers, et al., “Signal processing in high-end hearing aids: state of the art, challenges, and future trends,” EURASIP Journal on Applied Signal Processing, vol. 2005, no. 18, pp. 2915–2929, 2005.
- Oticon, “True binaural sound processing in new Oticon Epoq signals paradigm shift in hearing care,” Press release, April 2007, http://www.oticon.dk/dk_da/Information/PressReleases/downloads/epoq_april2007.pdf.
- M. Dorbecker and S. Ernst, “Combination of two-channel spectral subtraction and adaptive Wiener post-filtering for noise reduction and dereverberation,” in Proceedings of European Signal Processing Conference (EUSIPCO '96), pp. 995–998, Trieste, Italy, September 1996.
- J. G. Desloge, W. M. Rabinowitz, and P. M. Zurek, “Microphone-array hearing aids with binaural output—I: fixed-processing systems,” IEEE Transactions on Speech and Audio Processing, vol. 5, no. 6, pp. 529–542, 1997.
- D. P. Welker, J. E. Greenberg, J. G. Desloge, and P. M. Zurek, “Microphone-array hearing aids with binaural output—II: a two-microphone adaptive system,” IEEE Transactions on Speech and Audio Processing, vol. 5, no. 6, pp. 543–551, 1997.
- V. Hamacher, “Comparison of advanced monaural and binaural noise reduction algorithms for hearing aids,” in Proceedings of IEEE International Conference on Acoustic, Speech, and Signal Processing (ICASSP '02), vol. 4, pp. 4008–4011, Orlando, Fla, USA, May 2002.
- T. J. Klasen, S. Doclo, T. van den Bogaert, M. Moonen, and J. Wouters, “Binaural multi-channel Wiener filtering for hearing aids: preserving interaural time and level differences,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), vol. 5, pp. 145–148, Toulouse, France, May 2006.
- R. H. Walden, “Analog-to-digital converter survey and analysis,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 4, pp. 539–550, 1999.
- S. Kochkin, “MarkeTrak VII: customer satisfaction with hearing instruments in the digital age,” The Hearing Journal, vol. 58, no. 9, pp. 30–43, 2005.
- S. Srinivasan, A. Pandharipande, and K. Janse, “Effect of quantization on beamforming in binaural hearing aids,” in Proceedings of the 3rd International Conference on Body Area Networks, Tempe, Ariz, USA, March 2008.
- O. Roy and M. Vetterli, “Collaborating hearing aids,” in Proceedings of MSRI Workshop on Mathematics of Relaying and Cooperation in Communication Networks, Berkeley, Calif, USA, April 2006.
- L. Griffiths and C. Jim, “An alternative approach to linearly constrained adaptive beamforming,” IEEE Transactions on Antennas and Propagation, vol. 30, no. 1, pp. 27–34, 1982.
- O. Hoshuyama, A. Sugiyama, and A. Hirano, “A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters,” IEEE Transactions on Signal Processing, vol. 47, no. 10, pp. 2677–2684, 1999.
- W. Herbordt and W. Kellermann, “Frequency-domain integration of acoustic echo cancellation and a generalized sidelobe canceller with improved robustness,” European Transactions on Telecommunications, vol. 13, no. 2, pp. 123–132, 2002.
- B.-J. Yoon, I. Tashev, and A. Acero, “Robust adaptive beamforming algorithm using instantaneous direction of arrival with enhanced noise suppression capability,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 1, pp. 133–136, Honolulu, Hawaii, USA, April 2007.
- R. O. Duda and W. L. Martens, “Range dependence of the response of a spherical head model,” The Journal of the Acoustical Society of America, vol. 104, no. 5, pp. 3048–3058, 1998.
- T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression, Information and System Sciences Series, Prentice-Hall, Englewood Cliffs, NJ, USA, 1971.
- S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, USA, 3rd edition, 1995.
- B. Widrow, J. R. Glover, Jr., J. M. McCool, et al., “Adaptive noise cancelling: principles and applications,” Proceedings of the IEEE, vol. 63, no. 12, pp. 1692–1716, 1975.
- D. van Compernolle, “Switching adaptive filters for enhancing noisy and reverberant speech from microphone array recordings,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '90), vol. 2, pp. 833–836, Albuquerque, NM, USA, April 1990.
- S. Srinivasan and K. Janse, “Spatial audio activity detection for hearing aids,” in Proceedings of IEEE International Conference on Acoustic, Speech, and Signal Processing (ICASSP '08), pp. 4021–4024, Las Vegas, Nev, USA, March-April 2008.