Department of Electronic and Electrical Engineering, School of Engineering, Trinity College, Dublin, Ireland
Academic Editor: Joe C. Chen
Abstract
This paper proposes a technique for determining the distance between a sound source and the microphones in an array. The proposed “Range-Finder” algorithm is robust in the presence of reverberation and, in contrast with previously published source-localization techniques, does not require knowledge of the relative positions of the microphones. We discuss the factors affecting the accuracy of our range estimates and present the results of experiments using simulated and real data to demonstrate the efficacy of our approach.
1. Introduction
Estimating
the distance between a source and a receiver has been a central problem in
array signal processing since the earliest days of radar and sonar. For indoor
applications, using microphone arrays, such estimates could have use in source
localization or speaker tracking. In addition, they could inform decisions
regarding microphone selection, allowing us to select the microphone(s) nearest
the source or farthest from some likely interference. Range estimates could
also have use in determining appropriate speech enhancement strategies, such as
when deciding whether or not to use a dereverberation algorithm.
Typically, range is determined by measuring the
time-of-flight of a transmitted or reflected soundwave and multiplying it by
some known propagation speed. In [1] this is achieved by simultaneous transmission of a
soundwave and a “time-stamped” radio signal. Provided that the
transmitter and receiver are synchronized, the time-of-flight may be easily
obtained as the difference in the times of transmission and reception. However,
in a majority of cases the sources of interest will not be specifically
designed transmitters and so such techniques have limited application.
Given the knowledge of the relative microphone
positions, the source-microphone range may easily be obtained from estimates of
the relative position of the source—an end to which a variety of solutions
have been proposed.
For the sake of clarification, we note that many of
the methods, presented in the literature as “source-localization”
techniques, are, in fact, solutions to the related but distinct problem of
delay-vector estimation, that is, obtaining the relative intersensor time-delay
estimates (TDEs). Furthermore, in many cases, the source “location” is
defined in terms of a bearing line only. In this paper, we use the term
“source localization” to refer to the problem of estimating the position
of a source, with respect to some coordinate system.
Much of the previously published work on source
localization has focused on the use of TDEs (see [2] and the references therein for a review of
time-delay-estimation techniques). In the two-dimensional case, source
localization may be considered a practical application of Apollonius' problem
of tangent circles [3]. The numerical solution to this problem, as
discovered by Viète (see [4] for a description of his solution), may be easily
expanded to the three-dimensional case and, given TDEs between a minimum of
four microphones (three in the two-dimensional case), a source location may be
found. In [5], TDEs are determined for pairs of microphones in a
series of four-element, square microphone arrays. From these, source-bearing
lines are calculated, with the final source location estimate being calculated
as a weighted average of the closest intersections between bearing-line pairs.
In [6, 7] the authors estimate the source location via a least-squares
fitting of the TDEs for an ad hoc deployment of sensors.
Relative range estimates may also be obtained from a
comparison of received signal power. In [8] the authors combine TDEs and relative signal power measurements
to determine the location of a source in the extreme near-field of a
two-element array. In [9] the authors present a method for source localization
that utilizes received signal energy only. Whilst this technique is reported as
returning consistently accurate source-bearing estimates, in the presence of
reverberation range estimation is shown to be subject to a significant bias.
The use of techniques employing power measurements is
commonly restricted to nonreverberant acoustic environments, or to situations
where the effects of reverberation are negligible. This is due to the
difficulty inherent in modelling and/or mitigating the presence of
reverberation and its consequent adverse effects. Techniques that use TDEs only
are preferred when reverberation is present although, as we have noted, these
require knowledge of the relative microphone positions.
However, for many practical applications, microphone
locations will be unknown or unreliable. Yet, the question of how to estimate
the range between a sound source and a microphone, in the presence of
reverberation and with the relative positions of the microphones unknown,
remains largely unaddressed. We propose a solution to this problem. Our method
combines relative power measurements with TDEs in such a way as to mitigate
the adverse effects of reverberation and obtain absolute
source-microphone range estimates for microphones at unknown locations.
In the following section, we will briefly discuss the
relevant characteristics of sound propagation in rooms. In Section 3, we derive
a well-known but naïve range estimator as well as the proposed
“Range-Finder” algorithm. In Section 4, we address the factors affecting
range-estimate distribution. In Section 5, we present the results of a series
of simulations and experiments designed to test the performance of our
algorithm. We discuss the potential uses of the Range-Finder algorithm and
suggest future work in Section 6.
2. Sound Propagation in Rooms
In a noiseless but reverberant environment, the signal
received at some microphone,
, will consist of a direct-path component and multiple
reflected components jointly referred to as reverberation. The input to the
microphone may be modelled as the convolution of the source-microphone impulse
response,
, and the source signal,
:
(1)
In the frequency domain,
(2)
where
is the
component of
due to
direct-path (nonreflected) propagation and
is the
reverberant component due to multipath reflections. The received signal power
spectrum may be calculated as follows. Note that, for clarity, we omit the
dependence on
in the sequel
(3)
where
denotes the
real component and
denotes the complex conjugate.
In air, for an omnidirectional source and receiver,
the power of the direct-path component of sound, received at
, is inversely proportional to the squared
source-microphone range, that is, the squared distance between the source and
the microphone,
(4)
where
and
and
denote the
Cartesian coordinates of the source and
, respectively. The direct-path component therefore
decays at a rate of approximately 6 dB per doubling of
the source-microphone range. This model does not address effects due to
variations in air pressure or temperature; however, in a room environment it is
reasonable to assume a homogeneous medium. From (4), we may derive an expression for the power of the
direct-path component of the sound received at some microphone
:
(5)
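As a quick numerical check of the inverse-square relationship, the following minimal sketch computes the change in direct-path power, in decibels, between two ranges. The function name `direct_path_level_change_db` is hypothetical, and an omnidirectional source in a homogeneous medium is assumed.

```python
import math

def direct_path_level_change_db(r_near, r_far):
    """Change in direct-path power (dB) when moving from r_near to r_far.

    Assumes direct-path power is inversely proportional to squared range.
    """
    if r_near <= 0 or r_far <= 0:
        raise ValueError("ranges must be positive")
    return 10.0 * math.log10((r_near / r_far) ** 2)

# Doubling the range lowers the direct-path power by roughly 6 dB.
change = direct_path_level_change_db(1.0, 2.0)
```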
The reverberant component of an impulse response will
be dependent upon factors such as the dimensions and surface absorption
characteristics of the room. These vary widely from room to room and so we
cannot know
a priori.
Typically, the degree to which a room is reverberant
is described with reference to a metric known as the reverberation time (RT60). The RT60 is defined as
the average time taken for the reverberant sound energy to decay by 60 dB. Although
useful for conveying a general idea of how reverberant a room may be,
specifying the RT60 gives no indication
of how reverberant a recorded sound will be. Consider, for example, a recording
made in a room at a distance of
from a sound
source. This recording will be perceived as being less reverberant than one
made in the same room at
from the
source. This is because the direct-path component decays as we move farther from
the source, despite the reverberation time being the same
in each instance.
A more effective way of describing the degree of
reverberation present in a recording is to specify the
direct-to-reverberant ratio (DRR), that is, the
ratio of the received sound energy due to the direct-path component to that due to
multipath reflections. For a given bandwidth, the DRR at the
microphone,
, may be defined
as follows:
(6)
An investigation of DRRs in real rooms proves
informative. Figure 1 shows a plot of DRRs, found at a variety of
locations in an office, classroom, and reception hall. The DRRs are plotted
with respect to
. The reverberation times were determined
experimentally using the transient decay method [10] and were found to be
and
seconds,
respectively. The DRR estimates were obtained as follows. Recordings were made
at varying locations in each room and at varying distances relative to a single
source—in this case a loudspeaker. The sampling rate was
kHz. In each
instance, the microphone was placed directly in front of the loudspeaker so as
to avoid complications due to the directivity of the source. The loudspeaker
produced a maximum-length-sequence (MLS) of approximate duration
seconds, also
at a sampling rate of
kHz. These
recordings were then cross-correlated with the “clean” MLS to obtain an
impulse response estimate, from which a DRR estimate was calculated.
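The final step, computing a DRR estimate from an impulse response estimate, might be sketched as follows. This is an illustration rather than the procedure used to produce Figure 1: the function name `estimate_drr_db`, the 2.5 ms half-window used to isolate the direct-path arrival, and the toy sampling rate are all assumptions.

```python
import numpy as np

def estimate_drr_db(h, fs, half_window_s=0.0025):
    """Estimate the direct-to-reverberant ratio (dB) of impulse response h.

    The direct path is taken as the samples within +/- half_window_s of the
    largest peak (an assumed window); everything after that window is
    treated as reverberation.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    half = int(round(half_window_s * fs))
    start = max(peak - half, 0)
    stop = min(peak + half + 1, len(h))
    direct_energy = np.sum(h[start:stop] ** 2)
    reverb_energy = np.sum(h[stop:] ** 2)
    if reverb_energy == 0.0:
        return float("inf")
    return 10.0 * np.log10(direct_energy / reverb_energy)

# Toy impulse response (not measured data): a unit direct-path spike
# followed by a single reflection of half the amplitude.
fs = 16000
h = np.zeros(1600)
h[100] = 1.0   # direct path
h[900] = 0.5   # late reflection
drr_db = estimate_drr_db(h, fs)
```

In practice the choice of window length determines how much early reflected energy is counted as direct sound, so the resulting figure should be read alongside the window used.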
Figure 1: Direct-to-reverberant ratios versus

, where

is the
source-microphone range. Results shown are for an office, classroom, and
reception hall.
Figure 1 also shows “best-fit” linear approximations of
the data. The slopes of these fits are
and
decibels per
doubling of range for the office, classroom, and hall, respectively. Given that
we can expect the direct-path power to decay at a
rate of approximately 6 dB per doubling
of the source-microphone range, these results suggest that, in a given room,
(where
is the
expectation operator) is a constant that is independent of the
source-microphone range.
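A slope in decibels per doubling of range can be obtained with an ordinary least-squares line fit, as in the sketch below. It assumes the DRR values are regressed against the base-2 logarithm of the range; the function name `fit_drr_slope` is hypothetical and the data are synthetic, not the measurements shown in Figure 1.

```python
import numpy as np

def fit_drr_slope(ranges_m, drr_db):
    """Least-squares line through DRR (dB) versus log2(range).

    Returns (slope, intercept); the slope is in dB per doubling of range.
    """
    x = np.log2(np.asarray(ranges_m, dtype=float))
    slope, intercept = np.polyfit(x, np.asarray(drr_db, dtype=float), 1)
    return slope, intercept

# Synthetic data: a constant reverberant level and a direct-path level
# that falls by 20*log10(2) dB for each doubling of range.
ranges_m = np.array([0.5, 1.0, 2.0, 4.0])
drr_db = 3.0 - 20.0 * np.log10(2.0) * np.log2(ranges_m)
slope, intercept = fit_drr_slope(ranges_m, drr_db)
```

With this parameterization the intercept is the fitted DRR at a range of one unit, which is one way of summarizing a room with a single number.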
We define the following:
(7)
where the
and
subscripts
denote the impulse response components corresponding to the microphones
and
, respectively. Consider the cross-terms in (7). Direct path propagation applies a delay and scaling
to a sound wave. Therefore, for any source-microphone impulse response,
is a scaled
exponential. Similarly,
may be
considered to be the sum of scaled exponentials corresponding to multiple
reflected sound waves. As such,
is also the sum
of multiple scaled exponentials. Therefore, invoking the central limit theorem,
we will assume
and
to be zero-mean
normally distributed random variables. Following from our previous results, we
also assume
and
to be random
variables distributed about the same mean. Therefore, invoking the central
limit theorem once again, we may consider
to be a
zero-mean normally distributed random variable.
Note that if
and
are nonzero at
,
will exhibit a
positive bias. We may ignore this, however, as the frequency responses of real
microphones will not have a nonzero component at
.
As an aside, we note that a brief inspection of the
results in Figure 1 reveals that, although it had the greatest
reverberation time, the reception hall was not the most reverberant of
the rooms in which we took measurements. This further illustrates the
inadequacy inherent in characterizing the degree of reverberation in a room by
specifying its reverberation time alone. Our
results do, however, suggest an alternative metric. The intercept of the best-fit
line with the
-axis defines the spatially averaged “DRR-at-
” and we
will use this metric to describe acoustic conditions in the sequel.
3. Range Estimation
In this section, we derive two range estimation
algorithms: firstly a well-known but naïve range estimator that assumes an
anechoic environment, and secondly the proposed algorithm, which we refer to as
the Range-Finder and which is robust against the effects of reverberation.
3.1. A Naïve Range Estimator
When
is the relative
intersensor time-delay between
and
,
(8)
where
is the speed of
sound in air. Using any one of a variety
of time-delay estimation techniques, we may obtain an estimate of the relative
intersensor time-delay,
. In noiseless, anechoic environments the direct-path
sound accounts for all acoustic energy received by the microphones and so, by
substituting (3) and (8) into (5) and
performing algebraic manipulation, we obtain a simple and well-known estimator
of
:
(9)
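For the two-microphone anechoic case, an estimator of this kind can be sketched as below. The sketch assumes that received power is inversely proportional to squared range and that the delay is defined as the extra propagation time to the second microphone, so that r2 = r1 + c*tau. The function name, this sign convention, and the nominal speed of sound are assumptions for illustration and are not taken from (9).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, an assumed nominal value for air

def naive_range(p1, p2, tau, c=SPEED_OF_SOUND):
    """Range from the source to microphone 1, assuming no reverberation.

    p1, p2 : received signal powers at microphones 1 and 2
    tau    : delay of microphone 2 relative to microphone 1, in seconds
    Uses p1 / p2 = ((r1 + c * tau) / r1) ** 2, solved for r1.
    """
    ratio = math.sqrt(p1 / p2)
    if math.isclose(ratio, 1.0):
        raise ValueError("equal powers: range is not identifiable")
    return c * tau / (ratio - 1.0)

# Source 1 m from microphone 1 and 3 m from microphone 2.
r1, r2 = 1.0, 3.0
tau = (r2 - r1) / SPEED_OF_SOUND
r1_hat = naive_range(1.0 / r1 ** 2, 1.0 / r2 ** 2, tau)
```

Because the estimate depends on the ratio of total received powers, any reverberant energy added to either microphone shifts the ratio and hence the estimate.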
Unfortunately, in nonideal acoustic environments, the
presence of interfering reverberation can severely distort this estimate,
making the above range estimator unsuitable for use in practical environments.
Where more than two microphones are available, the most accurate range estimate
will be obtained by using only those two microphones closest to the source.
These may be presumed to have the highest DRRs. The outputs of the remaining
microphones will contain proportionally greater levels of reverberation and
will, therefore, lead to greater distortion in the range estimates.
3.2. The Range-Finder Algorithm
From (5) and (8),
(10)
The term in the square brackets is a function of
, and
which we denote
as
:
(11)
Integrating (3) across the full bandwidth of the signal, we obtain
—the total
received signal power at
:
(12)
We define
as being the
difference between the total received signal power at
and
:
(13)
Let us assume, for the moment, that
is a constant
with respect to frequency (we will return to this assumption later).
Substituting (12) into (13) and performing algebraic manipulation yields
(14)
where
. From (14), we see that the difference between the signal power
received at two microphones is proportional to the sum of a scaled, deterministic
function,
, and a zero-mean and normally distributed random
variable,
. We define the following vectors, noting that we have
omitted the arguments of the
terms for
clarity:
(15)
Once again, using any of the many well-known
techniques for delay-vector estimation, we may obtain the time-delay estimates
and
. We then define
and the
corresponding vector
from
(16)
Following from the Cauchy-Schwarz inequality, the
optimal range estimate,
, is obtained by a matched-filtering of the
power-difference vector,
, with
:
(17)
Following from this estimate, we may easily obtain
estimates of the remaining source-microphone ranges,
, by inserting
and the TDEs
used to calculate
into (8).
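One way to read the matched-filtering step is as a search for the candidate range whose predicted power-difference pattern is most closely aligned with the measured one. The sketch below follows that reading for ideal, noise-free data. The form assumed for the deterministic function (the difference of inverse squared ranges, with each range written as r1 plus c times the delay relative to microphone 1), the grid search, and all function names are assumptions made for illustration; they are not taken from (10) to (17).

```python
import itertools
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, an assumed nominal value

def predicted_differences(r1, taus, c=SPEED_OF_SOUND):
    """Predicted direct-path power differences for every microphone pair.

    taus[i] is the delay of microphone i relative to microphone 1, so
    taus[0] == 0. The unknown source-power scale is omitted because the
    normalized matching below is insensitive to it.
    """
    inv_sq = 1.0 / (r1 + c * np.asarray(taus, dtype=float)) ** 2
    pairs = itertools.combinations(range(len(taus)), 2)
    return np.array([inv_sq[i] - inv_sq[j] for i, j in pairs])

def range_finder_grid(powers, taus, candidates, c=SPEED_OF_SOUND):
    """Return the candidate r1 that maximizes the normalized inner product."""
    powers = np.asarray(powers, dtype=float)
    pairs = itertools.combinations(range(len(powers)), 2)
    delta_p = np.array([powers[i] - powers[j] for i, j in pairs])
    best_r, best_score = None, -np.inf
    for r in candidates:
        f = predicted_differences(r, taus, c)
        score = delta_p @ f / (np.linalg.norm(delta_p) * np.linalg.norm(f))
        if score > best_score:
            best_r, best_score = r, score
    return best_r

# Noise-free example with true ranges of 1.0, 2.0 and 3.5 m.
true_ranges = np.array([1.0, 2.0, 3.5])
taus = (true_ranges - true_ranges[0]) / SPEED_OF_SOUND
powers = 1.0 / true_ranges ** 2
candidates = np.linspace(0.05, 10.0, 1991)  # 5 mm grid
r1_hat = range_finder_grid(powers, taus, candidates)
remaining = r1_hat + SPEED_OF_SOUND * taus[1:]  # other ranges via the delays
```

A grid search is used here only because it is easy to verify; any one-dimensional optimizer over the candidate range would serve the same purpose.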
Previously, we assumed
to be a
constant with respect to frequency. In many cases, including that of human
speech, this is unrealistic. In reality, speech is both a lowpass and often
harmonic signal. This poses particular problems. We have assumed
to be a
zero-mean, normal random variable. The analysis and experimental evidence
underpinning this assumption are for broadband signals and we cannot reasonably
expect it to hold for cases, such as speech, where the bulk of the energy is
concentrated at low frequencies.
This problem was overcome as follows. The microphone
outputs are split into individual, nonoverlapping subbands. The bandwidths of
these subbands are chosen such that they are narrow enough that
is roughly
constant within the subband whilst also being wide enough that there is always
a direct-path speech component present.
is then
calculated for each subband. Each
is normalized
and, from these, an average power-difference vector,
, is found across all the subbands. The range estimate
is found, as in (17) by a matched filtering of
with
.
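The normalize-then-average step might be sketched as follows. The sketch assumes that each subband's power-difference vector is scaled to unit Euclidean norm before averaging; the choice of norm and the function name `average_normalized_differences` are assumptions, since the text does not specify them.

```python
import numpy as np

def average_normalized_differences(delta_p_subbands):
    """Average unit-norm power-difference vectors across subbands.

    delta_p_subbands has shape (n_subbands, n_pairs). Subbands whose
    vector is all zeros are skipped to avoid division by zero.
    """
    dp = np.asarray(delta_p_subbands, dtype=float)
    norms = np.linalg.norm(dp, axis=1, keepdims=True)
    valid = norms[:, 0] > 0
    if not np.any(valid):
        raise ValueError("no subband contains a nonzero power difference")
    return (dp[valid] / norms[valid]).mean(axis=0)

# Two subbands with the same pattern but different signal levels
# contribute equally after normalization.
avg = average_normalized_differences([[3.0, 4.0], [6.0, 8.0]])
```

Normalizing first prevents the subbands with the most signal energy, typically the low-frequency ones for speech, from dominating the average.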
4. Estimate Distribution and Accuracy
Given multiple estimates for range, we might expect
that, as the number of estimates increases, their mean will approach the true
range. As we will see in the following section, this is not necessarily the
case. We will also show how the accuracy of a range estimate is dependent upon
the actual source-microphone ranges. We restrict our analysis to the situation
where we have three microphones only—the minimum number required to
implement the Range-Finder. We do this both for the sake of simplicity and to
allow us to employ an alternative formulation of the Range-Finder algorithm.
This alternative formulation more clearly illustrates how the distribution of
range estimates is related to the distribution of the ratio of normal random
variables, a well-understood, albeit nontrivial, distribution that has received
extensive study in the literature.
4.1. An Alternative Formulation of the Range-Finder
The range estimate,
, is that which maximizes the expression in (17). For two vectors with given norms, the dot product
of the vectors is a maximum when they are proportional. Therefore, we may write
. For the three-microphone case, this implies
(18)
Using an equivalent expression, we define
:
(19)
and from this, we obtain an alternative formulation
for the Range-Finder:
(20)
For 3 microphones there are, of course, 5 further
permutations of
(
, etc.). However, all may be shown to yield identical
range estimates and so we will consider only
. Furthermore, to simplify our analysis, we will
assume that
. We note that this relationship is for simplicity
only and is not an absolute requirement. Rather, it is merely a result of the
arbitrary way in which we assign labels to the microphones. Once again,
omitting the arguments of the
terms for
clarity:
(21)
From (21), we see that
is the ratio of
normally distributed and correlated random variables, with unknown variances
and means of
and
, respectively. Such a ratio is itself a Cauchy
distributed random variable.
4.2. Cauchy Distribution
In [11] it is shown that, following a translation and a
change of scale,
has the same
distribution as the ratio of two uncorrelated normal random variables of unity
variance,
. The real constants
and
may be
calculated as follows:
(22)
where
is the standard
deviation of
,
is the
correlation between
and
(which may be
shown to be
), and the sign
of
is chosen to be
the same as that of
. For the sake of simplicity and to avoid unwieldy
equations, the following discussion will be with reference to the simplified
standard form
. From [12], the probability density function (PDF),
, of
may be given as
shown below:
(23)
Figure 2 shows the PDFs for varying values of
and
. A very wide variety of distribution shapes are
possible and the ones shown are chosen for specific illustrative purposes. For
a more complete selection of graphs please see [12]. Shown also is
(dashed line).
In Figure 2, the distributions are not symmetric about
. In addition and contrary to what we might expect,
the “mean” of
is not
. In fact, strictly speaking, the mean and variance of
do not exist.
This is because
is undefined
when the denominator equals zero.
Figure 2: Portions of the
PDFs of

, also shown is

(dashed
line).
In practice, we may calculate a pseudomean and
pseudovariance by considering only those estimates that fall within certain
bounds. A natural bound would be that value of
corresponding
to a range estimate of zero meters (negative range estimates cannot be
correct). In setting such bounds, however, we should be mindful that the
consequent truncation of the PDF may introduce a bias into the pseudomean.
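The behaviour of such a ratio, and of a bounded pseudomean, can be explored numerically. The sketch below draws samples of a ratio of two uncorrelated, unit-variance normal variables and computes the median and a pseudomean within fixed bounds. The names `a` and `b` for the two means, the bounds, and the function name `pseudomean` are assumptions chosen for illustration.

```python
import numpy as np

def pseudomean(samples, lower, upper):
    """Mean of the samples that fall inside [lower, upper]."""
    samples = np.asarray(samples, dtype=float)
    kept = samples[(samples >= lower) & (samples <= upper)]
    if kept.size == 0:
        raise ValueError("no samples inside the bounds")
    return kept.mean()

rng = np.random.default_rng(0)
n = 200_000
a, b = 1.0, 10.0  # assumed means of the numerator and denominator

# Ratio of two uncorrelated, unit-variance normal random variables.
w = (a + rng.standard_normal(n)) / (b + rng.standard_normal(n))

w_median = np.median(w)
w_pseudomean = pseudomean(w, -1.0, 1.0)
```

With the denominator mean far from zero, as here, both statistics sit close to a/b. Reducing `b` towards zero widens the distribution and makes the pseudomean increasingly sensitive to the bounds chosen.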
In general, when defined within sufficiently wide
bounds, the pseudomean tends towards
for
, as occurs when
. Furthermore, under these conditions,
tends to have
quite a narrow distribution (see Figure 2(c)). Unfortunately, the converse is also the case. In
general, without knowing
or
, we cannot calculate/estimate the distribution of
and, hence,
cannot quantify the bias that any given bounds may introduce. We can, however,
identify certain situations in which such a bias is likely to be very large.
Consider the case where
, that is, when the array is remote from the source.
From inspection of (11), we see that under these conditions,
. As a result,
is widely
distributed, causing our range estimates to exhibit a large variance and,
depending upon the bounds used, the mean of the range estimates to be subject
to a potentially large bias.
4.3. The Effect of Array Geometry
The actual source-microphone ranges determine the
values of
and
. We have seen how these parameters can affect the
distribution of
and bias its
pseudomean away from
. In this respect, therefore, the accuracy with which
we may estimate range is determined by the array geometry. Array geometry also
determines the extent to which a bias/error in
translates into
an error in the corresponding range estimate. To investigate this second effect
of array geometry, we examine how a fixed bias,
, translates into an error in the range estimate.
Consider an estimate,
, of the true range,
, and let us assume that this estimate contains some
error,
:
(24)
As an illustrative example, we plot
against
for
in Figure 3. Outside of a small region around
, as
increases the
slope of the graph reduces and
becomes larger.
Figure 3:

versus

for

. Range estimate error increases with

.
Figure 4, showing
with respect to
and
, provides a more complete description of how array
geometry affects estimate accuracy. Note that the region where
is not shown as
in this region
, obscuring the remaining detail in the graph.
However, it is the region where
that is of
particular interest. Here,
approaches zero
leading to a very large
. In the extreme case, where
, no range estimate may be found as
will be unity
for all values of
. Similarly, no range estimate may be found if
or
equals zero, as
will be zero or
undefined, respectively, for all values of
.
Figure 4:

with respect to

and

.
The analysis in this section has been limited to the
three microphone case. However, the results of our analysis have implications
for implementations of the Range-Finder using any number of microphones. To
obtain accurate range estimates, we require access to a minimum of three
microphones for which no two are equidistant (or approximately equidistant)
from the sound source. Furthermore, we will not achieve accurate range
estimation when
. Under such conditions we may expect
to exhibit a
wide distribution and significant bias. This bias/error will then translate
into a large error in the range estimate due to
.
We should not, therefore, apply the Range-Finder
algorithm in what might be considered the classical microphone array scenario,
that of closely spaced microphones and a distant, “farfield” source.
Rather, successful implementation would require microphones to be positioned in
such a way that they are unlikely to be equidistant from the source and,
ideally, we will have access to at least 3 microphones for which
. We will discuss this further and consider the
potential applications of the Range-Finder algorithm in Section 6.
5. Simulations and Experiments
5.1. Simulations
A series of simulations were performed to examine the
performance of the Range-Finder algorithm and compare it to that of the naïve
range estimator under varying reverberant conditions. Our simulated
environment, Figure 5, was a simple rectangular room of dimensions
and uniform
surface absorption coefficient of
. In this room, we simulated three omnidirectional
sources and six omnidirectional microphones (see Table 1 for coordinates). The sampling frequency used was
kHz. The
source-microphone impulse responses were generated using an acoustic modeling
software package [13]. A ray tracing algorithm was used to determine the first
milliseconds of
the impulse response after and including the arrival of the direct-path
component. Statistical, random reverberant tails were used for the remaining
reflections. Two “source signals”—a maximum-length sequence (MLS) of
seconds in
duration and concatenated voice samples of approximately
seconds total
duration, both bandlimited to avoid aliasing—were convolved with each
impulse response to obtain the simulated “recordings.” The TDEs were
calculated geometrically, using the source and microphone coordinates and a
known speed of sound.
Table 1: The coordinates of the microphone and source locations
for the simulated room. Coordinates are in meters.
Figure 5: A diagram of
the simulated room and setup. For precise coordinates of the microphones and
loudspeakers, see Table
1.
The recordings were split
into segments of
samples and
windowed using a Hamming window. The segment overlap was
. In the case of the speech recordings, the signals
were separated into eight nonoverlapping subbands with bandwidth
kHz and
was determined
as described in Section 3. For each segment, the Range-Finder algorithm
(original formulation (17)) was then used to estimate the distance between the
sources and each of the microphones. Negative range estimates and estimates
greater than
were ignored, it having been determined that wider boundaries did not increase the accuracy
of the range estimates.
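The segmentation step can be sketched as below. The frame length and overlap used in the example are hypothetical placeholders, not the values used in the simulations, and the function name `frame_signal` is an assumption.

```python
import numpy as np

def frame_signal(x, frame_len, overlap):
    """Split x into Hamming-windowed frames.

    overlap is the fraction of each frame shared with the next one,
    for example 0.5 for 50% overlap. Trailing samples that do not fill
    a complete frame are discarded.
    """
    x = np.asarray(x, dtype=float)
    hop = int(round(frame_len * (1.0 - overlap)))
    if hop < 1:
        raise ValueError("overlap too large for this frame length")
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([x[k * hop:k * hop + frame_len] for k in range(n_frames)])
    return frames * window

# Hypothetical settings: 1024-sample frames with 50% overlap.
frames = frame_signal(np.ones(4096), frame_len=1024, overlap=0.5)
```

Each row of `frames` would then be processed independently to give one range estimate per segment.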
To investigate the effect of reverberation, the
at
of the
simulated room was varied by applying an appropriate scaling to the direct-path
components of the simulated impulse responses. Range estimates were then
obtained as previously described. The results for each source are shown in
Figures 6 and 7. The mean of the range estimates, ± one standard
deviation, is shown with respect to the
at
. The results shown relate to the estimates of
only. Estimates
of the remaining ranges (
to
) are omitted
because, as is apparent from (8), these will exhibit an identical bias and
distribution to those corresponding to
. Note that
is the closest
microphone to each source. The estimates of
will,
therefore, exhibit the greatest percentage error.
Figure 6: Mean range estimates ± one standard deviation for a source producing an MLS.
Figure 7: Mean range estimates ± one standard deviation for a voice source.
The means of the results obtained using the voice
recordings are slightly more accurate than those found using the MLS
recordings, albeit with a significantly greater variance. Each set of graphs
shows that the range estimates are subject to a negative bias that reduces as
the reverberation levels decrease. In Section 4.2, we discussed the factors
that may explain the presence of a bias in the range estimates. While it is not
necessarily the case that any such bias should be negative, from inspection of
the PDFs in Figure 2 we see that the density below the mean tends to be
greater than that above. Therefore, we may speculate that, for a finite number
of estimates, any bias present would tend to be negative, although the precise
nature of such a bias is ultimately determined by the reverberation levels
present and the array geometry and estimate bounds used.
In Figure 8, the performance of the Range-Finder algorithm is
compared to that of the naïve range estimator derived in Section 3. The
estimates made using the naïve range estimator were found using the two
microphones closest to the source so as to achieve the best possible results.
The results shown are for Source 2 but are illustrative of the results obtained
for the other sources. In both the voice and MLS cases, the Range-Finder
algorithm outperforms the naïve range estimator.
Figure 8: A comparison of mean range estimates (± one standard deviation) for the naïve range estimator and the Range-Finder algorithm.
5.2. Experiments
A series of recordings were made to test the
Range-Finder under real conditions. The room used was the office, which was
chosen for being a highly reverberant environment that would best highlight the
superior performance of the Range-Finder over the naïve range estimator. Six
microphones were positioned at distances of between
and
from a
loudspeaker, at intervals of roughly
. The loudspeaker and microphones were arranged so as
to be approximately colinear, so as to avoid errors due to the directionality
of the source. Voice and MLS signals were produced by the loudspeaker. The
microphone outputs were recorded before being bandlimited and downsampled to a
sampling rate of
. These recordings were then split into segments of
samples and
windowed using a Hamming window.
The segment overlap was
. The TDEs were found using a PHAT-GCC [14] and range estimates were obtained for each segment.
This procedure was repeated for each of three setups in which the loudspeaker
and microphones were arranged colinearly along the length and each diagonal of
the office, respectively.
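For readers unfamiliar with the PHAT-weighted generalized cross-correlation, a generic textbook-style sketch is given below. It is not the implementation used in these experiments; the function name `gcc_phat`, the regularization constant, and the test signal are assumptions.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    A positive result means that y lags x.
    """
    n = len(x) + len(y)                  # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Synthetic check: white noise delayed by 5 samples.
fs = 16000.0
rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
delay = 5
y = np.concatenate((np.zeros(delay), x[:-delay]))
tau = gcc_phat(x, y, fs)
```

The resolution of this estimate is one sample period; finer resolution would require interpolating around the correlation peak.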
The results are shown in Figure 9 and, as with the simulations, clearly show the
superior performance of the Range-Finder method. As before, the variances of
the results found using voice recordings are greater than those found using MLS
recordings; however, there is no noticeable trend with respect to the bias in
the mean of the estimates.
Figure 9: Mean range estimates ± one standard deviation from real-room recordings.
6. Discussion
We have proposed a method for estimating
source-microphone ranges that is robust against the effects of reverberation.
We have discussed the factors affecting the distribution and accuracy of the
range estimates obtained by our method and have presented simulated and real
experimental results demonstrating its efficacy.
In contrast with source-localization techniques, our
method requires no information regarding microphone locations in order to
return a range estimate. However, our analysis in Section 4 revealed that the
accuracy of the range estimates so obtained is, nonetheless, affected by the
relative positioning of the microphones and the sound source. In particular, it
was found that we can expect the range estimates to be inaccurate if
. Rather, successful implementation of the
Range-Finder requires that the microphones be positioned such that there is a
sufficient “spread” in the distances from the source to each microphone.
This then precludes the application of the
Range-Finder method to the classical scenario of closely spaced microphones and
a farfield source. Nonetheless, there are several scenarios in which this
requirement is likely to be met and, hence, to which we may successfully apply
the Range-Finder method. Consider, for example, the case in which it is
required to capture the contributions of a large and distributed group of
talkers using a finite number of remote microphones. Under such conditions, it
may be found that the classical approach of concentrating the microphones in a
closely spaced array causes many of the participants to be a significant
distance from all available microphones. As the DRR of recorded sound reduces
with increasing distance (see Figure 1) this could cause the contributions from some talkers
to be degraded unacceptably. We may, then, prefer to distribute the microphones
throughout or around the group of participants such that every potential talker
is sufficiently close and has unobstructed access to at least one microphone.
Given the wide distribution of the microphones, it is also likely that, when
the sound source is any given talker, we will have access to at least three
microphones for which
. We may, therefore, expect accurate range estimates.
We also note that it is often most advantageous to be
able to estimate source-microphone ranges in scenarios in which these are not
equal for all microphones (so that we may determine which microphones are
closest/farthest away, etc.). In addition, when microphones are widely
separated, determining their relative locations is likely to be cumbersome and
prone to error. Where microphones are frequently moved, say in response to
changes in the distribution of participant talkers, it may not be practical to
measure microphone locations at all. The Range-Finder algorithm is, therefore,
most effective in precisely those scenarios in which it may be required to
estimate source-microphone ranges in the absence of reliable microphone-location
information.
Our analysis in Section 4 identified scenarios in
which the Range-Finder is likely to be inaccurate. Conversely, however, it is
possible to specify situations in which the Range-Finder will perform well
where many source-localization techniques fail completely. Consider, for
example, a situation in which the microphones and sound source are colinear.
For such a setup, the intersensor time delays will be identical for all
(assuming that
the source is not in the interior of the array). As a result, no TDE-based
localization technique can return a unique estimate of
. Where the source and microphones are nearly
colinear, we can expect significant error in our range estimates due to errors
in the TDEs.
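The colinear case can be checked numerically. The sketch below is a minimal illustration, not part of the Range-Finder itself: the microphone positions, the source positions, and the nominal speed of sound are all hypothetical values chosen for the example. It places everything on one axis and compares the intersensor delays for two sources outside the array and one inside it.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed nominal value for air at room temperature


def delays_relative_to_first(source_x, mic_xs, c=SPEED_OF_SOUND):
    """Intersensor delays (s) relative to the first microphone.

    The source and all microphones are assumed to lie on one line (the x-axis).
    """
    arrival = [abs(source_x - m) / c for m in mic_xs]
    return [t - arrival[0] for t in arrival]


mic_xs = [0.0, 0.5, 1.2, 2.0]  # hypothetical microphone positions (m)

near = delays_relative_to_first(-1.0, mic_xs)    # source 1 m beyond the array end
far = delays_relative_to_first(-10.0, mic_xs)    # source 10 m beyond the array end
inside = delays_relative_to_first(0.8, mic_xs)   # source in the interior of the array

same = all(abs(a - b) < 1e-12 for a, b in zip(near, far))
differs = any(abs(a - b) > 1e-6 for a, b in zip(near, inside))
print("exterior sources give identical delays:", same)
print("interior source gives different delays:", differs)
```

For the two exterior sources the delays depend only on the microphone spacing, so the delays alone carry no information about how far away the source is. The interior source is included only to show that the degeneracy is specific to sources outside the array.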
It is apparent, therefore, that the relative positions
of the microphones and sound source have a significant bearing upon the
accuracy or otherwise of source localization algorithms as well as that of the
Range-Finder method. For this reason, any experimental comparison of their
performance would yield scenario-specific results that could
not be considered valid in general.
So far, we have assumed an omnidirectional source. In
doing so, we have ignored a very pressing practical problem. In reality,
sources of interest are likely to be directional and the received sound
intensity will depend not only upon the microphone's distance from the source
but also upon its relative azimuth and elevation. If the azimuth-elevation-dependent
gain were known for each microphone, it could easily be included in our
formulation of the Range-Finder. However, we are unlikely to have such information,
or, indeed, to know the orientation of the source relative to the microphones.
A further complicating factor is that source directionality is
frequency-dependent, with sources typically becoming more directional
as frequency increases.
We should, however, be careful not to overstate the
difficulties that directionality presents. Some studies suggest that
directivity is not a significant factor at frequencies below
kHz and within
an azimuth of
relative to the
direction in which a talker is facing [15]. If we can assume that the microphones lie within
some angular boundaries relative to the source, then we may apply the
Range-Finder with confidence. Yet, in the absence of comprehensive data
regarding azimuth-elevation-dependent gain for the source of interest, it is
hard to see how we might specify and justify the required angular boundaries.
The Range-Finder therefore requires such data, and its applicability is limited when they are not
available.
We note that not all microphones need to be within the
specified boundaries; at least three must be, and the remaining ranges may then be
found from the TDEs. Future work will focus on determining the directionality
of typical sources and on methods for automatically determining which, if any,
of the microphones we should use in the presence of a directional source.
We also note that, when the source and microphones are
colinear, the directionality of the source does not pose a problem. However, as
previously mentioned, given such a setup, TDE-based source localization
techniques will fail. This, therefore, suggests a role for the Range-Finder as
an auxiliary source localization algorithm.
Acknowledgments
The support of the Informatics Commercialisation
initiative of Enterprise Ireland is gratefully acknowledged. Denis McCarthy
also acknowledges the financial support, from Trinity College, of a
postgraduate studentship.
References
- L. Girod and D. Estrin, “Robust range estimation using acoustic and multimodal sensing,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '01), vol. 3, pp. 1312–1320, Maui, Hawaii, USA, October-November 2001.
- J. Chen, J. Benesty, and Y. Huang, “Time delay estimation in room acoustic environments: an overview,” EURASIP Journal on Advances in Signal Processing, vol. 2006, Article ID 26503, 19 pages, 2006.
- D. Gisch and J. M. Ribando, “Apollonius' problems: a study of their solutions and connections,” American Journal of Undergraduate Research, vol. 3, no. 1, pp. 15–26, 2004.
- E. W. Weisstein, “Apollonius' problem,” from MathWorld-A Wolfram Web Resource, http://mathworld.wolfram.com/ApolloniusProblem.html.
- M. S. Brandstein, J. E. Adcock, and H. F. Silverman, “A closed-form method for finding source locations from microphone-array time-delay estimates,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '95), vol. 5, pp. 3019–3022, Detroit, Mich, USA, May 1995.
- K. Yao, R. E. Hudson, C. W. Reed, D. Chen, and F. Lorenzelli, “Blind beamforming on a randomly distributed sensor array system,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1555–1567, 1998.
- Y. Huang, J. Benesty, G. W. Elko, and R. M. Mersereau, “Real-time passive source localization: a practical linear-correction least-squares approach,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 8, pp. 943–956, 2001.
- H. Teutsch and G. W. Elko, “An adaptive close-talking microphone array,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (ASSP '01), pp. 163–166, New Paltz, NY, USA, October 2001.
- S. T. Birchfield and R. Gangishetty, “Acoustic localization by interaural level difference,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '05), vol. 4, pp. 1109–1112, Philadelphia, Pa, USA, March 2005.
- K. S. Sum and J. Pan, “On the steady-state and the transient decay methods for the estimation of reverberation time,” Journal of the Acoustical Society of America, vol. 112, no. 6, pp. 2583–2588, 2002.
- G. Marsaglia, “Ratios of normal variables,” Journal of Statistical Software, vol. 16, no. 4, pp. 1–10, 2006.
- G. Marsaglia, “Ratios of normal variables and ratios of sums of variables,” Journal of the American Statistical Association, vol. 60, no. 309, pp. 193–204, 1965.
- EASE, “Enhanced acoustic simulator for engineers,” version 4.0, http://www.renkus-heinz.com/ease/.
- C. H. Knapp and G. C. Carter, “Generalized correlation method for estimation of time delay,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, pp. 320–327, 1976.
- J. Huopaniemi, K. Kettunen, and J. Rahkonen, “Measurement and modeling techniques for directional sound radiation from the mouth,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (ASSP '99), pp. 183–186, New Paltz, NY, USA, October 1999.