Abstract

This manuscript addresses the problem of 3D source localization from direction of arrivals (DOAs) in wireless acoustic sensor networks. In this context, multiple sensors measure the DOA of the source, and a central node combines the measurements to yield the source location estimate. Traditional approaches require 3D DOA measurements; that is, each sensor estimates the azimuth and elevation of the source by means of a microphone array, typically in a planar or spherical configuration. The proposed methodology aims at reducing the hardware and computational costs by combining measurements related to 2D DOAs estimated from linear arrays arbitrarily displaced in the 3D space. Each sensor measures the DOA in the plane containing the array and the source. Measurements are then translated into an equivalent planar geometry, in which a set of coplanar equivalent arrays observe the source preserving the original DOAs. This formulation is exploited to define a cost function, whose minimization leads to the source location estimation. An extensive simulation campaign validates the proposed approach and compares its accuracy with state-of-the-art methodologies.

1. Introduction

The problem of acoustic source localization with microphone arrays has been an active topic of research in the last thirty years [1]. Several are the applications that benefit from this technology, among which video-conferencing [2, 3], audio-surveillance [4], and so forth are worth mentioning.

In the literature, it has been proposed to use either compact or distributed arrays. Compact arrays are advantageous since their deployment is easy. As an example, in a video-conferencing system, a compact array can be located in the proximity of the camera to steer it towards the detected speaker. Unfortunately, compact arrays pose some limits in terms of localization accuracy. Indeed, for far sources, only the direction of arrival (DOA) can be accurately estimated, while the range estimate is generally unreliable [1]. On the other hand, distributed arrays envision the presence of multiple sensors, each equipped with one or more microphones, located around the volume of interest [510]. With this configuration, the source is observed from different angles, with benefits in terms of localization accuracy. Until recently, the cost of the deployment of the sensors and the cabling made this kind of solutions cumbersome and possible only in specific ad hoc applications. In the last few years, however, the advent of wireless sensor networks made this kind of solutions attractive for a wider range of applications.

When each sensor is equipped with multiple microphones, the direction of arrival of the source can be measured internally to each sensor. The source is then localized by a central node, which combines measurements coming from the individual sensors. The advantage of DOAs with respect to other measurements (e.g., time differences of arrival or times of arrival) lies in the fact that each sensor transmits a single measurement to the central node, independently from the number of microphones.

Source localization from multiple DOAs can be brought to the problem of triangulating the estimated DOA lines [11, 12]. Following this approach, 3D localization is typically accomplished by combining multiple 3D DOAs measured at sensors with noncollinear microphone arrangements. For instance, multiple spherical arrays are used in [13]. A 3D DOA is represented by a pair of angles, which denote the azimuth and the elevation of the source with respect to the sensor position. Therefore, measuring a 3D DOA involves a two-dimensional search space. As robust DOA estimation algorithms are generally based on grid-search solutions (e.g., beamforming [14], steered-response-power [15]), the inherent power requirement may be an issue for their implementation at local sensors. For this reason, many works limit source localization to 2D (planar) geometries, assuming that all the sensors and the source lie on the same plane. This is typically accomplished through the triangulation of multiple 2D DOAs, as done, for instance, in [8, 12, 16]. From a geometrical standpoint, a 2D DOA is measured in the plane containing the array and the source and consequently consists of a single angle. The search space is therefore one-dimensional, and simple array geometries can be adopted for this task. For instance, using a linear array, the DOA can be estimated by evaluating an objective function in the range , suitably sampled according to the desired angular resolution. Moreover, in order to achieve a prescribed resolution, a smaller number of microphones are required compared to that needed for 3D DOA measurements. For these reasons, 2D DOA measurements are more affordable for low-power sensing devices, as typically required in designing wireless sensor networks.

In this manuscript, we propose a technique that localizes a source in 3D from the combination of 2D DOAs, measured using multiple linear microphone arrays. Differently from state-of-the-art methods exploiting 2D DOAs, we remove the requirement of working in a planar geometry. Specifically, we do not pose any constraint about the positions of the source and of the arrays in the 3D space. The proposed technique proceeds in two steps. First, each array measures the 2D DOA of the source referred to a local plane, determined by the lying line of the array and the location of the source. Using the concept of equivalent arrays, the array positions are roto-translated into equivalent ones all lying in the same plane containing also the source. Each equivalent array position is found so that the original DOA is preserved, as well as its distance from the source. Once this transformation has been accomplished, in the second step the source can be localized in a 2D geometry. In order to do this, we take advantage of the Ray Space [17], in which DOAs are interpreted as acoustic rays originating from the source and impinging on the acoustic centers of the sensors. Acoustic rays are parameterized by the parameters of the line on which they lie. A cost function is defined, whose minimization gives us the source location.

The proposed technique is advantageous with respect to state-of-the-art methods for 3D source localization using DOAs, which, to the best of our knowledge, all rely on 3D measurements. Being based on simpler 2D DOA measurements, indeed, the proposed method requires a reduced number of microphones for each sensor of the network, with obvious advantages in terms of hardware cost (microphones and analog-to-digital converters). For the same reasons, power consumption at each sensor turns out to be significantly reduced, as the beamforming operation leading to the 2D DOA estimate is accomplished on a one-dimensional search space. Finally, it is worth noticing that, exploiting a simple geometry such as that of a linear array, the manufacturing costs are much smaller than those required to realize more complex structures (e.g., spherical and cylindrical arrays).

The remainder of the manuscript is structured as follows. Section 2 gives some background information concerning the Ray Space representation and the localization of acoustic sources in 2D. Section 3 introduces the theoretical framework for the proposed approach, with particular reference to the equivalent arrays. Section 4 addresses the problem of source localization using the equivalent arrays. Section 5 is devoted to the validation of the proposed technique. Finally, Section 6 draws some conclusions.

2. Background

2.1. Ray Space Representation

Consider an acoustic source in a 2D scenario. Its Cartesian coordinates are . In geometrical acoustics, it can be described as the set of all acoustic rays that originate from it. Each ray can be parameterized with the line on which it lies. The homogeneous representation of all the lines passing through is This is equivalent to the vector form The parameter vector is homogeneous; that is, any scaling , , represents the same line. Consequently, the homogeneous coordinates form a class of equivalence in a two-dimensional projective space, defined in [18] as the projective Ray Space.

2.2. DOAs in the Ray Space

Let us now assume that the acoustic source is located in the far field with respect to the center of a microphone array ; that is, the length of the array is much smaller than the curvature of the wavefront. In this context, the spatial information of the source can be described in terms of its direction of arrival (DOA), that is, the angle of propagation of the wavefront. Source and array location are related to the DOA by The DOA defines the acoustic ray that joins the source position and the observation point, which turns to be oriented as . In the Ray Space, the DOA is thus represented as the acoustic ray passing through and directed as , whose parameters are

2.3. 2D Source Localization in the Ray Space

We now consider the problem of localizing the acoustic source, by combining a set of DOAs measured by a distributed network of linear microphone arrays. With reference to Figure 1, the acoustic center of the th array is at , , and the orientation of the line on which it lies is given by the unit vector . The vector forms the angle with the -axis. The DOA of the source, as measured with respect to the array line, is represented by the angle . The DOA is more conveniently represented in the global reference frame as .

Using (4), each DOA , , is turned into the corresponding acoustic ray . In absence of measurement noise, all these acoustic rays are bound to pass through the source position , meaning that In practical situations, DOAs are corrupted by measurement noise; thus the equalities in (5) do not hold. However, since the system is linear in the unknown variables , the source position can be easily estimated as the solution of a linear least squares problem. To do so, we first rewrite the system as where Then, the source position is estimated as which corresponds to the solution of the linear least squares problem

3. Theoretical Framework

Despite its simplicity, the solution provided in (8) is valid only for 2D configurations, that is, when the source and the microphone arrays lie on the same plane. For more general 3D scenarios, unfortunately, the generalization of the Ray Space is not straightforward. Indeed, representing acoustic rays in 3D requires the introduction of a five-dimensional projective space involving the use of Plücker’s coordinates along with a number of nonlinear constraints [17], which would make the source localization problem intractable.

In this manuscript we follow an alternate route: rather than extending the idea of the Ray Space to a 3D setup, we focus on the reinterpretation of 3D source localization into an equivalent 2D space, by introducing the concept of equivalent microphone arrays.

3.1. DOAs in 3D Using Linear Arrays

A linear microphone array can be described by the parameters of a line passing through its center and directed as the unit vector . In a 3D setup, a total of 5 parameters are required. Specifically, three Cartesian coordinates define the center , while the orientation is defined by the pair of angles and so that Using a linear array it is only possible to estimate a 2D DOA, which corresponds to the angle from which the source is observed by the array center, measured in the plane containing and the source position . More formally, the DOA as measured by the array is given by the angle formed between the vectors and , which can be computed as where we used the fact that .

3.2. Equivalent Microphone Arrays

Each DOA measurement refers to a different plane, according to the displacement and orientation of the arrays with respect to the source position. Nonetheless, we can interpret the available DOAs as being measured by a set of equivalent microphone arrays all located in the same plane, which contains also the source position . Analogously to the real ones, the equivalent arrays are described by the pairs indicating their centers and orientations, respectively, defined for as This idea is illustrated in Figure 2, which highlights a set of real arrays disposed arbitrarily in the space and the corresponding equivalent arrays that all lie in the same plane. For clarity of visualization, we denoted each pair of real and equivalent arrays using the same color.

The final goal of this reinterpretation is to bring the 3D source localization problem to the one formulated in (9), which exploits the fact that all the measurements are referred to the same plane. For this reason, we must determine the unknown parameters and so that the equivalent arrays preserve the information related by their real counterparts. To do so, we notice that the only requirements are that all the equivalent arrays(i)must lie in a unique plane containing the source;(ii)must preserve the DOA, so that they observe the source from the same angles as the real arrays.

To satisfy the first constraint, without loss of generality, a convenient choice is to force all the arrays to lie on the plane . Mathematically speaking, this corresponds to posing for . As for the second requirement, formally we must require that the angle formed by the vectors and equals the DOA as defined in (11); that is, which is equivalent to We observe that the condition in (16) is met by an infinite number of equivalent arrays. This happens for two reasons, as exemplified in Figure 3. Indeed, on one hand, for a given choice of the angle , the DOA is preserved by all the parallel equivalent arrays directed as and having centers on a common line, as in Figure 3(a). On the other hand, by keeping the distance of the array from the source fixed, each rigid rotation of the array about leads to another possible equivalent array preserving the DOA, as shown in Figure 3(b). In order to make the solution unique, we impose that the transformation preserves the angle (the orientation of the array on the horizontal plane) and the distance from the source. These additional constraints can be formally written as where . Given the constraints in (13), (14), (17), and (18), what still remains undetermined are the positions of the equivalent arrays in the plane , namely, the terms and . Since they depend on the unknown source location, we will derive and as functions of . To do so, we start by replacing (18) in (16) to obtain By inserting (14), (13), and (17) into (19) and by making the terms explicit we obtain Moreover, if we take the square of (18) and we exploit the constraint in (13), we can write that

Expressions in (20) and (21) form a second-order system in the unknown variables and . Rearranging the terms, this system can be rewritten in a more compact form as with the following definitions: Solving for and we obtain

Using the definitions in (23) and noticing that , we readily obtain the expressions for the equivalent array positions Note that, as expected, the second-order system (22) leads to two valid solutions. They represent two equivalent arrays, mutually mirrored about the source. Consequently, we can safely select one of those randomly, without loss of information. For instance, one may select the equivalent array located closest to the corresponding real one.

4. Source Localization

In this section we take advantage of the planar geometry induced by the transformation of the real into the equivalent arrays for the purposes of source localization. We first reformulate the source localization problem as a nonlinear least squares minimization problem, introducing the equivalent array positions in the cost function in (9). Then, we describe the iterative minimization process leading to the estimation of the source location. Finally, we describe how the estimated location can be refined by detecting and removing potential outliers from the set of measurements.

4.1. Cost Function Definition

The introduction of the equivalent arrays makes it possible to exploit the 2D Ray Space localization discussed in Section 2.3. Let us consider a network of linear arrays arbitrarily deployed in the 3D space, whose positions and orientations are assumed to be known. For a candidate source position , the positions of the equivalent array centers can be computed using (25), here denoted as and to stress the dependency on . With being the direction angle of the equivalent array and the DOA in the array reference system, the DOA expressed in the global reference system (i.e., measured in the plane with respect to the axis) is therefore . Consequently, the acoustic ray associated with the DOA measured by the th equivalent array turns to be , whose parameters can be computed using (4), obtaining Differently from the pure 2D case discussed in Section 2.3, the parameter of the acoustic ray is a nonlinear function of the source position . For this reason, the combination of equations in the form does not lead anymore to the definition of a linear system. Nevertheless, we can still formulate the source localization problem as in (9), where now the vector depends nonlinearly on the source location . More formally, the optimization problem can be rewritten as the minimization of the cost function ; that is,where is the error vector (i.e., the localization residuals) and .

4.2. Minimization

The minimization of the cost function can be efficiently addressed in an iterative fashion, by locally approximating at each iteration by means of a Taylor series expansion. Following the same approach as in [19], we compute the first-order expansion of the localization residual about an initial guess position as The Jacobian matrix is given by where According to the problem in (27), the update equation of the iterative update is given by [19] where is the iteration number and The source location is estimated as after stopping the iteration when is smaller than a prescribed threshold.

4.3. Localization Refinement

In adverse acoustic conditions, the DOA set may contain some outliers, typically related to the bias introduced by reverberation in standard DOA estimation algorithms [20]. A simple and effective method for detecting outliers is to check the magnitude of localization residuals , . Inspired by the work in [21] related to time differences of arrival-based source localization, we compute the standard deviation of the residual set . If exceeds a prescribed threshold , the DOA set potentially contains outliers. In this case, we mark as an outlier the DOA associated with the maximum localization residual, that is, so that This outlier DOA is thus removed from the measurement set, and the associated residual is removed from . The standard deviation is then recomputed, and the procedure is iterated until or a maximum number of measurements are removed. At this point, the source location is refined by running a second time the algorithm presented in Section 4.2, recomputing the cost function with the outlier-free measurement set.

As discussed in [21], it is worth noticing that this refinement step has a negligible computational cost, as the residual terms are directly available after source localization. Moreover, the iterative minimization procedure is repeated at most one time. Note also that the localization refinement is accomplished by the central node and does not require additional data exchange with the sensors.

5. Validation

We now present the results of a set of simulations aimed at verifying the validity of the proposed method. With reference to the setup shown in Figure 4, we tested the localization algorithm in a rectangular room of size 4 × 3 × 3 m3 by means of linear arrays located close to the vertices of the room. Each sensor is equipped with 3 microphones, located on a line segment and uniformly spaced by . The positions and orientations of the sensors are summarized in Table 1. We considered 10 test source positions, located on the line connecting the center of the room and the center of the first sensor.

5.1. Test 1: Robustness against Additive Noise on DOAs

In a first stage, we considered a completely anechoic scenario, in which we simulated the measurement of noisy DOAs. This allowed us to test the statistical behavior of the proposed localization cost function. More specifically, we corrupted the nominal DOAs , , observed by each array with additive noise , which is a realization of zero-mean white noise drawn from a normal distribution with standard deviation . Note that, in this case, the measurements are by definition free of outliers, since the deviation from the nominal DOA values is only due to the additive error model [22].

In order to enforce the statistical validity of the analysis of the cost function, we took care of two aspects. First, we bypassed the source localization refinement step by setting . This avoided the risk of incidentally removing DOAs from the measurement set, which is explicitly free of outliers. Second, we verified (at least empirically) that the iterative minimization procedure converges to the global optimum. Even though the cost function is not convex, for the considered setup we observed that it presents a very smooth behavior, along with a wide basin of attraction around its global minimum. This turns to be true independently from the tested source positions. These facts ensure a rapid convergence of the iterative algorithms towards the solution, without significant risks of getting trapped into local optima. An illustrative example is offered in Figure 5, which shows three planar sections of corresponding to a randomly picked realization of noisy DOAs, for a selected source location. In particular, the source is located at and the cost function is shown in three orthogonal planar sections passing through ; that is, at , , and .

For the statistical analysis, we considered different noise standard deviations in the range . For each value of and for each tested source position we run Monte-Carlo repetitions. The localization accuracy was measured in terms of root means squared error, computed as where is the true source location and denotes its estimate at the th Monte-Carlo run and is the number of realizations.

As a matter of comparison, we computed the lower bound for the RMSE implied by the Cramer-Rao Lower Bound (CRLB). More specifically, given the CRLB in terms of the localization variances , , and , the RMSE Lower Bound (RLB) was computed as Details about the computation of the CRLB are given in Appendix.

In Figure 6 we report the resulting RMSE as a function of the distance of the source from the center of the room, for three selected noise standard deviations. The continuous lines refer to the RMSE, while the dashed lines denote the related RLB. We observe that RMSE ranges from a minimum of about 2 cm for to a maximum of about 7.5 cm for . The RMSE tends to increase as the source position approaches the center of the first array. Moreover, we notice that the RMSE approaches the RLB for source positions close to the room center, while the accuracy slightly degrades for farther sources. Nevertheless, the maximum gap between the RMSE and the RLB maintains reasonable, being slightly above 1 cm in the worst case.

Inspired by this observation, we evaluated the RMSE as a function of the standard deviation of the injected noise for 4 selected source positions located at different distances from the room center. Results are reported in Figure 7. We first notice that the RMSE grows almost linearly with respect to the standard deviation . Moreover, once again we notice that the gap between the RMSE and the RLB tends to increase when the distance of the source from the room center increases. In the worst case (i.e., when and ) this gap is about 2 cm; thus we can conclude that the proposed method is sufficiently robust against Gaussian additive noise on DOAs.

5.2. Test 2: Robustness against Reverberation and Noise

To mimic a realistic scenario, we simulated microphone signal acquisitions in a reverberant room by means of the image source method [23]. In particular, we adopted the implementation provided in the RIR toolbox [24] to generate the room impulse responses from the source position to all the microphones. For this test we considered 50 test source positions randomly selected within the rectangular volume formed by the 8 array centers. The acoustic source was modeled to be omnidirectional and emitted 30 s of phonetically rich female speech. The microphone signals were obtained by convolving the source signal with the related RIRs and finally corrupted with zero-mean Gaussian noise with standard deviation tuned to simulate a prescribed signal-to-noise ratio (SNR). The signals were divided into frames of length 2048 samples at a sampling frequency . DOA estimation was performed by using a beamforming technique. Specifically, for each array, the received microphone signals were transformed into the time-frequency domain via short-time Fourier transform (STFT), considering a Hamming analysis windows of length 256 samples, with 50% of overlap, leading to a total of time windows per frame. Let be the STFT representation of a generic signal frame acquired by the th array, where is the index of the time window and is the bin associated with the frequency . At each frequency bin, we computed the Minimum Variance Distortionless Response (MVDR) pseudospectra [14] as where is the far-field steering vector for the used arrays, each composed of three microphones, and is the sample estimate of the array covariance matrix, obtained as The DOA at the th array is finally estimated as where is the geometric mean of the pseudospectra with frequency bins in the range . The collected DOAs , , were then used to estimate source position with the proposed method. For this test, we enabled the localization refinement step by selecting , while ensuring that a maximum of 2 potential outliers are removed from the DOA set.

As a matter of comparison, we estimated the source position also using the popular steered-response power with phase transform (SRP-PHAT) method, which is known to be very robust against both noise and reverberation. In particular, we used the Stochastic Region Contraction variant of SRP (SRC-PHAT) [25], which is best suited for 3D scenarios. To enable a fair comparison, the generalized cross correlations (GCC-PHATs) were computed only at microphone pairs belonging to the same linear array, for a total of GCC-PHAT computations. Thinking in terms of distributed localization, these GCC-PHATs functions are transferred to the central node that finally combines them to produce the source position estimate.

The results are reported in Figures 8(a) and 8(b), showing the RMSE achieved by the proposed method and SRP-PHAT, averaged among all the tested source positions and all the analyzed audio frames. In particular, Figure 8(a) reports the localization accuracy at different levels of reverberation, expressed in terms of the reverberation time [26], while keeping the SNR fixed to 20 dB. Conversely, Figure 8(b) shows the localization accuracy at different SNR levels, while the reverberation is kept fixed to . It can be noticed that the proposed technique achieves better scores than SRP-PHAT up to a moderate amount of reverberation and above an acceptable SNR level (i.e., greater than 15 dB). Out of these ranges, SRP-PHAT is slightly more robust against reverberation and noise.

5.3. Discussion

Even though SRP-PHAT tends to be more robust in particularly adverse acoustic environments, this comes at the expense of both higher computational complexity and higher communication bandwidth requirements. Let us consider again the distributed source localization scenario, in which source localization is demanded to a central node that gathers measurements coming from the distributed arrays. For both the considered methods, the computational power required at the sensors is comparable (beamforming for the proposed technique, computation of generalized cross correlations for SRC-PHAT). However, as far as the bandwidth is concerned, the proposed method only has to transmit a single real value to the central node, that is, the DOA. Conversely, each sensor must transmit the entire GCC-PHATs to the central node when localization is tackled via SRC-PHAT. Moreover, the computational burden at the central node is much higher for the SRC-PHAT method, being based on the minimization of a very irregular 3D cost function which requires thousands of evaluations of the SRC functional [21]. On the other hand, being based on a very smooth cost function, the proposed method requires a few iterations to converge to the optimum solution. As an example, for the considered simulation scenario, we measured the execution times at the central node, which highlighted that, on the average, the proposed method (implemented in Matlab®) is more than 10 times faster than the SRC-PHAT Matlab implementation (https://it.mathworks.com/matlabcentral/fileexchange/24352-acoustic-source-localization-using-srp-phat) released by the authors of [25]. More specifically, algorithms at the central node were run on a laptop equipped with a 2.9 GHz Intel Core i7 processor and 8 GB of RAM. The average time to process a 128 ms-long audio frame was 3.26 ms for the proposed method and 39.89 ms for the SRC-PHAT algorithm. It is worth noticing that this result is only qualitative, as the algorithm implementation is not optimized. Nevertheless, this suggests that the proposed method may represent a valid alternative for low-bandwidth and low-power application scenarios.

6. Conclusions

In this manuscript we presented a technique for 3D source localization based on the combination of two-dimensional DOAs. The proposed methodology works in two steps: first real arrays are transformed into equivalent arrays so that they are coplanar with the source. In the second step, the source is localized on the plane on which equivalent arrays and source lie. The main benefit of the proposed technique lies in the hardware and computational costs. Indeed, the measurement of 2D DOAs can be accomplished using linear arrays and involves a one-dimensional grid search. Conversely, state-of-the-art techniques for 3D source localization are based on 3D DOAs, which require more complex microphone arrangements as well as a higher computational complexity. The proposed technique is validated with an extensive simulation campaign and compared with Stochastic Region Contraction variant of SRP (SRC-PHAT). Results demonstrate that in moderately reverberant environments and in the presence of moderate noise, the proposed technique achieves results comparable with SRC-PHAT, which requires a much greater effort in terms of bandwidth and computational cost.

We are currently working on the extension of the proposed algorithm to multiple sources. Another important issue that deserves to be investigated is that of the sensitivity of the localization accuracy towards noise on the estimated DOAs and how the configuration of the arrays impacts on the sensitivity.

Appendix

Cramer-Rao Lower Bound

The Cramer-Rao Lower Bound (CRLB) expresses the theoretical limit for the variance of an unbiased estimator. We consider here the problem of estimating the source position from DOA measurements at linear arrays. To compute the CRLB, we follow the same rationale of [1] for localization based on time differences of arrival. We start modeling the DOA measurements as where is the noiseless DOA and is the measurement error for the th array. Expressing the measurement model in a vector form, we have where , , and .

The CRLB is given by the elements of the inverse of the Fisher information matrix, which is in general computed as where is the Hessian matrix of the function and is the probability density function (PDF) of measuring given the source position . More specifically, the CRLBs of the variance of the estimation of , , and are obtained as

Assuming that the noise corrupting the DOAs is jointly Gaussian distributed, and independent of both the observed DOAs and the source position, the PDF reads as where is the covariance matrix of the measurement error . In this case, the Fisher information matrix simplifies to where the Jacobian matrix is given by The gradient vectors can be computed in closed form as

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is supported by “CARIPLO Foundation” and “Regione Lombardia” under Grant no. 2016-0917 (SPIN project).