Abstract

Small aperture microphone arrays offer many advantages for portable devices and hearing aids. In this paper, a subspace based method is proposed for localizing acoustic sources with small aperture arrays. The effect of the array aperture on localization is analyzed through the array response (array manifold). Besides the array aperture, the frequency of the acoustic source and the variance of the signal power spectrum are studied in simulations that introduce a frequency error into the proposed method, showing how localization performance can be optimized. The proposed method is validated for an array aperture of 5 mm by simulations and by experiments with MEMS microphone arrays. Different types of acoustic sources can be localized with a DOA error of no more than 6 degrees, even in the presence of wind noise and other noise. Furthermore, the proposed method has lower computational complexity than other methods.

1. Introduction

Microphone array based acoustic source localization is widely used in many application scenarios. However, a large array aperture may greatly limit its applications. In recent years, owing to the development of microelectromechanical system (MEMS) technology, small aperture arrays have shown potential advantages for portable applications. Microphone arrays for hearing aids [1], videoconferencing [2], and robotic audition [3, 4] are often used in small offices or on desktops, while vehicle localization [5] and gunshot localization [6] on the battlefield require highly mobile equipment. Therefore, small aperture arrays have received considerable attention.

In general, acoustic source localization with microphone arrays exploits the differences between the signals received by different microphones. These differences have two aspects: amplitude change and phase difference.

The amplitude change depends on the array aperture through the choice between the spherical wave model and the plane wave model of the signal [7, 8]. For large and medium apertures, the spherical wave model of the acoustic source is applied to achieve high performance in direction of arrival (DOA) estimation [9, 10]. For small aperture arrays, however, the plane wave model can be applied for simplicity without limiting the accuracy of DOA estimation.

The phase difference forms the essence of localization using spatial information [11]. Based on computational complexity, methods for acoustic source localization can generally be divided into three categories: time delay based methods [12–14], subspace based methods, and parametric methods [15–17]. In time delay based methods, the time difference of arrival (TDOA) is obtained from the phase differences between microphones [18]; it is easy to compute the TDOA if delays are known to be less than the sampling period. In [19, 20], several adaptive filtering time delay based methods are compared for speaker localization with small aperture arrays. However, when the array aperture is small, time delay based methods tend to require a high sampling rate, which increases the burden on the hardware [21], and the accuracy of the estimates depends on the system's sampling rate. In contrast, the method proposed in this paper is independent of the sampling rate as long as the Nyquist-Shannon sampling theorem is satisfied. Parametric methods have a high computational cost [10] and are thus unsuitable for real time processing, while subspace based methods such as multiple signal classification (MUSIC) [22], root-MUSIC [23], and ESPRIT [24] are computationally attractive and highly accurate. Research on subspace based methods has focused on the model of the acoustic source or on new focusing matrices for wideband signals. For small aperture arrays, however, the properties of the small aperture itself should be exploited to improve the accuracy of DOA estimation and to reduce the processing complexity. The parameters that affect localization performance should therefore be studied, with the aim of achieving accuracy and generality across application scenarios.

In this paper, a low complexity subspace based method is proposed for small aperture microphone arrays by applying a compact planar model. The array aperture, the signal frequency, and the variance of the source power spectrum are discussed as parameters related to localization performance. With a small aperture, the method reduces computational complexity while achieving high accuracy in general scenarios.

The remainder of this paper is organized as follows. Section 2 describes the proposed method. In Section 3, its localization performance and computational complexity are compared with other representative acoustic source localization methods, and its performance is evaluated theoretically and in simulations as a function of the parameters of the acoustic signal. Experiments with MEMS microphone arrays are given in Section 4, and conclusions are presented in Section 5.

2. Subspace Based Localization Method Using Small Aperture Arrays

We first establish the notation used in this paper.
(1) Bold symbols denote vectors and matrices.
(2) The superscript $H$ denotes the complex conjugate transpose of a matrix.
(3) $f$ denotes frequency, $f \in [f_L, f_H]$.
(4) $P(f)$ is the received signal power as a function of frequency [25].
(5) Let $M$ be the number of microphones in the array.
(6) Let $D$ be the number of acoustic sources.
(7) $B = f_H - f_L$ denotes the signal bandwidth.
(8) $d$ is the aperture of the array.

2.1. The MUSIC Algorithm

The MUSIC algorithm is a well-known subspace based method. In general, the DOAs of acoustic sources can be estimated with the plane wave model when the array aperture is small [8, 9].

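The signal received at the sensors is given by

$$\mathbf{X}(f) = \mathbf{A}(\theta, f)\,\mathbf{S}(f) + \mathbf{N}(f), \qquad (1)$$

$$\mathbf{A}(\theta, f) = \left[\mathbf{a}(\theta_1, f), \ldots, \mathbf{a}(\theta_D, f)\right], \qquad (2)$$

$$\mathbf{a}(\theta, f) = \left[e^{-j2\pi f\tau_1(\theta)}, e^{-j2\pi f\tau_2(\theta)}, \ldots, e^{-j2\pi f\tau_M(\theta)}\right]^T, \qquad (3)$$

where $\mathbf{S}(f)$ is the signal source, $\mathbf{N}(f)$ is white Gaussian noise, and $\mathbf{A}(\theta, f)$ is the array manifold, which describes how the same signal is received at different sensors with time delays $\tau_m(\theta)$ caused by spatial diversity. The covariance matrix is constructed by

$$\mathbf{R} = E\left[\mathbf{X}\mathbf{X}^H\right] = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H, \qquad (4)$$

where $\mathbf{U}$ is the matrix of eigenvectors. The eigenvectors can be partitioned into two subspaces, the signal subspace and the noise subspace,

$$\mathbf{U} = \left[\mathbf{U}_s, \mathbf{U}_n\right], \qquad (5)$$

so we have

$$\mathbf{R} = \mathbf{U}_s\boldsymbol{\Lambda}_s\mathbf{U}_s^H + \mathbf{U}_n\boldsymbol{\Lambda}_n\mathbf{U}_n^H, \qquad (6)$$

where $\mathbf{U}_n$ is an $M \times (M - D)$ matrix whose columns are the noise eigenvectors.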

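The array manifold (i.e., the columns of $\mathbf{A}$) and the noise eigenvectors are orthogonal to each other [22]. The distance between the two subspaces is expressed by the MUSIC spectrum

$$P_{\mathrm{MUSIC}}(\theta) = \frac{1}{\mathbf{a}^H(\theta, f)\,\mathbf{U}_n\mathbf{U}_n^H\,\mathbf{a}(\theta, f)}, \qquad (7)$$

where $P_{\mathrm{MUSIC}}(\theta)$ exhibits peaks in the vicinity of the true DOAs.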
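As a concrete illustration of (3) and (7), the following is a minimal NumPy sketch of narrowband MUSIC for a uniform circular array (UCA). The function names, the 343 m/s speed of sound, and the noise-free covariance used in the example are illustrative assumptions, not part of the original method. Delays are referenced to the array center rather than the first sensor; this only multiplies each steering vector by a common phase factor and leaves (7) unchanged.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def uca_steering(theta_deg, freq, radius, num_mics, c=SPEED_OF_SOUND):
    """Plane-wave steering vectors a(theta, f) of (3) for a UCA, shape (M, T)."""
    phi = 2 * np.pi * np.arange(num_mics) / num_mics      # element angles
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    # Path-length advance of each element toward the source, shape (M, T)
    proj = radius * np.cos(theta[None, :] - phi[:, None])
    return np.exp(2j * np.pi * freq * proj / c)


def music_spectrum(R, A, num_sources):
    """MUSIC pseudospectrum (7) evaluated for every column of A."""
    _, U = np.linalg.eigh(R)                   # eigenvalues in ascending order
    Un = U[:, : R.shape[0] - num_sources]      # noise subspace U_n, M x (M - D)
    denom = np.sum(np.abs(Un.conj().T @ A) ** 2, axis=0)
    return 1.0 / np.maximum(denom, np.finfo(float).tiny)  # guard against 1/0


# Example: one source at 108.5 degrees, 1 kHz, 4-element UCA of radius 0.02 m
a0 = uca_steering(108.5, 1000.0, 0.02, 4)                 # shape (4, 1)
R = a0 @ a0.conj().T + 0.01 * np.eye(4)                   # ideal covariance
grid = np.arange(0.0, 360.0, 0.5)
P = music_spectrum(R, uca_steering(grid, 1000.0, 0.02, 4), num_sources=1)
print(grid[np.argmax(P)])
```

Because the covariance here is exact, the steering vector at the true DOA is orthogonal to the noise subspace to machine precision, so the scan peaks at the 108.5 degree grid point.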

2.2. The Proposed Localization Method Using Small Aperture Arrays

The MUSIC algorithm relies on the orthogonality between the array manifold and the noise eigenvectors. Wideband MUSIC algorithms for acoustic sources must account for the fact that the array manifold changes with frequency; hence they either process every frequency separately (the incoherent signal method, ISM) or find a focusing matrix that transforms all frequencies to a single one (the coherent signal method, CSM). Both approaches greatly increase the computational burden and are therefore unsuitable for certain portable real time applications, where the power supply is limited.

The array manifold changes as the frequency varies, but a smaller array aperture makes this change smaller. In other words, the error caused by frequency dispersion declines as the array aperture becomes smaller. In this paper, $f$ is assumed to be fixed at $f_0$ rather than varying from $f_L$ to $f_H$; this assumption is the main error source of our method. The resulting approximation error of the array manifold is derived below in terms of the array aperture and other parameters.

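According to (3), the phase of the $m$-th element of the array manifold is

$$\varphi_m = 2\pi f\,\tau_m(\theta). \qquad (8)$$

The reference point is the first sensor in Figure 1; then

$$\tau_1(\theta) = 0, \quad \varphi_1 = 0. \qquad (9)$$

By applying the error transfer (propagation) formula to the variables $f$ and $\tau_m$ [26, 27], then

$$\Delta\varphi_m = 2\pi\tau_m(\theta)\,\Delta f + 2\pi f_0\,\Delta\tau_m, \quad \Delta f = f - f_0, \qquad (10)$$

$$\sigma_{\varphi_m}^2 = \left(2\pi\tau_m(\theta)\right)^2\sigma_f^2 + \left(2\pi f_0\right)^2\sigma_{\tau_m}^2, \qquad (11)$$

where $\sigma_f^2$ and $\sigma_{\tau_m}^2$ are the variances of $f$ and $\tau_m$. The variance of $f$ is expressed as

$$\sigma_f^2 = \frac{\int_{f_L}^{f_H}(f - f_0)^2 P(f)\,df}{\int_{f_L}^{f_H}P(f)\,df}, \qquad (12)$$

where $P(f)$ is the received signal power as a function of frequency [25].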

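By definition, the array aperture $d$ is the maximum distance between any two sensors, so $|\tau_m(\theta)| \le d/c$, where $c$ is the speed of sound; hence

$$\left(2\pi\tau_m(\theta)\right)^2\sigma_f^2 \le \left(\frac{2\pi d}{c}\right)^2\sigma_f^2. \qquad (13)$$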

Note that (13) relates the error of the array manifold caused by our assumption in (10) to the array aperture $d$. Therefore, when the array aperture is small enough, the change of the array manifold caused by frequency change is negligible, which allows us to simplify the localization method. According to (11), which underlies the array manifold, the accuracy of DOA estimation is related to $\tau_m(\theta)$ and $f$. As $\theta$ varies over the plane, the array manifold also changes, and the performance of our method depends on searching over $\theta$ for the array manifold vector that is orthogonal to the noise eigenvectors. The quotient of the first part and the second part of (11) is defined as the error ratio (ER): the first part denotes the error of the assumption of a fixed $f$, and the second part represents the performance of the method. The smaller the ER, the more effective the proposed method.

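Hence, combining (11) and (13),

$$\mathrm{ER} = \frac{\left(2\pi\tau_m(\theta)\right)^2\sigma_f^2}{\left(2\pi f_0\right)^2\sigma_{\tau_m}^2} \le \frac{d^2\sigma_f^2}{c^2 f_0^2\,\sigma_{\tau_m}^2}. \qquad (14)$$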

It follows from (14) that a relatively small aperture $d$, a relatively high frequency $f_0$, and a small variance $\sigma_f^2$ decrease the ER, which makes the estimation more accurate.
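To give a feel for the magnitudes involved, the sketch below evaluates the largest manifold phase error that holding the frequency at $f_0$ can introduce. Since the largest delay between two sensors of an array with aperture $d$ is $d/c$, an offset $\Delta f = f - f_0$ changes a manifold phase by at most $2\pi d\,|\Delta f|/c$. The speed of sound, the 400 Hz offset, and the example apertures are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def max_phase_error(aperture_m, freq_offset_hz, c=SPEED_OF_SOUND):
    """Upper bound (radians) on the manifold phase error caused by using f0 instead of f."""
    max_delay = aperture_m / c                  # largest delay between two sensors
    return 2 * math.pi * max_delay * abs(freq_offset_hz)


for d in (0.005, 0.02, 0.04, 0.2):
    err = max_phase_error(d, 400.0)
    print(f"d = {d:5.3f} m -> {math.degrees(err):6.2f} deg")
```

The bound scales linearly with the aperture: for a 400 Hz offset it is roughly 2.1 degrees for a 5 mm array but about 84 degrees for a 0.2 m array, which is why the fixed-frequency approximation is reserved for small apertures.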

Consequently, the proposed method focuses on two aspects: simplifying the array manifolds at different frequencies to a single frequency, and choosing the higher useful band of the signal, according to prior knowledge of the acoustic source, to improve accuracy. For a given array of $M$ microphones, the overall localization method is presented step by step below.

Step 1. Collect snapshots of data from the small aperture array of sensors.

Step 2. Taking into account the bandwidths of the acoustic source and of the environmental noise, limit processing to a relatively high band $[f_L, f_H]$; calculate the Fourier transforms $\mathbf{X}(f)$, $f \in [f_L, f_H]$, of the signals of the different microphones.

Step 3. Construct the covariance matrices for $f \in [f_L, f_H]$ using (4).

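Step 4. Estimate the frequency $f_0$ that minimizes the variance of the acoustic source spectrum:

$$f_0 = \arg\min_{\nu}\int_{f_L}^{f_H}(f - \nu)^2 P(f)\,df = \frac{\int_{f_L}^{f_H} f\,P(f)\,df}{\int_{f_L}^{f_H}P(f)\,df}. \qquad (15)$$

In the case that $P(f)$ is not given, $f_0 = (f_L + f_H)/2$.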

Step 5. Calculate the array manifold $\mathbf{a}(\theta, f_0)$ from (3) with $f = f_0$.

Step 6. Isolate the source locations as the maxima of the pseudospectrum defined in (7).
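The six steps can be sketched end to end in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frequency-domain snapshots are simulated directly (one complex amplitude per frequency bin and frame) as a stand-in for Steps 1 and 2, the per-bin covariance matrices of Step 3 are assumed to be summed into a single matrix, $f_0$ in Step 4 is taken as the power-weighted centroid of the band, and the array is a hypothetical 6-element UCA of radius 0.02 m. All function names are illustrative.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def uca_positions(num_mics, radius):
    """Element coordinates (M, 2) of a uniform circular array."""
    phi = 2 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.stack([np.cos(phi), np.sin(phi)], axis=1)


def steering(theta_deg, freq, pos):
    """Array manifold (3) for azimuths theta_deg at one frequency, shape (M, T)."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    u = np.stack([np.cos(theta), np.sin(theta)])   # unit vectors toward the source
    tau = -(pos @ u) / C                           # delays relative to the array center
    return np.exp(-2j * np.pi * freq * tau)


def simulate_spectra(theta_deg, pos, freqs, num_frames, snr_db, rng):
    """Stand-in for Steps 1-2: frequency-domain snapshots X(f), shape (F, M, K)."""
    m = pos.shape[0]
    noise_std = 10 ** (-snr_db / 20)
    X = np.empty((len(freqs), m, num_frames), dtype=complex)
    for i, f in enumerate(freqs):
        a = steering(theta_deg, f, pos)[:, 0]
        s = (rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)) / np.sqrt(2)
        n = (rng.standard_normal((m, num_frames)) + 1j * rng.standard_normal((m, num_frames))) / np.sqrt(2)
        X[i] = np.outer(a, s) + noise_std * n
    return X


def proposed_doa(X, freqs, pos, num_sources=1, grid_deg=None):
    """Steps 3-6: one covariance for the band, one manifold at f0, MUSIC scan."""
    if grid_deg is None:
        grid_deg = np.arange(0.0, 360.0, 0.1)
    # Step 3: sum x x^H over all bins and frames in [f_L, f_H]
    R = np.einsum("fmk,fnk->mn", X, X.conj()) / (X.shape[0] * X.shape[2])
    # Step 4: f0 as the power-weighted centroid of the band
    power = np.mean(np.abs(X) ** 2, axis=(1, 2))
    f0 = np.sum(freqs * power) / np.sum(power)
    # Noise subspace: eigenvectors of the M - D smallest eigenvalues
    _, U = np.linalg.eigh(R)
    Un = U[:, : pos.shape[0] - num_sources]
    # Steps 5-6: manifold at f0 only, then pick the pseudospectrum peak (7)
    A = steering(grid_deg, f0, pos)
    P = 1.0 / np.sum(np.abs(Un.conj().T @ A) ** 2, axis=0)
    return grid_deg[np.argmax(P)], f0


rng = np.random.default_rng(0)
pos = uca_positions(6, 0.02)
freqs = np.arange(200.0, 2000.0, 8.0)        # processed band [f_L, f_H)
X = simulate_spectra(108.5, pos, freqs, num_frames=20, snr_db=20.0, rng=rng)
doa, f0 = proposed_doa(X, freqs, pos)
print(f"estimated DOA {doa:.1f} deg at f0 = {f0:.0f} Hz")
```

Only one eigendecomposition and one angular scan are performed regardless of how many frequency bins fall in the band, which is the source of the complexity saving over per-bin (ISM) or focusing (CSM) processing.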

3. Analysis and Simulation

3.1. Theory Analysis of Array Aperture and Other Factors

Traditionally, acoustic sources such as vehicles, music, and human speech are treated as wideband sources, and wideband localization methods are divided into ISM and CSM. Among them, SSA, RSS, and TCT are representative methods, and TCT has the best performance. The basic idea of TCT is focusing, which, however, greatly increases the computational load. Instead of focusing, we choose a proper approximation for $f_0$, trading some localization performance for lower computational complexity.

3.1.1. The Selection of the Focusing Frequency

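For a uniform distribution of $P(f)$, a reasonable approximation of $f_0$ is [28, 29]

$$f_0 = \frac{f_L + f_H}{2}. \qquad (16)$$

For other distributions, the error of the approximation is measured by $\varepsilon(f_0) = \int_{f_L}^{f_H}(f - f_0)^2 P(f)\,df$. The error is minimum [30, 31] when

$$\frac{d\varepsilon}{df_0} = -2\int_{f_L}^{f_H}(f - f_0)\,P(f)\,df = 0. \qquad (17)$$

According to (17),

$$f_0 = \frac{\int_{f_L}^{f_H} f\,P(f)\,df}{\int_{f_L}^{f_H} P(f)\,df}. \qquad (18)$$

When there is no prior knowledge about $P(f)$, $f_0 = (f_L + f_H)/2$ is a reasonable compromise.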
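The choice of $f_0$ discussed above can be checked numerically. The sketch below assumes a power spectrum $P(f)$ sampled on a uniform grid (so the integrals become sums) and compares the power-weighted centroid with the band midpoint for a hypothetical spectrum whose power rises toward the upper band edge; the spectrum shape is made up for illustration.

```python
import numpy as np


def weighted_variance(freqs, power, f0):
    """Power-weighted squared deviation of the spectrum around f0 (discrete form)."""
    return np.sum((freqs - f0) ** 2 * power) / np.sum(power)


def centroid_frequency(freqs, power):
    """The f0 that minimizes weighted_variance: the spectral centroid."""
    return np.sum(freqs * power) / np.sum(power)


freqs = np.linspace(200.0, 2000.0, 901)      # f_L = 200 Hz, f_H = 2000 Hz
power = (freqs / 2000.0) ** 2                # hypothetical spectrum rising with f
f_mid = 0.5 * (freqs[0] + freqs[-1])         # choice without prior knowledge of P(f)
f_cen = centroid_frequency(freqs, power)

print(f"midpoint {f_mid:.0f} Hz, variance {weighted_variance(freqs, power, f_mid):.0f} Hz^2")
print(f"centroid {f_cen:.0f} Hz, variance {weighted_variance(freqs, power, f_cen):.0f} Hz^2")
```

For a flat spectrum the centroid coincides with the midpoint $(f_L + f_H)/2$, which is why the midpoint is a sensible default when $P(f)$ is unknown; when the power is concentrated in the upper band, the centroid moves upward and the weighted variance drops.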

3.1.2. Discussion of Other Factors

The essence of the algorithm in this paper is that, when the array aperture is very small, a band-limited acoustic signal can be viewed as a narrowband or approximately narrowband signal. However, there is no explicit definition of a narrowband signal for array processing. The three widely accepted expressions below describe some characteristics of narrowband array signals [32]:
(i) $B \ll f_0$; (19)
(ii) $B\,d/c \ll 1$, i.e., the bandwidth is small compared with the reciprocal of the largest propagation delay across the array; (20)
(iii) if the bandwidth of a signal is such that the second eigenvalue of the signal's noise-free covariance matrix is larger than the noise level in the signal-plus-noise covariance matrix, then that signal may not be described as narrowband [25].

The first expression is the basic understanding of a narrowband signal. The second describes a narrowband signal for an array. The third definition is more complicated: it makes the narrowband property a function of bandwidth, angle of arrival (AOA), SNR, array gain, and array dimension.

The conclusion drawn from these three definitions is consistent with Section 2.2: the smaller the aperture $d$, the higher the signal frequency $f_0$, and the smaller the bandwidth $B$, the more narrowband the signal. In this case, the localization error caused by frequency error is smaller, so our method is more effective. To improve localization performance, we can apply a band-pass filter to the signal and make use of its relatively higher band; in any case, when the array aperture is very small, the error of acoustic source localization remains under control.
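The first two conditions can be evaluated directly. The helper below is a sketch that computes $B/f_0$ and the time-bandwidth product $B\,d/c$ for a given band and aperture. The speed of sound, the use of the band center as $f_0$, and the example values (the 200–2000 Hz band and 5 mm aperture that appear in Section 4) are assumptions for illustration; what counts as "much less than 1" is left to the reader.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def narrowband_ratios(f_low, f_high, aperture_m, c=SPEED_OF_SOUND):
    """Return (B / f0, B * d / c) for band [f_low, f_high] and aperture d."""
    bandwidth = f_high - f_low
    f0 = 0.5 * (f_low + f_high)              # band center as the reference frequency
    return bandwidth / f0, bandwidth * aperture_m / c


rel_bw, tb_product = narrowband_ratios(200.0, 2000.0, 0.005)
print(f"B/f0 = {rel_bw:.2f}, B*d/c = {tb_product:.4f}")
```

For this band $B/f_0 \approx 1.64$, so the signal is far from narrowband in the sense of (i), yet $B\,d/c \approx 0.026$, so across a 5 mm aperture it behaves almost like a narrowband array signal in the sense of (ii). This is the regime the method relies on.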

3.2. Simulations

In this part, the proposed method is compared with other methods, and the DOA error caused by frequency error is simulated as a function of the aperture $d$, the signal frequency $f_0$, and the variance $\sigma_f^2$ of the signal power spectrum.

3.2.1. Comparison with Other Algorithms

Our method is compared with TCT (CSM) [30], RSS (CSM) [30], RSS (CSM) [33], SSA (ISM) [34], and a time delay estimation (TDE) method in terms of RMSE and computational complexity.

Consider one source impinging on the array from 108.5 degrees. The aperture of the uniform circular array is 1/3 or 1/6 of the wavelength at the center frequency, and the bandwidth of the signal is 20% of the center frequency. The RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\left(\hat{\theta}_k - \theta\right)^2},$$

where $\hat{\theta}_k$ is the estimate in the $k$-th of $K = 500$ Monte Carlo experiments and $\theta$ is the true DOA. In Figure 2, the TCT and RSS methods have smaller RMSEs. Our method performs nearly the same as the SSA method and has a much smaller RMSE than the TDE method. When the SNR is higher than 10 dB, the RMSEs of all the subspace based methods are nearly the same. Comparing Figures 2(a) and 2(b) shows that the smaller the array aperture, the closer the performance of the methods.
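The RMSE over Monte Carlo trials can be computed as below. Wrapping each error into [−180°, 180°) is an added assumption (it avoids counting an estimate of 359° for a true DOA of 1° as a 358° error); the text does not state how wrap-around is handled. The 500 trial values are synthetic, drawn around 108.5 degrees purely for illustration.

```python
import numpy as np


def doa_rmse(estimates_deg, true_deg):
    """Root mean square DOA error over Monte Carlo trials, in degrees."""
    err = np.asarray(estimates_deg, dtype=float) - true_deg
    err = (err + 180.0) % 360.0 - 180.0          # wrap errors into [-180, 180)
    return float(np.sqrt(np.mean(err ** 2)))


rng = np.random.default_rng(1)
trials = 108.5 + rng.normal(0.0, 2.0, size=500)  # 500 hypothetical estimates
print(f"RMSE = {doa_rmse(trials, 108.5):.2f} deg")
```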

The computational complexity is compared in Table 1. The number of snapshots is 1024, the sampling rate is 8192 Hz, the source is at 108.5 degrees, and the array is a uniform circular array with a radius of 0.02 m. All methods are executed in Matlab 2008a on a personal computer with a 2.9 GHz dual-core processor and 2 GB of memory. The elapsed times are shown in Table 1.

As shown in Table 1, the computation time of the proposed method is an order of magnitude lower than those of TCT, RSS, and SSA. The proposed algorithm was also implemented on an ADSP21375 processor with a 75 MHz clock, 1024 snapshots, and a sampling rate of 8192 Hz. The DSP system computed the DOA of the acoustic source in real time with a DOA error of less than 6 degrees; processing each block of 1024 snapshots takes no longer than 0.05 s.
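The real-time claim can be checked with simple arithmetic: a block of 1024 snapshots at 8192 Hz spans 0.125 s of audio, so a processing time of at most 0.05 s per block leaves headroom. The helper name below is illustrative.

```python
def real_time_factor(num_snapshots, sample_rate_hz, processing_time_s):
    """Processing time divided by the duration of the audio block (< 1 means real time)."""
    block_duration = num_snapshots / sample_rate_hz
    return processing_time_s / block_duration


rtf = real_time_factor(1024, 8192, 0.05)
print(f"block length {1024 / 8192:.3f} s, real-time factor {rtf:.2f}")
```

A real-time factor of about 0.4 means the DSP finishes each block in well under half of the time it takes to acquire the next one.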

Under a small aperture, the performance of the proposed method is comparable to that of ISM and CSM methods, while its computational complexity is much lower.

3.2.2. The Aperture

As discussed in Section 2.2, frequency error results in manifold error, and this error becomes smaller as the array aperture decreases. We focus on the frequency error, defined as the difference between the frequency used to compute the array manifold in the simulation and the $f_0$ defined in Step 4.

The simulated signal is a wideband signal occupying 200 to 1000 Hz. The real DOA is 108.5 degrees, and the array is a uniform circular array. Table 2 shows the error frequency boundary, defined as the frequency error at which the difference between the estimated DOA and the real DOA first exceeds 10 degrees.

As shown in Table 2, when the array aperture is less than 0.05 m, the error frequency boundary is larger than 2000 Hz. This means that if the input signal is confined to a certain bandwidth by analog or digital means, the DOA error caused by frequency error remains under control. In this case, prior knowledge of the signal is not necessary, which broadens the applicability of our method. Figure 3 shows the relationship between frequency error (x-axis), array aperture (y-axis), and DOA error (z-axis). For a given aperture, the DOA error increases as the frequency error grows; for a given frequency error, the DOA error decreases as the aperture becomes smaller.

3.2.3. The Signal Frequency

We concluded in Section 2.2 that the DOA error caused by frequency error decreases as the signal frequency increases. The simulation is as follows. The array aperture is 0.02 m and the signal bandwidth is 400 Hz, with the signal frequency taken as the center of the band. Figure 4 shows the relationship between frequency error (x-axis), signal frequency (y-axis), and DOA error (z-axis). For a given frequency error, the DOA error decreases as the signal frequency increases. The proposed algorithm is then used to estimate the DOA with frequency errors from 0 to 1000 Hz in steps of 2 Hz; the resulting RMSE of DOA, shown in Figure 5, decreases as the signal frequency increases.

3.2.4. Variance of the Signal Power of Frequency

As presented in Section 2.2, as the variance $\sigma_f^2$ of the signal power spectrum decreases, the error caused by frequency error is reduced. Two signals, signal 1 and signal 2, are generated, whose power spectra are shown in Figure 6(a). The array is a uniform circular array, and the real DOA of the signals is 108.5 degrees. The variance of signal 1 is clearly larger than that of signal 2.

As shown in Figure 6(b), at the same frequency error the DOA estimated for signal 2 is more accurate than that for signal 1, which supports our derivation.

3.3. Negative Influence of Small Aperture Array

Our method is based on the fact that a small array aperture decreases the manifold error caused by frequency error. However, some negative effects of a small aperture should be mentioned.

3.3.1. High Consistency in Amplifying Circuit

In a real microphone array system, differences between the amplifying circuits cannot be avoided; they cause amplitude and phase errors across the array elements, which affect the array manifold.

According to the Nyquist-Shannon sampling theorem,

$$f_s \ge 2 f_{\max}, \qquad (21)$$

where $f_s$ is the sampling rate and $f_{\max}$ is the highest signal frequency, and the spatial sampling theorem is expressed as follows [35]:

$$d_{\min} \le \frac{\lambda_{\min}}{2}, \qquad (22)$$

where $d_{\min}$ is the minimum distance between any two array elements and $\lambda_{\min}$ is the shortest wavelength of the signal. Just as in the time domain, a higher spatial sampling rate preserves more information but also increases the cost of system design, and $d_{\min} > \lambda_{\min}/2$ causes an aliasing effect, as described by the Nyquist-Shannon sampling theorem. When the array size is not limited, $\lambda_{\min}/2$ is usually chosen as the element spacing, because the tolerance to phase differences introduced by the amplifying circuits increases with the array aperture.
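The spatial sampling condition above translates into a maximum element spacing for a given highest frequency. The sketch below computes it; the speed of sound and the 2000 Hz example (the upper band edge used in Section 4) are assumptions for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def max_spacing_without_aliasing(f_max_hz, c=SPEED_OF_SOUND):
    """Largest element spacing d_min satisfying d_min <= lambda_min / 2."""
    lambda_min = c / f_max_hz
    return lambda_min / 2


def is_spatially_unaliased(d_min_m, f_max_hz, c=SPEED_OF_SOUND):
    """True if the spacing satisfies the spatial sampling condition."""
    return d_min_m <= max_spacing_without_aliasing(f_max_hz, c)


print(f"{max_spacing_without_aliasing(2000.0):.4f} m")
```

At 2000 Hz the limit is about 8.6 cm, so millimeter-scale MEMS arrays are far inside the unaliased regime; the price of a small aperture is the reduced tolerance to circuit phase mismatch discussed here and the lower resolution discussed in Section 3.3.2, not aliasing.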

The DOA errors in Figure 7 are caused by a 1 degree phase difference introduced by the amplifying circuit. The signal bandwidth is 400 Hz, centered at 500 Hz, and the array is a uniform circular array. Figure 7 shows that the DOA error decreases as the array aperture increases.

In other words, to achieve high accuracy, a small aperture array requires highly consistent amplifying circuits.

3.3.2. Lower Resolution

Another problem caused by a high spatial sampling rate (small array aperture) is lower resolution: with multiple signals, closely spaced sources are more difficult to separate. When the array size is not limited, $\lambda/2$ is usually chosen as the element spacing to achieve higher resolution. Nevertheless, the algorithm still works well for a single signal.

Figure 8(a) shows the MUSIC spectra for two signals from 50 degrees and 60 degrees, obtained with a uniform linear array of 4 elements, where $\lambda$ is the wavelength of the signal. Three different spacings between adjacent elements, expressed as fractions of $\lambda$, are compared. Figure 8(b) shows the MUSIC spectra for one signal from 60 degrees, obtained with the same 4-element uniform linear arrays.

Even though the resolution of a small aperture array is lower than that of a large aperture array, our method is still able to distinguish multiple signals, and it works well for a single signal.

4. Experiment with MEMS Microphone Arrays

Several experiments with real microphone arrays are carried out in this section. MEMS microphones are chosen as the sensors to ensure that the array aperture is small enough. The package dimensions of the MEMS microphone used below are 3.35 × 2.5 × 0.88 mm.

4.1. Validation of the Method Using Different Arrays

After the arrays for localization were designed, six arrays of different shapes were used to validate the algorithm. They are listed in Table 3 and Figure 9.

A turntable was designed to test changes of DOA. It was driven by a stepping motor controlled by a microcontroller at a constant speed of 25.71 degrees/s. The acoustic source is the famous Chinese folk song Jasmine (Molihua) played on the piano. The sampling rate of the array system is 8192 Hz, the band is limited to 200–2000 Hz, and $f_0$ is 800 Hz. The experiment was conducted in a quiet room, with the source fixed at 0 degrees relative to the array at a distance of 2 m.

As shown in Figure 10, when the speed of the turntable is constant, the estimated DOA varies linearly with time. The slope corresponds to the angular velocity of the turntable. The intercept is the deviation from the real DOA when the turntable starts to revolve; this deviation can be used to correct the estimated DOA as a whole. The correlation coefficient describes how closely the result obeys the linear law, with 1 meaning 100%. The standard deviation measures the DOA error. As shown in Table 4, comparing arrays number 1, number 2, and number 3, and comparing arrays number 4 and number 5, the error and the intercept increase as the array aperture decreases. This is caused by the inconsistency of the amplifying circuits; as mentioned in Section 3.3.1, the consistency of the amplifying circuit is especially important for small aperture arrays.

Comparing array number 1 with number 4 and array number 2 with number 5, the former, which have more sensors, give better DOA estimates because they acquire more information. However, array number 6 failed to localize the source. To minimize the aperture, the sound holes were arranged as an equilateral triangle, but the MEMS microphones themselves were not. Because the point at which a MEMS microphone senses sound may not coincide exactly with its sound inlet, the effective array shape is not equilateral, which makes the array manifold indeterminate and the estimated DOA invalid. Nevertheless, the result of array number 3 shows that our method works with an aperture as small as 5 mm.
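The turntable statistics described above (slope, intercept, correlation coefficient, standard deviation) can be obtained from an ordinary least-squares line fit, sketched below with NumPy. The DOA track is hypothetical (constant rotation at 25.71 degrees/s plus Gaussian noise), and reading "standard deviation" as the standard deviation of the fit residuals is our assumption.

```python
import numpy as np


def turntable_fit(t_s, doa_deg):
    """Least-squares line fit of DOA versus time and summary statistics."""
    slope, intercept = np.polyfit(t_s, doa_deg, 1)   # degrees/s, degrees
    r = np.corrcoef(t_s, doa_deg)[0, 1]              # correlation coefficient
    residuals = doa_deg - (slope * t_s + intercept)
    return slope, intercept, r, float(np.std(residuals))


rng = np.random.default_rng(2)
t = np.linspace(0.0, 7.0, 200)                        # 7 s of rotation
doa = 25.71 * t + 3.0 + rng.normal(0.0, 2.0, t.size)  # hypothetical estimates
slope, intercept, r, std = turntable_fit(t, doa)
print(f"slope {slope:.2f} deg/s, intercept {intercept:.2f} deg, r {r:.4f}, std {std:.2f} deg")
```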

4.2. Validation of the Method Using Different Acoustic Sources

This part describes an application of the proposed method to locating moving vehicles with a 0.04 m microphone array (array number 1). The experiment was conducted in November 2012 on a cement road on an island in Zhejiang Province, China, after a light rain and with wind levels from 0 to 2. Let $h$ be the perpendicular distance from the array to the car and $v$ the velocity of the car; the velocities of the vehicles can be regarded as constant. The car runs from left to right (Figure 11); then

$$\theta(t) = \operatorname{arccot}\frac{v\,(t - t_0)}{h}, \qquad (23)$$

where $t_0$ is the time at which the car is closest to the array.

Four typical types of vehicles are localized: an electric bicycle, a tricycle, a car, and a truck. Their signals are shown in panel (A) of Figure 12, and the amplitudes of their power spectra in panel (B). A nonlinear fit of DOA is conducted using (23) as the fit function. The estimated DOAs of the four vehicles are depicted in panel (C) of Figure 12.

Considering the width of the road and the fact that vehicles drive on the right side, the perpendicular distance $h$ is 3 m when a vehicle drives from left to right and 5 m when it drives in the opposite direction.

The ratio $v/h$ of vehicle speed to perpendicular distance represents the relative speed of the acoustic source. The DOA estimation error is the residual mean square, a standard measure for the nonlinear fit of DOA. The correlation coefficient describes how well the data fit (23), with 1 meaning 100%. The results of the fit of DOA for different vehicles are shown in Table 5.
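A quick way to obtain $v/h$ and $t_0$ from a DOA track, assuming the passing-vehicle model $\theta(t) = \operatorname{arccot}(v(t - t_0)/h)$ with the angle measured from the direction of travel, is to note that $\cot\theta$ is then linear in time with slope $v/h$. The sketch below swaps the nonlinear least-squares fit for this ordinary line fit, which is useful for starting values or a sanity check; it is not the fitting procedure used for Table 5. Because $\cot\theta$ is ill-conditioned near 0 and 180 degrees, samples there are dropped (the 20 degree margin is an arbitrary choice). The example pass is hypothetical.

```python
import numpy as np


def fit_relative_speed(t_s, doa_deg, min_angle_deg=20.0):
    """Estimate v/h and t0, assuming theta(t) = arccot(v (t - t0) / h)."""
    theta = np.deg2rad(np.asarray(doa_deg, dtype=float))
    # cot(theta) explodes near 0 and 180 degrees, so keep the well-conditioned part
    keep = (theta > np.deg2rad(min_angle_deg)) & (theta < np.deg2rad(180.0 - min_angle_deg))
    cot = np.cos(theta[keep]) / np.sin(theta[keep])
    slope, intercept = np.polyfit(np.asarray(t_s)[keep], cot, 1)  # cot = (v/h) t - (v/h) t0
    return slope, -intercept / slope                               # v/h in 1/s, t0 in s


# Hypothetical pass: v = 10 m/s, h = 3 m, closest approach at t0 = 4 s
t = np.linspace(0.0, 8.0, 161)
doa = np.rad2deg(np.arctan2(3.0, 10.0 * (t - 4.0)))
v_over_h, t0 = fit_relative_speed(t, doa)
print(f"v/h = {v_over_h:.3f} 1/s, t0 = {t0:.3f} s")
```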

According to our parameter analysis in Section 3.2, the performance of acoustic localization is related to the frequency and the SNR of the acoustic signal. The four types of vehicles represent different combinations of signal frequency and SNR. Their frequency spectra are shown in panel (B) of Figures 12(a), 12(b), 12(c), and 12(d) and are related to the speed of the vehicles [36]. Sorted in descending order of frequency content, the vehicles are the car, the truck, the electric bicycle, and the tricycle; sorted in descending order of SNR, they are the truck, the tricycle, the car, and the electric bicycle. The frequency content of the vehicles has a significant effect on localization performance. With the highest frequency content and the third highest SNR, the car showed the best localization performance, followed closely by the truck, which ranks second in frequency and first in SNR. Because the frequency spectra of the electric bicycle and the tricycle are nearly the same, their performances are nearly the same despite their different SNRs.

In the application scenarios considered in this part, the proposed method has been applied in more than 200 experiments with over 20 types of vehicles, and the maximum DOA estimation error is no larger than 6°. Together with the computational complexity comparison in Section 3.2.1, these results show that the proposed method achieves accurate localization and is suitable for real time processing in portable applications.

5. Conclusion

In this paper, an acoustic source localization method using small aperture arrays was proposed. Owing to its low computational complexity, the method can be used in real time processing systems such as vehicle localization and videoconferencing. Because the proposed algorithm relies on a small aperture, the array response obeys a plane wave propagation model, and the consistency of the amplifying circuits, which is often ignored in discussions of ordinary arrays, becomes very important. The accuracy of DOA estimation can be improved by exploiting the source's relatively higher band and confining the signal to a certain band. Simulations and experiments show that the proposed method effectively localizes different acoustic sources with lower computational complexity than existing methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the associate editor and anonymous reviewers for their valuable comments and suggestions to improve this paper. This work has been supported by fund 9140C18020211ZK34.