Journal of Sensors

Volume 2019, Article ID 4639850, 13 pages

https://doi.org/10.1155/2019/4639850

## A Sound Source Localization Device Based on Rectangular Pyramid Structure for Mobile Robot

School of Mechanical and Electronic Engineering, Wuhan University of Technology, Wuhan 430070, Hubei, China

Correspondence should be addressed to Guoliang Chen; glchen@whut.edu.cn

Received 5 April 2019; Revised 7 June 2019; Accepted 24 July 2019; Published 27 August 2019

Academic Editor: Calogero M. Oddo

Copyright © 2019 Guoliang Chen and Yang Xu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

A sound source localization device based on a multimicrophone array with a rectangular pyramid structure is proposed for mobile robots in indoor applications. First, a time delay estimation method based on the cross-power spectral phase algorithm and a fast peak search strategy based on the geometric distribution of the microphones are developed to estimate the differences in sound propagation delay between pairs of microphones. Moreover, a rejection strategy is presented to evaluate the correctness of the delay difference values. Then, the device’s geometric equations, based on the time-space mapping relationship, are established to calculate the position of the sound source. To solve the equations quickly, the multimicrophone array space is divided into several subspaces to narrow the solution range, and the Newton iteration algorithm is introduced to solve the equations; its solution is checked by an evaluation mechanism based on coordinate thresholds. Finally, experiments are carried out to verify the performance of the device, and the results show that the device can achieve sound source localization with high accuracy.

#### 1. Introduction

As an important branch of the robot family, mobile robots for home services are a trend that is expected to lead to a new lifestyle for human beings [1]. Compared with industrial robots, this kind of robot is required to offer safer, more flexible, and more intelligent services because it serves humans directly. Home service robots are therefore subject to higher requirements for intelligence. Robot audition, which is inspired by human hearing capabilities and is an important part of robot intelligence, has attracted great attention from many scholars in recent years. Numerous works have been proposed by a growing community, with contributions ranging from sound source localization (SSL) and separation in realistic reverberant conditions to speech or speaker recognition [2].

In contrast to vision, robot audition has unique advantages in perception. A robot’s perception of sound is nearly omnidirectional and independent of the lighting conditions. Moreover, robots are able to detect sound signals even in the presence of obstructions [3, 4]. Robot audition can not only locate sound sources but also recognize their origins and interpret their contents, which is an important capability for harmonious human-robot interaction. Therefore, robot audition has broad application prospects [5]. For example, home service robots are expected to receive voice commands from elderly people with limited mobility and provide the corresponding services. In addition, robot audition can help a robot to recognize its environment and carry out special tasks, such as detecting and tracking an abnormal sound to prevent dangerous situations in time [6, 7].

SSL technology allows a robot to determine the direction and position of a sound source using only sound data. It is essential to the overall scheme of robot audition and has an important impact on the other robot audition modules [8]. To achieve higher localization accuracy, SSL systems usually need many microphones arranged in a complex array, which greatly increases the computational complexity. There is thus a trade-off between the localization accuracy and the complexity of the microphone array. An SSL system for home service robots faces several additional constraints. It is mounted on a robot platform, which limits its size and the number of microphones. Moreover, to meet the requirements of real-time human-robot interaction and target tracking, the localization algorithm adopted by the SSL system should have low computational complexity and high accuracy [9]. Therefore, this trade-off is particularly important for the SSL systems of mobile robots.

Taking into account the need for real-time, high-accuracy localization and a relatively simple structure, this paper designs a new microphone array with a regular quadrangular pyramid structure for mobile robots in indoor environments. The vertex of the regular quadrangular pyramid provides the array with a nonplanar reference point, which helps the array to obtain sound source signals and improves the localization accuracy. For this microphone array, the paper develops an SSL estimation model and a computing method and proposes a double screening mechanism to improve the reliability of the position results.

The rest of the paper is organized as follows. Section 2 introduces relevant works in the field of robot audition. Section 3 describes the principle of time delay estimation. Section 4 presents the localization algorithm based on iterative optimization. Section 5 presents the experimental prototype and the experimental results and then analyzes the experimental errors. The last section draws conclusions and outlines plans for future studies.

#### 2. Related Work

Irie [10] applied SSL technology to mobile robots for the first time. Since then, a number of methods have been proposed to enrich SSL technology. At present, there are three well-known kinds of SSL methods based on microphone arrays: directional technology based on high-resolution spectral estimation (HRSA), controllable beamforming technology based on the maximum output power (BS), and technology based on time difference of arrival (TDOA) [11].

All of the above SSL methods rely on a certain number of microphones. Pavlidi et al. [12] present a uniform circular microphone array to overcome the ambiguities of linear arrays. Cho et al. [13] develop an SSL system for mobile robots that uses a square microphone array of 0.17×0.17 m^{2} with four sensors attached to the shoulder of a plaster cast of a small home robot. Ren et al. [14] use a triangular pyramid microphone array with four omnidirectional microphones for multiple sparse source localization. Its lateral faces are isosceles right triangles. Each omnidirectional microphone is placed at a vertex of the triangular pyramid, and the sensor located at the origin is taken as the reference sensor. Valind et al. [15] construct an 8-microphone array with a cuboid structure to locate the axial angle and elevation angle of a sound source. Huang et al. [16] use four microphones to form a three-dimensional microphone array to locate the axial and elevation angles of sound sources.

The above-mentioned microphone arrays have regular structures, such as linear, triangular, polygonal, circular, and polyhedral arrays, which can locate sound sources in two-dimensional or three-dimensional space. The number of microphones and their topology in an SSL system depend mainly on the SSL method adopted. Generally, the number of microphones required by TDOA, HRSA, and BS increases in that order.

Among these SSL methods, the TDOA method is the most suitable for robot audition, in which the azimuth and horizontal distance of a sound source should be determined in real time. The localization methods considered here are therefore all based on the TDOA technique. The basic idea of TDOA-based SSL is to use the observed time difference between the signals of a sound source arriving at two microphones to construct a hyperboloid in space on which the sound source is constrained to lie [17]. Accordingly, TDOA-based localization can be divided into two steps: time delay estimation (TDE), which computes the time delays between the sound source and each sensor of the microphone array, and localization, which computes the direction and position of the sound source based on the geometric model of the microphone array.
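The hyperboloid constraint can be written compactly. The following is a minimal sketch under assumed notation: the symbols $\mathbf{s}$ (source position), $\mathbf{m}_i$ and $\mathbf{m}_j$ (microphone positions), $c$ (speed of sound), and $\tau_{ij}$ (arrival time at microphone $i$ minus arrival time at microphone $j$) are illustrative and are not taken from the paper.

```latex
\|\mathbf{s}-\mathbf{m}_i\| - \|\mathbf{s}-\mathbf{m}_j\| = c\,\tau_{ij}
```

Each microphone pair therefore restricts the source to one sheet of a hyperboloid with foci $\mathbf{m}_i$ and $\mathbf{m}_j$. For instance, assuming $c \approx 343$ m/s, a measured delay of $\tau_{ij} = 0.5$ ms corresponds to a range difference of about 0.17 m.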

The accuracy of TDE directly affects the performance of an SSL system. Some research results show that, as the source moves further away from the robot, the error in distance estimation grows significantly, in contrast to azimuth and elevation estimation [6]. The generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) is by far the most widely used TDE technique for single direction-of-arrival estimation in robot audition because of its robustness and ease of implementation. This paper therefore reviews only related works based on the GCC-PHAT method.

Perez-Lorenzo et al. [18] evaluate several GCC methods under typical real-world conditions in the presence of external, stationary, and in most cases correlated noise sources. Chen et al. [19] analyze the performance of several weighting functions in the GCC-based TDOA algorithm and indicate that PHAT weighting is the best choice for SSL using the GCC method because of its small fluctuations, sharp peak, and strong antijamming ability. Padoisa et al. [20] develop an improved GCC method to detect source positions. The method introduces two criteria to improve the noise source map: one is based on the geometric properties of the microphone array and the scan zone, and the other on the energy of the spatial likelihood function. Valind et al. [15] develop an estimation method based on GCC-PHAT that computes the time delay while suppressing the impact of narrowband noise by adjusting the weights according to the signal-to-noise ratio (SNR). To address the diffraction and ambiguity problems of GCC-PHAT-based SSL on a binaural robot platform, Kim et al. [21] develop an improved SSL method that overcomes the diffraction problem by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head, and the ambiguity problem by utilizing an amplification effect of the pinnae for localization over the entire azimuth. Zhang et al. [22] propose a comprehensive PHAT method and a maximum likelihood method based on GCC that effectively reduce the influence of reverberation on the signal and yield more accurate time difference estimates.
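To make the GCC-PHAT idea discussed above concrete, the following is a minimal sketch of a generic textbook formulation, not the implementation of any of the cited works. The function names `dft` and `gcc_phat`, the `max_lag` parameter, and the synthetic test signal are all assumptions introduced for illustration. A naive DFT from the standard library is used to keep the example self-contained; a practical system would presumably use an FFT library instead.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, max_lag):
    """Return the lag in samples by which `sig` is delayed relative to `ref`.

    `max_lag` (hypothetical parameter) limits the peak search to lags that
    are physically possible for a given microphone spacing.
    """
    n = len(sig) + len(ref)  # zero-pad so circular correlation acts as linear
    X = dft(list(sig) + [0.0] * (n - len(sig)))
    Y = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [a * b.conjugate() for a, b in zip(X, Y)]  # cross-power spectrum
    eps = 1e-12  # guards against division by zero in empty bins
    weighted = [c / (abs(c) + eps) for c in cross]  # PHAT: keep phase only
    cc = [v.real for v in dft(weighted, inverse=True)]
    # Negative lags are stored at the end of the circular correlation.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])


# Synthetic example: the same noise burst reaches the second microphone
# three samples later than the first one.
rng = random.Random(0)
burst = [rng.uniform(-1.0, 1.0) for _ in range(32)]
delay = 3
mic_first = burst + [0.0] * delay
mic_second = [0.0] * delay + burst
estimated_lag = gcc_phat(mic_second, mic_first, max_lag=8)
```

Dividing the estimated lag by the sampling rate gives the time difference in seconds. Restricting the search to `max_lag` mirrors the general idea of bounding the peak search by the array geometry, since the delay between two microphones cannot exceed their spacing divided by the speed of sound.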

#### 3. Method of Sound Source Localization Based on Rectangular Pyramid

##### 3.1. Principle of Sound Source Localization

As shown in Figure 1, S is a sound source, and M_{1}, M_{2}, and M_{3} are three microphones. Consider the time difference with which the sound from S reaches the two microphones M_{1} and M_{2}. By the definition of a hyperbola, the absolute value of the difference between the distances from any point on the curve to the two foci is constant and equals the length of the real axis. Thus, the sound source lies on the branch of the hyperbola near M_{1}, where the positions of the two microphones are the foci and the distance difference of the sound source, obtained from the time difference, is the length of the real axis. Multiple hyperbolas are obtained when multiple microphones are used, and their intersection point is the position of the sound source [23].
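The intersection of two such hyperbolas can be found numerically. The following is a minimal two-dimensional sketch that assumes three microphones, a speed of sound of 343 m/s, and a plain Newton iteration with an analytic Jacobian. The function name `locate_2d`, the microphone layout, and the initial guess are hypothetical and chosen only for illustration; this is not the paper's device geometry or its exact solver.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def locate_2d(mics, tau12, tau13, guess, tol=1e-10, max_iter=50):
    """Intersect two hyperbolas (foci M1/M2 and M1/M3) by Newton iteration.

    tau12 and tau13 are arrival-time differences t1 - t2 and t1 - t3 in
    seconds; `guess` is the starting point of the iteration.
    """
    (x1, y1), (x2, y2), (x3, y3) = mics
    d12 = SPEED_OF_SOUND * tau12  # range difference |S M1| - |S M2|
    d13 = SPEED_OF_SOUND * tau13  # range difference |S M1| - |S M3|
    x, y = guess
    for _ in range(max_iter):
        r1 = math.hypot(x - x1, y - y1)
        r2 = math.hypot(x - x2, y - y2)
        r3 = math.hypot(x - x3, y - y3)
        f1 = r1 - r2 - d12  # residual of the first hyperbola
        f2 = r1 - r3 - d13  # residual of the second hyperbola
        if abs(f1) < tol and abs(f2) < tol:
            break
        # Analytic Jacobian of (f1, f2) with respect to (x, y).
        a = (x - x1) / r1 - (x - x2) / r2
        b = (y - y1) / r1 - (y - y2) / r2
        c = (x - x1) / r1 - (x - x3) / r3
        d = (y - y1) / r1 - (y - y3) / r3
        det = a * d - b * c
        if abs(det) < 1e-14:
            raise ValueError("singular Jacobian; try another initial guess")
        # Solve J * (dx, dy) = -(f1, f2) with Cramer's rule.
        dx = (-f1 * d + f2 * b) / det
        dy = (-f2 * a + f1 * c) / det
        x, y = x + dx, y + dy
    return x, y


# Illustrative layout: widely spaced microphones and a known source position.
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
source = (1.0, 2.0)
ranges = [math.hypot(source[0] - mx, source[1] - my) for mx, my in mics]
tau12 = (ranges[0] - ranges[1]) / SPEED_OF_SOUND
tau13 = (ranges[0] - ranges[2]) / SPEED_OF_SOUND
centroid = (sum(m[0] for m in mics) / 3, sum(m[1] for m in mics) / 3)
estimate = locate_2d(mics, tau12, tau13, guess=centroid)
```

Newton's method converges only from a sufficiently good starting point, and two hyperbolas can intersect in more than one point, so the choice of initial guess and a check on the returned coordinates matter in practice.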