Acoustic array processing is today an essential part of many applications involving the analysis of audio signals, such as hearing aids, hands-free devices, or immersive audio recording. While a number of acoustic sensing and processing systems have been proposed over the last decades, these have typically relied on high-throughput computing platforms and/or expensive microphone arrays. Although microphone arrays yield higher performance than single-microphone systems, some limitations arise from the fact that the positions of the microphones tend to be fixed and all the signal processing tasks are performed on a centralized processor. The alternative is to use comparatively low-resource, distributed nodes with sensing devices and algorithms aimed at detecting, localizing, or characterizing acoustic events. The advantage of these systems is that the wireless, battery-powered nodes are less expensive and can be easily deployed in a wide range of environments. Moreover, as opposed to traditional microphone arrays that sample a sound field only locally, distributed acoustic sensing systems allow many more sensors to be used to cover a large area of interest.

Signal processing and machine learning research for advanced acoustic systems of this type is giving birth to emerging technologies and services with great exploitation potential. Current application domains such as smart cities and buildings, ambient assisted living, or habitat monitoring have already demonstrated the interest in acoustic-based solutions. Internet of Things (IoT) platforms and single-board computers have substantially increased the capabilities of sensor networks aimed at acoustic signal processing, opening new possibilities and challenges for making sound a valuable source of information for the development of new services. Therefore, audio signal processing and machine learning for Wireless Acoustic Sensor Networks (WASNs) have attracted the interest of many authors.

The article by M. Cobos et al., coauthored by the guest editors of this special issue, provides an extensive survey of the current state of the art of sound localization approaches in WASNs. The article assumes a fusion center where localization takes place and considers the cases of both single and multiple microphones at each WASN node. The most popular approaches for sound source localization are presented, including approaches based on signal energy, Time of Arrival (TOA), Time Difference of Arrival (TDOA), Direction of Arrival (DOA), and steered-response-power (SRP) methodologies. The problem of estimating the node locations (typically referred to as self-localization) is also considered. The article concludes by posing significant challenges in this area which are still open and call for further research efforts.

One of the most common approaches for source localization in WASNs is based on DOAs and envisions the cooperation of multiple nodes, each estimating a DOA. A three-dimensional source location estimate typically requires each node to provide the azimuth and elevation of the sources in the acoustic scene to the central node. This has a negative impact on hardware costs, since the microphones of each node must be deployed on a two-dimensional grid, whereas an azimuth-only DOA would require only a linear array. A. Canclini et al., in the paper “Distributed 3D Source Localization from 2D DOA Measurements Using Multiple Linear Arrays,” propose a methodology to overcome this issue, combining multiple DOAs expressed in terms of azimuth only to yield a three-dimensional estimate of the source location. The methodology is based on the observation that an azimuth-only DOA is referred to the plane in which both the source and the sensor lie. The original arrays are converted into equivalent arrays, all lying on the same plane along with the source. Using this formalism, a cost function is defined, whose minimization leads to the source location estimate. The authors test the effectiveness of the novel algorithm under varying reverberation and signal-to-noise ratio, also comparing it with state-of-the-art techniques based on the combination of azimuth and elevation DOAs.
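As a generic illustration of how nodes can cooperate through DOAs, the planar case can be sketched as a least-squares intersection of bearing lines. This is not the method of A. Canclini et al., which addresses the three-dimensional problem; the function name `triangulate_from_azimuths`, the node layout, and the noise-free azimuths are assumptions made only for this example.

```python
import numpy as np

def triangulate_from_azimuths(node_positions, azimuths_rad):
    """Least-squares intersection of 2D bearing lines, one per node (illustrative)."""
    P = np.asarray(node_positions, dtype=float)            # shape (N, 2)
    theta = np.asarray(azimuths_rad, dtype=float)          # shape (N,)
    U = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # unit bearing vectors
    # For each node, I - u u^T keeps only the component orthogonal to its bearing.
    proj = np.eye(2)[None, :, :] - U[:, :, None] * U[:, None, :]
    # Minimizing sum_n ||proj_n (x - p_n)||^2 gives the normal equations A x = b.
    A = proj.sum(axis=0)
    b = np.einsum("nij,nj->i", proj, P)
    return np.linalg.solve(A, b)

# Hypothetical layout: three nodes and one source in a plane (meters).
nodes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
source = np.array([2.0, 1.5])
azimuths = np.arctan2(source[1] - nodes[:, 1], source[0] - nodes[:, 0])
estimate = triangulate_from_azimuths(nodes, azimuths)
print(estimate)
```

With noisy azimuths the bearing lines no longer meet in a single point, and the same normal equations return the point that minimizes the summed squared distances to the lines. The system becomes singular if all bearings are parallel.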

Source localization algorithms usually rely on the knowledge of the speed of sound. However, a variable speed of sound can be encountered in several application scenarios of WASNs, such as multizone buildings and outdoor environments. The paper “Acoustic Source Localization under Variable Speed of Sound Conditions” by P. Annibale and R. Rabenstein investigates the problem of source localization under an unknown speed of sound. After revisiting the physical foundations of the relation between speed of sound and temperature, the paper focuses on the reformulation of the problem of source localization from Time Differences of Arrival (TDOAs) when the unknown parameters also include the speed of sound. In particular, the reformulation is provided for the two main classes of algorithms for source localization based on TDOAs, namely, Unconstrained Least Squares and Constrained Least Squares. The authors show the improvement in the source location estimate brought by the extended formulation. They also extensively discuss problems other than source localization that are affected by an unknown speed of sound, ranging from reflector estimation to TDOA disambiguation. Finally, the authors shed light on some applications, though not strictly related to WASNs, where the speed of sound reveals important information.
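To make the idea of treating the speed of sound as an additional unknown more concrete, the following is a minimal sketch of an unconstrained least-squares estimate from TDOAs. It uses a textbook linearization and is not claimed to be the exact formulation of P. Annibale and R. Rabenstein. The function names, the nominal speed `c_nominal` used only for numerical scaling, the microphone layout, and the noise-free TDOAs are all assumptions of this example; `speed_of_sound` uses the common ideal-gas approximation for dry air.

```python
import numpy as np

def speed_of_sound(temp_celsius):
    """Ideal-gas approximation of the speed of sound in dry air, in m/s."""
    return 331.3 * np.sqrt(1.0 + temp_celsius / 273.15)

def locate_with_unknown_speed(mics, tdoas, c_nominal=343.0):
    """Unconstrained LS source localization from TDOAs relative to mics[0].

    With the reference microphone at the origin, d_i = d_0 + c * tau_i gives,
    after squaring, ||m_i||^2 = 2 m_i.x + 2 rho_i (s d_0) + rho_i^2 s^2,
    where rho_i = c_nominal * tau_i and s = c / c_nominal. The quantities
    x, s * d_0 and s^2 are treated as independent unknowns.
    """
    mics = np.asarray(mics, dtype=float)
    m = mics[1:] - mics[0]                             # reference mic moved to origin
    rho = c_nominal * np.asarray(tdoas, dtype=float)   # range differences at nominal speed
    A = np.column_stack([2.0 * m, 2.0 * rho, rho**2])
    b = np.sum(m**2, axis=1)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    dim = mics.shape[1]
    position = theta[:dim] + mics[0]
    # Noise can push the s^2 estimate below zero; clip before the square root.
    speed = c_nominal * np.sqrt(max(theta[-1], 0.0))
    return position, speed

# Hypothetical setup: 8 microphones in a 5 m x 4 m x 3 m room at 30 degrees Celsius.
rng = np.random.default_rng(0)
mics = rng.uniform([0.0, 0.0, 0.0], [5.0, 4.0, 3.0], size=(8, 3))
source = np.array([2.0, 1.0, 1.5])
c_true = speed_of_sound(30.0)
dists = np.linalg.norm(mics - source, axis=1)
tdoas = (dists[1:] - dists[0]) / c_true
pos_est, c_est = locate_with_unknown_speed(mics, tdoas)
print(pos_est, c_est)
```

In three dimensions this linearization has five unknowns, so at least five TDOAs (six microphones) are needed. A constrained variant would additionally enforce the relation between the last two unknowns and the estimated position, which this sketch ignores.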

Self-localization of acoustic sensors is one of the most important topics in WASNs for cases in which acoustic sensor nodes are deployed as components of the network without any prior knowledge of their locations. The paper “Acoustic Sensor Self-Localization: Models and Recent Results” is a review paper that covers this topic for the case in which the acoustic sensor nodes rely exclusively on acoustic signals for localization, without any communication among them and without any other type of sensor. In particular, this paper addresses the scenario in which the sensor nodes are positioned in a known environment, with prior knowledge of the probe signal and of the locations of the sources, that is, the loudspeakers within the environment. This enables the self-localization of the sensor nodes simply by analyzing the acoustic signals received at each node. Based on this scenario, the paper first reviews existing closed-form least squares solutions based on measurements of the TOA or the TDOA, considering two practical issues of these measurements, namely, asynchrony and frequency mismatch between the loudspeakers and the microphones. Then, the paper describes methods for acoustic sensor localization that borrow concepts from its well-known dual problem, that is, sound source localization. The paper also presents the sliding window technique, the matching pursuit algorithm, and TOA selection for improving the TOA/TDOA estimates, and addresses the design of probe signals that are inaudible and have low power for practical deployment. This review paper will serve as a good starting point for those beginning research in this field.
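The closed-form least-squares idea can be illustrated with the simplest version of the problem: a single sensor, loudspeakers at known positions, and TOAs measured with perfectly synchronized clocks. The sketch below is a generic linearization under those assumptions and does not handle the asynchrony or frequency mismatch discussed in the review; the function name `locate_sensor_from_toas`, the loudspeaker layout, and the fixed speed of sound are hypothetical choices for the example.

```python
import numpy as np

def locate_sensor_from_toas(speakers, toas, c=343.0):
    """Closed-form LS sensor position from TOAs to known loudspeakers.

    Each TOA gives ||x - s_j||^2 = (c * t_j)^2. Subtracting the equation
    for the first loudspeaker cancels ||x||^2 and leaves a linear system.
    Assumes synchronized clocks and a known speed of sound.
    """
    S = np.asarray(speakers, dtype=float)
    d = c * np.asarray(toas, dtype=float)     # sensor-to-loudspeaker distances
    A = 2.0 * (S[1:] - S[0])
    b = np.sum(S[1:]**2, axis=1) - np.sum(S[0]**2) - (d[1:]**2 - d[0]**2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Hypothetical room with five loudspeakers at known positions (meters).
speakers = np.array([
    [0.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 3.0],
    [5.0, 4.0, 3.0],
])
sensor = np.array([1.2, 2.5, 0.8])
toas = np.linalg.norm(speakers - sensor, axis=1) / 343.0
sensor_est = locate_sensor_from_toas(speakers, toas)
print(sensor_est)
```

In three dimensions at least four loudspeakers that do not all lie in one plane are needed for the linear system to have a unique solution. An unknown clock offset between loudspeakers and sensor would add a further unknown to every range, which is one reason the asynchronous case requires the more elaborate models covered in the review.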

Smartphones sold nowadays are commonly equipped with multiple microphones and loudspeakers. In typical situations, such as a meeting room with multiple participants each carrying a smartphone, several of these devices may be present, making them great candidates for constructing an ad hoc WASN. In order to use them as components of a WASN for joint processing of acoustic signals, it is essential to determine their locations as well as their orientations. The paper “Indoor Self-Localization and Orientation Estimation of Smartphones Using Acoustic Signals” by H. A. Sánchez-Hevia et al. addresses this topic for smartphones equipped with at least two microphones and a loudspeaker by allowing them to communicate with other smartphones and to play sound through their loudspeakers. One of the most challenging problems for self-localization is the DOA uncertainty that can be caused by various factors in a WASN. The authors address this problem using a genetic algorithm (GA) to overcome the DOA uncertainty and jointly estimate the orientation within a maximum-likelihood framework. Although the GA may add a significant computational burden to the system, faster and parallelized processors are expected to overcome this issue in the near future.

In most WASN applications, each node of the WASN consists of sensing devices, that is, one or more microphones. However, several interesting applications require each node to also include actuators, that is, loudspeakers. One such application is active noise control, where the WASN has the objective to create a zone of destructive interference in order to cancel a particular noise source by generating the appropriate sound signals. In the paper by C. Antoñanzas et al., the authors consider the problem of active noise control in Wireless Acoustic Sensor Networks and investigate several control effort strategies which are needed in order not to violate the power constraints associated with the operation of the acoustic actuators. They analyze and compare the different strategies in terms of performance, computational efficiency, and communication requirements.

Speech enhancement methods for WASNs are another topic of interest, where the exchange of audio signals between the wireless nodes is needed. Algorithms for reducing the amount of data exchanged within the network are necessary from both an energy and a bandwidth efficiency perspective. The paper by F. de la Hucha Arce et al. discusses an adaptive quantization scheme that optimizes the bit depth of each exchanged signal according to its contribution to the speech enhancement performance. Within a multichannel Wiener filter framework, the authors propose a new metric for adaptive quantization based on the gradient of the minimum mean squared error (MMSE), which leads to a greedy algorithm. It is also shown that a previously proposed impact metric is a generalization of the gradient metric. The energy savings obtained through the use of the greedy adaptive quantization are discussed both in a simulated and in a real WASN setup.

Another application of WASNs is discussed in the paper by A. Luque et al., entitled “Evaluation of the Processing Times in Anuran Sound Classification.” The paper proposes the use of WASNs for habitat monitoring, specifically for the classification of anuran songs. The work focuses on the analysis of the processing times needed at the different stages of the system, including acquisition, feature extraction, and classification. The paper considers different sets of features and classifiers, showing the trade-offs that arise between classification accuracy and required processing power in this kind of WASN application.

The works included in this special issue confirm the interest of the research community in taking advantage of new advances in hardware and data connectivity for integrating acoustic-based technologies into a broad range of applications. The received contributions focused on important aspects such as sensor self-localization, sound recognition, speech enhancement, active noise control, and the localization of acoustic sources. In any case, it is imperative to continue progressing and putting more research efforts into many other aspects that are crucial for extending the use of WASNs to future sound-related applications. These include the development of methodologies for the assessment of cost-effectiveness, the availability of automatic calibration and deployment methods, the design of energy-efficient signal processing algorithms, and the appropriate management of privacy-related issues.

Maximo Cobos
Fabio Antonacci
Athanasios Mouchtaris
Bowon Lee