Abstract

Nowadays, more than half of the world’s population lives in urban areas. Since this proportion is expected to keep rising, the sustainable development of cities is of paramount importance to guarantee the quality of life of their inhabitants. Environmental noise is one of the main concerns that has to be addressed, due to its negative impact on people’s health. Different national and international noise directives and legislation have been defined during the past decades, with which local authorities must comply through noise mapping, action plans, policing, and public awareness, among others. To this end, the rise of Internet of Things technology within smart cities has driven a recent change of paradigm in environmental noise monitoring through the design and development of wireless acoustic sensor networks (WASNs). This work reviews the most relevant WASN-based approaches developed to date for environmental noise monitoring. The proposals have moved from networks composed of high-accuracy commercial devices to those built from ad hoc low-cost acoustic sensors, sometimes designed as hybrid networks with low and high computational capacity nodes. After describing the main characteristics of recent WASN-based projects, the paper also discusses several open challenges that must be faced to allow the reliable and pervasive deployment of WASNs in urban areas, such as the development of acoustic signal processing techniques to identify noise events, together with some potential future applications.

1. Introduction

Of the 7.5 billion people living in the world today, 55% currently live in urban areas, a proportion that is projected to reach 68% by 2050 according to the United Nations [1]. As the world continues to urbanize, the sustainable development of urban areas becomes of paramount importance to guarantee the quality of life of their inhabitants, taking into account the economic, social, and environmental dimensions [1]. During the last decade, the management of urban areas has undergone a change of paradigm driven by the Information and Communication Technology (ICT) revolution, resulting in the so-called smart cities and smart regions [2]. Although there are different views of what constitutes a smart city (see [3] and the definitions therein), all of them consider, to some extent, ICT-based approaches and solutions. In 2011, it was reported that around half of the European cities with more than 100,000 residents had implemented or proposed smart city initiatives addressing at least one of the following issues [3]: smart governance, smart people, smart living, smart mobility, smart economy, and smart environment (the last including pollution control and monitoring) [3]. Nevertheless, transforming any city into a smart city is a long and complex process [4], even when taking advantage of previous experiences and best practices from similar cases, due to specific local particularities [5].

The continuous growth in the number of inhabitants has led to an expansion of transportation systems, including highways, railways, and airways [6], which, in turn, has increased environmental pollution. This situation is a cause for concern not only because of the negative effect on citizens of air pollutants such as carbon dioxide (CO2) or carbon monoxide (CO) but also because of noise pollution [6]. For instance, around 90% of New York City residents are exposed to noise exceeding the levels considered harmful to people [7] according to the US Environmental Protection Agency (EPA) guidelines [8]. Moreover, it has been estimated that more than 40% of European inhabitants are exposed to excessive road traffic noise (RTN) levels during daytime, and more than 30% at night [9]. These values are typically computed over a given period of time and expressed as the equivalent continuous sound level, Leq (dB), or its perceptually A-weighted counterpart, LAeq (dB(A)).
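As a minimal illustration of how these descriptors are obtained, the Python sketch below evaluates the analog A-weighting curve (the IEC 61672-1 formula, normalized to about 0 dB at 1 kHz), computes Leq from calibrated sound pressure samples relative to the 20 µPa reference, and energetically averages equal-duration short-term levels into a longer-term value. It assumes the pressure samples are already calibrated in pascals and, for LAeq, already passed through an A-weighting filter (not shown); the function names are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in air: 20 µPa


def a_weighting_db(f):
    """Gain of the analog A-weighting curve at frequency f (Hz), in dB.

    IEC 61672-1 formula; the +2.00 dB term normalizes the gain to ~0 dB at 1 kHz.
    """
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.00


def leq_from_pressure(samples_pa):
    """Leq in dB of calibrated pressure samples (Pa): 10*log10(mean(p^2) / p_ref^2).

    If the samples were A-weighted beforehand (filter not shown), the result is LAeq.
    """
    mean_sq = sum(p * p for p in samples_pa) / len(samples_pa)
    return 10.0 * math.log10(mean_sq / P_REF**2)


def leq_from_levels(levels_db):
    """Energetic average of equal-duration short-term levels (e.g., 1-s LAeq values)."""
    mean_energy = sum(10.0 ** (lvl / 10.0) for lvl in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)
```

Note that levels are averaged on an energy basis rather than arithmetically: `leq_from_levels([60, 70])` gives about 67.4 dB, not 65 dB.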

The impact of environmental noise on people is far from harmless. Several studies by the World Health Organization (WHO) [9] conclude that noise-related diseases cause a huge loss of healthy life years, in addition to the annoyance noise causes to neighbors [10]. For instance, permanent hearing loss and tinnitus are associated with noise exposure [8, 11]. Moreover, it has been argued that noise can lead to adverse birth outcomes, as stated in [12], a work focused on the effects of aircraft noise on preterm births. More specifically, RTN, one of the main noise sources in urban areas, increases tiredness and disturbs sleep patterns [13]; the authors of [13] highlight the possibility that having quieter parts in one’s own residence contributes to better physiological and psychological well-being. Finally, the reader is referred to [14] for a review of transport noise interventions and their health impacts, particularly in the European region. Therefore, among the different challenges of current and future smart cities, urban managers are asked to react to the alarming increase in the health effects of environmental noise, given its impact on their inhabitants’ quality of life.

From the 1970s to the 1990s, different competent authorities reacted to concerns about noise pollution in medium and large urban areas. In 1981, the US EPA published a seminal study on the noise exposure of Americans considering different noise sources (e.g., traffic, aircraft, and construction), which estimated the number of inhabitants affected by environmental noise across the country [15]. That document stated that around 90 million people in the US were exposed to harmful outdoor noise levels (from all kinds of sources). In 1996, the European Commission (EC) published the “Future Noise Policy” [16], where, for the first time, the EC discussed the impact of noise on humans and the environment. It estimated that around 80 million people in Europe were continuously exposed to noise levels higher than 65 dB(A) during daytime, and a further 170 million people to levels between 55 dB(A) and 65 dB(A). More recently, the European Union (EU) approved the Environmental Noise Directive 2002/49/EC (END) [17] and the subsequent Common Noise Assessment Methods in Europe (CNOSSOS-EU) methodological framework to ensure consistent and comparable noise assessment results across the EU member states [18], a methodology that all member states have to apply by the end of 2018. The main pillars of the END are the following: (i) determining noise exposure, (ii) making updated information on noise available to citizens, and (iii) preventing and reducing environmental noise where necessary. Specifically, the END requires the European member states to publish noise maps and action plans every five years for large agglomerations (more than 100,000 inhabitants), major roads (more than 3 million vehicles/year), major railways (more than 30,000 trains/year), and major airports (more than 50,000 movements/year) [17]. 
As a consequence, city noise management involves different disciplines such as planning, noise mapping, development of action plans, policing, management of citizens’ complaints, noise abatement, and public awareness, among others [19]. For instance, an action plan can lead road authorities to optimize the installation of noise-reducing pavements or noise barriers where required, and to evaluate the reduction in noise exposure achieved after their implementation [20].

Traditional noise measurements in cities have mainly been carried out by professionals who record and analyze equivalent noise levels at certain locations using certified sound level meters for a given period of time [21, 22]. However, this approach is difficult to scale up to meet the current demand for more frequent noise level measurements in both time and space. Moreover, it makes oversimplified assumptions in the predictive models inevitable and loses key features of environmental noise, such as its temporal evolution [23]. Recent technological advances, mainly thanks to the development of the Internet of Things (IoT) framework, have allowed these drawbacks to be addressed through the deployment of wireless acoustic sensor networks (WASNs) [24, 25]. This development has been made possible by the availability of cheaper and smaller IoT hardware and by innovations in communication networks [22, 26] and in acoustic signal processing [25, 27], the latter mainly used to identify the noise source. For instance, WASNs can enable the automatic and pervasive generation of dynamic noise maps in urban areas based on controlled measurements, while lowering the cost of noise mapping by 50% compared with the expert-based update of static noise maps every five years [28]. The main goal of most of the developed projects is to measure the LAeq over a certain time interval and integrate it into a map, sometimes together with other measurements, which are typically sent to a central server in the cloud (see Figure 1). Another common application of automatic sound classification is surveillance [29]. Understanding the urban soundscape and identifying its noise sources is a research topic that has recently gained interest.

This work presents an up-to-date review of WASN-based approaches focused on environmental noise monitoring in smart cities. The paper describes the most relevant works presented in the literature to date, paying special attention to the characteristics of the network components and nodes in terms of accuracy and computational capacity. The paper also discusses several open challenges that have to be faced in order to allow the reliable and pervasive deployment of WASNs in urban areas, together with some potential future applications.

The paper is structured as follows. Section 2 reviews the main WASN-based environmental noise monitoring approaches, classified according to the typology of nodes. Section 3 describes the main acoustic sensor node categories, classified according to their accuracy and computational capacity. Section 4 presents several acoustic signal processing approaches designed to run in urban environments. Next, Section 5 discusses several open challenges for the future reliable and pervasive deployment of WASNs in smart cities. The paper ends with the conclusions and future work in Section 6.

2. WASN-Based Environmental Noise Monitoring Approaches

In this section, we describe the main WASN-based approaches developed during the last decade to monitor environmental noise. These projects are organized in two categories according to the type of sensor nodes they include, which have moved from commercial devices to acoustic sensors designed ad hoc. The cloud plays a key role in both types of WASNs, as the collected information (the LAeq levels alone or together with extra information obtained at each node) is sent there. However, communication is not a bottleneck in any of the deployed WASNs [30], even when each node hosts diverse types of sensors, since the required throughput is small (i.e., only a few bytes per second).
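To make the throughput argument concrete, the following sketch packs one LAeq value per second into a hypothetical 6-byte record (a 32-bit UNIX timestamp plus the level in hundredths of a dB, little-endian). The record layout and function names are assumptions for illustration, not the message format of any project reviewed here; the sketch only shows that such a report fits in a few bytes per second.

```python
import struct

# Hypothetical uplink record: uint32 UNIX timestamp + uint16 LAeq in 0.01 dB steps.
# "<" selects little-endian with no padding, so the record is exactly 6 bytes.
RECORD = struct.Struct("<IH")


def encode_laeq(timestamp, laeq_db):
    """Pack one LAeq reading; 0.01 dB resolution, valid for 0 to 655.35 dB."""
    return RECORD.pack(timestamp, int(round(laeq_db * 100)))


def decode_laeq(payload):
    """Server side: recover (timestamp, LAeq in dB) from one record."""
    timestamp, centi_db = RECORD.unpack(payload)
    return timestamp, centi_db / 100.0
```

At one record per second, a node reporting only LAeq would send 6 bytes/s before protocol overhead.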

2.1. WASNs Based on Commercial Acoustic Measurement Devices

Most of the WASNs in this first category include commercial sound level meters as sensor nodes. These devices are connected to a central cloud server that gathers all the information provided by the nodes, mainly LAeq. Noise maps are subsequently generated in the server to inform citizens. Several pioneering projects, among the first of their kind, follow this basic WASN design.

Telos [31] is one of the first experiences reported in the literature on the design of wireless acoustic sensor networks. The work introduces an ultralow-power wireless sensor module, developed at the University of California, Berkeley, for research and experimentation on wireless sensor networks (WSNs); it became one of the pioneering platforms for automatic environmental monitoring. It is a mote designed ad hoc with three major goals: minimizing power consumption, easing use, and increasing software and hardware robustness. It offered extensive sensor interfaces (8 analog lines and digital I/O channels) and allowed sampling at up to 200 ksamples/s through its analog-to-digital converter (ADC).

In [32], the authors also demonstrated the feasibility of using a WASN in a wide variety of environmental monitoring applications, especially the monitoring of environmental noise pollution in urban areas [33]. Later on, the same authors continued working on the WASN design, focusing on data transmission from the sensor nodes to the central server as one of its technical bottlenecks. The problem of autonomously adapting the data reporting rate was addressed in [34] using a forecasting model that suppresses data communication whenever possible, thus achieving high communication savings [35]. The platform allowed sampling frequencies of up to 40 kHz while keeping power consumption ultralow thanks to sleep modes.
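The idea behind forecasting-based suppression can be sketched as a simple dual-prediction scheme: the node and the server run the same predictor, and the node transmits only when the measured level deviates from the prediction by more than a tolerance. In this minimal sketch, the predictor is simply the last transmitted value and the 1 dB tolerance is an arbitrary choice; [34, 35] use their own forecasting model, which is not reproduced here, and the function names are hypothetical.

```python
def dual_prediction_reports(levels, tolerance_db=1.0):
    """Node side: return the (index, level) pairs that would actually be transmitted.

    Both ends predict "level == last transmitted value" (an illustrative assumption);
    a report is sent only when the measurement leaves the tolerance band.
    """
    reports = []
    last_sent = None
    for i, level in enumerate(levels):
        if last_sent is None or abs(level - last_sent) > tolerance_db:
            reports.append((i, level))
            last_sent = level
    return reports


def reconstruct(reports, n):
    """Server side: apply the same predictor, holding each report until the next one."""
    series, j, current = [], 0, None
    for i in range(n):
        if j < len(reports) and reports[j][0] == i:
            current = reports[j][1]
            j += 1
        series.append(current)
    return series
```

For instance, for the series 60.0, 60.4, 60.8, 62.0, 62.3, 58.0 dB, only the first, fourth, and sixth values are sent, and the reconstructed series never deviates from the measurements by more than 1 dB.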

In [36], the authors detail the deployment of a WASN aimed at measuring acoustic noise in both industrial and residential environments in Ostrobothnia (Western Finland). Each sensor node measures the LAeq noise level at its location, and the data are collected by a master node, which stores them in a web-based database. The sensor nodes are built on an ATmega128 and CC2420 platform. The network covered a university campus, an industrial park, and a residential block. Regarding the data rate, each sensor node collects 72 bytes every 5 seconds. In [37], the authors explain that the network follows a tree topology with global synchronization to sustain this data throughput, and that transmission scheduling is implemented because the noise measurements are time-correlated and cannot be interrupted.
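A back-of-the-envelope sketch of the traffic implied by these figures: 72 bytes every 5 seconds corresponds to 115.2 bit/s per node. The 100-node network size used below is hypothetical, the 250 kbit/s figure is the nominal IEEE 802.15.4 (2.4 GHz) rate of the CC2420 radio, and packet headers, retransmissions, and multi-hop forwarding are ignored, so the real channel occupancy would be higher.

```python
BYTES_PER_REPORT = 72         # per node, from [36]
REPORT_PERIOD_S = 5           # seconds between reports, from [36]
RADIO_BITRATE_BPS = 250_000   # nominal IEEE 802.15.4 (2.4 GHz) rate of the CC2420


def aggregate_load(n_nodes):
    """Return (per-node bit/s, total bit/s, fraction of the nominal radio rate).

    Payload only: protocol overhead and multi-hop relaying are not modeled.
    """
    per_node_bps = BYTES_PER_REPORT * 8 / REPORT_PERIOD_S
    total_bps = n_nodes * per_node_bps
    return per_node_bps, total_bps, total_bps / RADIO_BITRATE_BPS
```

For a hypothetical 100-node network, `aggregate_load(100)` gives about 11.5 kbit/s of payload, roughly 4.6% of the nominal radio rate.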

In [38], a project designed to monitor traffic noise in Xiamen City (China) for environmental purposes is presented. Based on the traffic noise data from 35 roads of nine green spaces in Xiamen, the authors model the behavior of those measurement points in order to simulate the traffic noise of another 100 roads on the island. The network combined noise meters with ZigBee and GPRS communication, all assembled and tuned to collect different types of traffic noise data (e.g., fast roads, main roads, and secondary roads) at specific locations, to be analyzed and compiled into a dataset organized by road type.

An environmental noise monitoring network is being deployed in Barcelona (Spain) to manage resources efficiently and to reduce the impact of urban infrastructures on the environment [39]. After a period of operation, the performance of the Barcelona noise monitoring network (NMN) has recently been reviewed in terms of its strengths and weaknesses, also in order to define future open challenges [40]. The current main lines of work in the Barcelona NMN are reducing sensor costs and minimizing manual tasks, so as to concentrate efforts on value-added tasks based on the noise monitoring system data.

The RUMEUR (Urban Network of Measurement of the Sound Environment of Regional Use) is a hybrid wireless network developed in the Paris region by BruitParif [41]. This WASN includes both high-accuracy equipment for critical places (e.g., airports) and less precise measuring equipment placed in other locations, where the goal is only to evaluate the equivalent noise level of the environment, and in places with various power supply constraints. The RUMEUR measurements are obtained from sound level meters installed in the sensor network, with the aims of understanding the measured signal, developing assessment actions to mitigate noise, and communicating information about the soundscape in Ile-de-France to citizens and authorities [42].

The FI-Sonic project, which is based on the FIWARE platform (https://www.fiware.org/), focuses on continuous environmental noise monitoring plus surveillance [43]. The project develops the technology needed to capture and process sound using intelligent audio analytics, which is useful both to update noise maps and to identify and localize a set of sound events [44], ranging from sniper fire to people in distress.

2.2. WASNs Based on Customized Nodes

To satisfy the increasing demand for automatic monitoring of noise levels in urban areas [22], several WASN-based projects are being developed in different countries, designed and deployed ad hoc for their application; some of these projects include other environmental measurements besides noise pollution.

In the SensorNet project [45], a customized noise level meter (http://www.sensornet.nl/english/) was developed that offloads computationally and energy-expensive operations from the sensor node to the cloud. The main project goal is to assess environmental noise pollution in urban areas. The authors also detail several qualitative considerations and experimental results about the most suitable data collection protocol, demonstrating the feasibility of wireless sensor networks for this purpose.

The SENSEable project in Pisa (Italy) applies the smart city concept to measure the sound level at several points across the city in real time [46], with the goal of involving citizens in city noise management. SENSEable presents an acoustic urban monitoring system based on low-cost data acquisition for pervasive outdoor noise monitoring [47]. The system relies on noise sensors located in private homes in the center of Pisa, providing a good model of the current acoustic climate of the city; a secondary goal of the project is to reveal a strong anthropogenic component, known as movida, which is not captured by the public strategic noise maps.

The MONZA project (http://www.lifemonza.eu/) follows an approach similar to SENSEable [48]. Within this LIFE-funded project, a WASN has recently been deployed in Monza (Italy), implementing a low-cost sensing system [49] with the specific goal of comparing noise levels before and after interventions in the framework of low-emission urban zones. The smart monitoring system consists of 10 low-cost monitoring devices installed in strategic locations, which acquire the time history of the sound pressure level every second, both broadband and in 1/3-octave bands, with an acoustic dynamic range of 70 dB over the 20 Hz to 20 kHz frequency range. The entire system is designed to minimize the hourly transmission time to the central server, where the data can be visualized in nearly real time.
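For reference, the 1/3-octave bands covering the 20 Hz to 20 kHz range can be generated from the base-10 definition of IEC 61260-1: exact mid-band frequencies f_m = 1000 · 10^(x/10) Hz, with band edges at f_m · 10^(±1/20). The sketch below (function name hypothetical) yields 31 bands, from about 19.95 Hz to about 19.95 kHz; the familiar nominal labels (20, 25, 31.5, ... Hz) are rounded versions of these exact values. It is not a description of the MONZA devices' internal filter bank.

```python
def third_octave_bands(f_low=20.0, f_high=20_000.0):
    """Return (lower edge, exact mid-band, upper edge) in Hz for base-10 1/3-octave bands.

    Mid-band frequencies are 1000 * 10**(x/10); edges are mid * 10**(+/-1/20).
    A 1% tolerance lets the exact 19.95 Hz and 19.95 kHz bands match the
    nominal 20 Hz and 20 kHz limits.
    """
    bands = []
    for x in range(-20, 15):
        fm = 1000.0 * 10 ** (x / 10)
        if f_low * 0.99 <= fm <= f_high * 1.01:
            bands.append((fm * 10 ** (-1 / 20), fm, fm * 10 ** (1 / 20)))
    return bands
```

The 1 kHz band, for example, spans roughly 891 Hz to 1122 Hz.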

The CENSE (characterization of urban sound environments) project aims to propose a new methodology for producing realistic noise maps in France [50]. The approach is based on assimilating simulated data and measured data from a dense network of low-cost sensors. Beyond computing physical indicators, the project also aims to characterize sound environments. The project includes experts in environmental acoustics, data processing, statistics, geographic information systems (GIS), sensor network design, signal processing, and even noise perception. The CENSE project also proposes the production of perceptual noise maps by developing soundscape models that use automatic identification of noise sources.

The IDEA (Intelligent Distributed Environmental Assessment) project [51] measures noise and air quality pollution levels in urban areas in Belgium. It is a cloud-based platform that integrates an environmental sensor network with an informative web platform [52]; its feature set consists of only 6 temporal contrast filters on 31 1/3-octave bands combined with 6 spectral contrast filters, resulting in a 768-dimensional feature space. The MESSAGE (Mobile Environmental Sensing System Across Grid Environments) project [53] also integrates diverse environmental measurements: it monitors noise, carbon monoxide, nitrogen dioxide, temperature, humidity, and traffic occupancy/flow, providing real-time noise level data in the United Kingdom, with a case study conducted in London.

The UrbanSense project [54] aims to monitor urban noise in real time together with air pollutants in Canada. The scalable infrastructure designed for that purpose includes a wide range of outdoor sensors together with a data aggregation system and a web-based data management and visualization application that shows real-time, event-based data integrated in a single platform. The sensors monitor pollutants such as CO2, CO, and noise (LAeq), with sampling rates configurable from 2 samples/s to 1 sample every 17 minutes, as well as several meteorological conditions including wind speed and direction, temperature, relative humidity, and precipitation.

In [7], the urban acoustic environment of New York City is monitored using a low-cost static acoustic sensing network named SONYC (Sounds Of New York City); the goal of this project is to monitor noise pollution in the city, providing an accurate description of its acoustic environment. SONYC implements a smart, low-cost, static acoustic sensing network based on consumer hardware (e.g., mini-PC devices and MEMS microphones), working at a sampling frequency of 44.1 kHz with 16-bit audio. The acoustic sensor nodes can be deployed in varied urban locations for long periods of time to collect longitudinal urban acoustic data, which are processed to give interested stakeholders meaningful information to change policies and develop action plans.

The aforementioned RUMEUR project has evolved into Medusa [55], which addresses the fact that BruitParif could not determine the origin of the noise source at any given time. Medusa solves this issue with a hardware system that combines four microphones and two optical systems, making it possible to represent noise levels on a 360° image of the environment. Source localization is thus achieved, but its computational load is high and mostly unaffordable for most low-cost acoustic sensor nodes.
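The source does not detail how Medusa localizes sources. As a hedged illustration of why localization is computationally heavier than LAeq evaluation, the sketch below estimates the time difference of arrival (TDOA) between two microphones with the generalized cross-correlation with phase transform (GCC-PHAT), a common technique swapped in here for illustration, and converts it into a far-field direction of arrival. Each estimate needs FFTs over the whole frame for every microphone pair, whereas LAeq needs only one multiply-add per sample. The speed of sound (343 m/s), the microphone spacing, and the function names are assumptions.

```python
import numpy as np


def gcc_phat_delay(sig, ref, fs, max_delay_s=None):
    """Delay (s) of `sig` relative to `ref` via GCC-PHAT; positive if sig lags ref."""
    n = len(sig) + len(ref)  # zero-pad so circular correlation does not wrap around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-15  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = min(int(fs * max_delay_s), max_shift)
    # Reorder so that index i corresponds to lag (i - max_shift).
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs


def doa_degrees(delay_s, mic_distance_m, c=343.0):
    """Far-field direction of arrival relative to broadside of a two-microphone pair."""
    s = np.clip(c * delay_s / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

With a hypothetical 0.2 m microphone spacing at 48 kHz, a 10-sample delay corresponds to a direction of about 21° from broadside.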

Finally, some projects focus on monitoring specific areas or infrastructures, such as highways. In [56], five points along the National Highway of Burdwan are monitored with sound level meters to record equivalent noise levels and perform the corresponding statistical analysis. Several noise descriptors (e.g., L10, L50, and LAeq, among others) were measured in three different periods of the day. Those results were analyzed together with several physiological parameters (e.g., hearing impairment, blood pressure, and heart rate) measured by means of an audiometer, a mercury sphygmomanometer, and a stethoscope, together with subjective surveys and interviews with the personnel.
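The statistical descriptors LN differ from LAeq: LN is the level exceeded N% of the measurement time, i.e., the (100 - N)th percentile of the short-term levels, so L10 reflects noisy peaks and L90 the background. A minimal sketch, assuming equally spaced short-term levels (e.g., 1-s LAeq values) as input; the function name is hypothetical and this is not the analysis procedure of [56].

```python
import numpy as np


def percentile_levels(levels_db, exceedance=(10, 50, 90)):
    """Return {"LN": level exceeded N% of the time} for each N in `exceedance`.

    LN is the (100 - N)th percentile of the short-term levels
    (NumPy's default linear interpolation between samples).
    """
    levels = np.asarray(levels_db, dtype=float)
    return {f"L{n}": float(np.percentile(levels, 100 - n)) for n in exceedance}
```

For levels 1, 2, ..., 100 dB, it returns L10 = 90.1, L50 = 50.5, and L90 = 10.9.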

Other noise monitoring projects take into account environmental data beyond the equivalent noise level. In the smart sound monitoring project, De Coensel et al. conducted a study that crossed acoustic information with subjective perception surveys to relate the type of acoustic events occurring to sleep quality [57]. A sound recognition system provides information about the detected sounds, establishing a relationship between the perception surveys and the identified events related to road traffic noise [58]. However, the approach focuses only on identifying the events of interest, not on the noise map generation process.

Achieving a good trade-off between cost and accuracy is also the core idea of the WASN design in the DYNAMAP project [59]. This project deploys a low-cost WASN in two pilot areas in Italy, located in Rome [60] and Milan [61], to evaluate the noise impact of road infrastructures in suburban and urban areas, respectively. DYNAMAP aims to monitor road traffic noise reliably, collecting audio data at 44.1 kHz so that specific audio events can be identified: removing the events unrelated to road traffic (a.k.a. anomalous noise events) is mandatory [62, 63] for the noise map computation [60], and the project goes further by evaluating the impact of each anomalous noise event on the final LAeq level [64]. To the best of our knowledge, this is the first project aimed at monitoring one specific noise pollutant in real-world environments, namely road traffic noise, which has proven to be the main source of noise pollution in urban areas and at least as harmful to citizens’ health as air pollution [9, 65].
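The effect of anomalous noise events on the final level can be illustrated by comparing the energetic average of 1-s LAeq values with and without the seconds labeled as anomalous. In this sketch, the ANE labels are assumed to come from a detector that is not shown, and the function names are hypothetical; it is not the DYNAMAP implementation.

```python
import numpy as np


def leq(levels_db):
    """Energetic average of equal-duration short-term levels, in dB."""
    levels = np.asarray(levels_db, dtype=float)
    return float(10 * np.log10(np.mean(10 ** (levels / 10))))


def ane_impact(levels_db, is_ane):
    """Return (LAeq of all seconds, LAeq without ANE-labeled seconds, difference in dB).

    `is_ane` is a per-second boolean mask from an (assumed) event detector;
    at least one second must remain unlabeled.
    """
    levels = np.asarray(levels_db, dtype=float)
    mask = np.asarray(is_ane, dtype=bool)
    full = leq(levels)
    clean = leq(levels[~mask])
    return full, clean, full - clean
```

For nine seconds at 60 dB(A) plus one anomalous second at 80 dB(A), the LAeq drops from about 70.4 dB(A) to 60.0 dB(A) once the event is removed: a single loud event dominates the energetic average.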

3. Acoustic Sensor Nodes: Accuracy and Computational Capacity

In this section, we review a key feature to consider during the design of a WASN: the memory and processing capacity of the nodes (see Figure 1). When the sensor nodes are designed to run simple tasks (e.g., calculating LAeq over certain periods of time and sending it to the cloud), only basic processing hardware is needed. In contrast, when spectral analysis or automatic classification is required, these algorithms can hardly run on such basic low-cost equipment. The sensor nodes can be divided into three main categories according to their measurement accuracy and computational capacity: (i) high-accuracy acoustic sensors, usually sound level meters, which are expensive and only provide equivalent noise level values; (ii) low-cost acoustic nodes that balance accuracy and price but run on a hardware platform with high computational capacity (Hi-Cap); and (iii) low-cost, low-capacity nodes (Lo-Cap), usually designed to measure values in remote locations or in places where only the LAeq measurement is required. Some projects combine (ii) and (iii) in a complete network, with accurate signal detection algorithms deployed in more critical places and sensor nodes that compute LAeq in less relevant sites.

3.1. High-Accuracy Acoustic Sensor Nodes

The first category of sensor nodes is built to achieve high accuracy and reliability, together with a low noise floor. To that end, most of these acoustic sensor nodes are monitoring devices from Bruel & Kjaer [66] or Larson Davis [67], equipped with IEC class I microphones. WASNs working with this kind of sensor are mainly deployed to perform a detailed study of the acoustic environment of the city of interest.

Another example is the FI-Sonic project [43], whose sensor nodes work with ambisonic microphones and have a multichannel acquisition card (from 2 to 128 GB). The WASN includes a network interface (with a Wi-Fi/3G modem) and a media server in its main processing unit, which also performs all the audio analyses [44]. The collected information is used to create quasi-real-time dynamic noise and event maps, as well as to identify specific pretrained sound sources for surveillance purposes. The main drawback of this first category of sensor nodes is the cost of deploying a large network with dozens of nodes, which may become prohibitive; moreover, these devices are inflexible regarding the implementation of signal processing algorithms on the device, and thus only provide indicators related to the measured equivalent noise levels.

3.2. Low-Cost High-Capacity Acoustic Sensor Nodes

A second category of acoustic sensor nodes is designed to balance accuracy and cost across the entire network by minimizing the price of each node while maintaining reasonable measurement accuracy. These acoustic sensor nodes are usually deployed in quite large networks. Besides price and accuracy, they are also designed to allow real-time signal processing locally at each network node.

The project in Xiamen City (China) [38] deploys a network of low-cost commercial sound level meters, with ZigBee technology and GPRS communication for data gathering, with the final goal of collecting the equivalent noise level in several parts of the city. This category also includes the nodes installed in the WASNs of the IDEA [51] and MESSAGE [53] projects, which are based on single-board computers with low-cost sound cards and low computational capacity; their affordable cost allows the deployment of large acoustic sensor networks.

Most of the aforementioned environmental noise monitoring approaches focus only on measuring LAeq values; therefore, the nodes only need to compute these values. When the application requires more complex processing of the input acoustic signal, the computational capability of the network nodes should be increased accordingly. Some of the acoustic sensor designs of this second category have been developed ad hoc for each project. In [7], the urban sound environment of New York City is monitored using a low-cost static acoustic sensing network that includes micro-electro-mechanical system (MEMS) microphones to conduct reliable measurements at class II level. These sensing devices currently incorporate a quad-core Android-based mini-PC with Wi-Fi capabilities to analyze the acoustic signal and handle data communications.

Other approaches consider hybrid networks composed of both Hi-Cap and Lo-Cap nodes. In the WASN described in [68], the advanced nodes offer far more processing capability than the basic ones, using a small PC with a 2 GHz Intel Atom processor running Linux. The advanced nodes can both store and process the acoustic data and have enough computational capacity and flexibility to perform several signal processing analyses.

The RUMEUR hybrid network [41] includes both low-accuracy equipment for secondary measurement sites and high-accuracy equipment for critical places, like airports, where the focus is to obtain detailed acoustic information due to the intense noise environment. The sensor nodes in the high-capacity part of the hybrid network use a class I microphone, and the on-device signal processing includes acoustic event detection [69]. The measurements from these sound level meters are also used to assess noise mitigation actions and to communicate information about the soundscape to concerned individuals [42].

Finally, achieving a good trade-off between cost and accuracy is also the central idea of the DYNAMAP project [60], which has also deployed a hybrid network in its two pilot areas. The sensor nodes designed for that network are low-cost and use class II MEMS microphones. The Hi-Cap nodes are based on an ARM core, allowing signal processing techniques to analyze the acoustic signals in real time while also handling other node tasks such as data communications and LAeq evaluation [70].

3.3. Low-Cost Low-Capacity Acoustic Sensor Nodes

The project described in [36] designs a WASN to measure environmental acoustic noise. The sensor node is built on an ATmega128 and CC2420 platform, and the protocol stack is based on CiNet with a global synchronization scheme. A frequency-weighting filter (specifically, the ITU-R 468 weighting, http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.468-4-198607-I!!PDF-E.pdf) is also implemented in the sensor node. The authors compared the design with two standard sound level meters (CESVA SC-20c and Pulsar24), and their proposal shows less than 2 dB error in both short-term and long-term measurements. It is able to offer real-time data to the competent authorities [37].

As mentioned above, the authors of [68] present an acoustic sensor network that follows a hybrid approach. In contrast to the advanced node described previously, the hardware platform of a basic node is a low-power microcontroller (μC) whose only goal is to compute LAeq and periodically transmit the collected values.
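The core computation of such a basic node can be sketched in a few lines. This is a minimal sketch assuming the samples are already calibrated in pascals and A-weighted upstream; `laeq` and `periodic_laeq` are hypothetical names, not the firmware of [68]. It applies the standard definition LAeq = 10 log10(mean(p^2) / p0^2) with p0 = 20 µPa.

```python
import math
from typing import Iterator, Sequence

P_REF = 20e-6  # reference sound pressure p0, 20 µPa


def laeq(samples_pa: Sequence[float]) -> float:
    """Equivalent continuous sound level (dB) of a block of pressure samples.

    Assumes samples are calibrated in pascals and already A-weighted, so the
    result is the LAeq over the block duration.
    """
    if len(samples_pa) == 0:
        raise ValueError("empty block")
    mean_sq = sum(p * p for p in samples_pa) / len(samples_pa)
    mean_sq = max(mean_sq, 1e-20)  # avoid log10(0) on digital silence
    return 10.0 * math.log10(mean_sq / P_REF ** 2)


def periodic_laeq(samples_pa: Sequence[float], fs: int,
                  period_s: float = 1.0) -> Iterator[float]:
    """Yield one LAeq per complete reporting period (hypothetical node loop)."""
    n = int(fs * period_s)
    for start in range(0, len(samples_pa) - n + 1, n):
        yield laeq(samples_pa[start:start + n])
```

As a sanity check, a square wave of ±1 Pa (1 Pa RMS) yields about 94 dB, the familiar acoustic calibrator level.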

The low-precision measurement equipment of the hybrid RUMEUR project [69] is aimed only at updating the noise map with the corresponding LAeq values [42]. The sensor nodes used in that part of the hybrid network are low-cost devices that obtain LAeq values with class II microphones. They are needed because, as stated in the description of the RUMEUR project in Section 2.1, the inclusion of class I high-precision sound level meters in the network is still limited by their relatively high cost.

As a hybrid acoustic sensor network, the DYNAMAP project also includes low-cost, low-capacity sensor nodes [70]. These nodes are powered by solar panels, which gives more flexibility in placing them in remote areas and thus helps maximize the acoustic coverage of the network. The core of this type of sensor is a low-capacity microcontroller (μC), and these nodes compute the equivalent level LAeq in real time and send it to the central server every second. The DYNAMAP project can be understood as a step forward from the preliminary results obtained by the SENSEable project deployed in the city of Pisa [46]; both projects were deployed in Italy.
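Per-second LAeq values like those sent by these nodes are usually combined into longer indicators (e.g., hourly levels) on the server. Because decibels are logarithmic, the combination has to average energies rather than dB values. The sketch below assumes equal-duration input intervals; the function names are illustrative.

```python
import math
from typing import List, Sequence


def energetic_mean(levels_db: Sequence[float]) -> float:
    """Combine equal-duration LAeq values (dB) into one LAeq over their total span."""
    if len(levels_db) == 0:
        raise ValueError("no levels to combine")
    mean_energy = sum(10.0 ** (l / 10.0) for l in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


def aggregate(per_second_db: Sequence[float], block_s: int = 3600) -> List[float]:
    """Group a stream of 1 s LAeq values into consecutive blocks (e.g., hourly).

    The last block may be shorter if the stream does not fill it completely.
    """
    return [energetic_mean(per_second_db[i:i + block_s])
            for i in range(0, len(per_second_db), block_s)]
```

For instance, combining 60 dB and 70 dB gives about 67.4 dB, whereas a plain arithmetic mean would report 65 dB and underestimate the exposure.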

4. Environmental Acoustic Event Detection

As mentioned before, most WASN-based environmental noise monitoring systems are designed to continuously measure the global sound levels at their deployment sites. However, extracting specific information about the sound sources present in the acoustic environment is key to meeting legislative requirements, besides allowing more detailed analyses beyond basic LAeq computation. In this section, we review recent representative works focused on the development of acoustic event detection (AED) algorithms for urban environments, some of which already operate in real conditions, differentiating one-class novelty detection from multiclass classification approaches. AED approaches typically follow a two-stage process: the parameterization of the input audio, also known as feature extraction, followed by a machine learning approach trained with representative data of the acoustic problem of interest, i.e., following a supervised or semisupervised approach. Then, in Section 4.2, we detail the preliminary AED-based proposals that have already been implemented within a WASN.
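The two-stage structure described above can be sketched as follows. For brevity, the feature extractor computes only frame log-energy and zero-crossing rate rather than the MFCCs used in most of the works reviewed below, and the second stage accepts any already-trained frame classifier passed in as a function. Frame and hop lengths are assumed values, and all names are illustrative.

```python
import math
from typing import Callable, List, Sequence


def frame_features(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Stage 1: per-frame [log energy (dB), zero-crossing rate] descriptors."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # Count sign changes between consecutive samples
        zc = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append([10.0 * math.log10(energy + 1e-12), zc / (frame_len - 1)])
    return feats


def detect(x: Sequence[float], classify: Callable[[List[float]], str],
           frame_len: int = 1024, hop: int = 512) -> List[str]:
    """Stage 2: apply a previously trained frame classifier to each descriptor."""
    return [classify(f) for f in frame_features(x, frame_len, hop)]
```

Keeping the classifier as a pluggable function mirrors the separation in the text: the same parameterization can feed a GMM, an SVM, or a neural network.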

4.1. Acoustic Event Detection in Urban Environments

First, we describe some representative AED approaches designed for different environmental noise applications. In [71], the authors introduce a novel AED approach for acoustic surveillance and evaluate its performance in a simulated real-life scenario. The main goal of the system is to identify abnormal audio events, such as screams, shouting, or pleading, in an outdoor public security context. The acoustic data are parameterized using different audio descriptors, such as mel frequency cepstral coefficients (MFCC), MPEG-7 low-level descriptors (LLD), intonation and the Teager energy operator, and perceptual wavelet packets (PWP). The approach is based on probabilistic classifiers, namely Gaussian mixture models (GMM) and hidden Markov models (HMM), which are trained only with data from the regular acoustic environment (i.e., the majority class), thus following a one-class classification (OCC) scheme.
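The one-class idea can be illustrated with the simplest possible density model: a single diagonal Gaussian, i.e., a one-component GMM, fitted only to feature vectors of the regular environment, with frames whose log-likelihood falls below a threshold flagged as abnormal. This is a simplified sketch, not the GMM/HMM system of [71]; the class name and the threshold rule (a low quantile of the training scores) are assumptions.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


class DiagonalGaussianOCC:
    """One-class detector: a single diagonal Gaussian (a 1-component GMM)
    fitted only to 'normal' feature vectors; low likelihood means anomaly."""

    def fit(self, X: Sequence[Sequence[float]], quantile: float = 0.01):
        dims = list(zip(*X))  # one tuple per feature dimension
        self.mu = [mean(d) for d in dims]
        self.var = [pvariance(d) + 1e-6 for d in dims]  # variance floor
        scores = sorted(self.log_likelihood(x) for x in X)
        # Threshold at a low quantile of the training log-likelihoods
        self.threshold = scores[int(quantile * (len(scores) - 1))]
        return self

    def log_likelihood(self, x: Sequence[float]) -> float:
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, self.mu, self.var))

    def is_anomalous(self, x: Sequence[float]) -> bool:
        return self.log_likelihood(x) < self.threshold
```

Because only the majority class is modelled, no examples of screams or shouting are needed for training; the price is that anything unusual, not only the target events, will be flagged.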

Following a similar approach, Aurino et al. apply support vector machines to identify hazardous situations [72]. Specifically, the system is trained to recognize gunshots, breaking glass, and screams, following a two-stage classification scheme: after classifying short-term audio segments through an ensemble of OCCs, it integrates their outputs every second through majority voting. In [73], a multiclass AED approach is introduced to monitor traffic congestion in urban environments, which also makes it possible to detect car crashes. The system is based on a two-stage HMM-based classification scheme, again using MFCC, LLD, and PWP to parameterize the input acoustic data. However, it is worth noting that the database is synthetically generated at specific SNRs using samples from several professional sound effect collections.
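The decision-integration step can be sketched as below: each short-term segment receives a label from the ensemble, and the labels within each second are reduced by majority vote. The rule used to label a segment from the one-class outputs (exactly one acceptance gives that class, otherwise background) is a hypothetical choice for illustration, not the scheme of [72].

```python
from collections import Counter
from typing import List, Sequence


def segment_label(accepted: Sequence[str]) -> str:
    """Label a segment from the classes whose one-class models accepted it.

    Hypothetical rule: exactly one acceptance yields that class; none or
    several (ambiguous) yield 'background'.
    """
    return accepted[0] if len(accepted) == 1 else "background"


def vote_per_second(segment_labels: Sequence[str],
                    segments_per_second: int) -> List[str]:
    """Integrate segment decisions into one label per second by majority vote."""
    out = []
    for i in range(0, len(segment_labels), segments_per_second):
        window = segment_labels[i:i + segments_per_second]
        out.append(Counter(window).most_common(1)[0][0])
    return out
```

Voting over a one-second window smooths isolated misclassifications of individual segments at the cost of one second of decision latency.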

In [74], an AED approach with a similar goal is designed to detect tyre skidding and car crashes on low-cost hardware platforms for surveillance purposes. A bag-of-words representation is used to perform AED after training a pool of SVM-based classifiers that consider different feature extraction techniques, including low-level features such as volume, energy, and zero crossing rate, as well as MFCCs or Bark subbands. The work also analyzes the sensitivity of the classifier to different distances from the microphone. Nevertheless, as in other works, a synthetic audio database is built by mixing real-life acoustic data with the two classes of hazardous road events of interest, which makes it difficult to extrapolate the conclusions to real-life operation. In [29], instead of considering typical local temporal-spectral features, a multiclass AED that considers both local and global parameters is introduced. The proposal uses a mixture of experts as the machine learning approach, and it is tested on the 10-class UrbanSound8K dataset [75], which tries to emulate real-life conditions. In [76], an AED approach based on nonnegative matrix factorization (NMF) and short-time fast Fourier transform (FFT) analysis is introduced with the aim of isolating the contribution of road traffic noise from measurements of urban sound mixtures. This work has recently been applied to estimate road traffic sound levels [77], showing good results on a synthetically generated database built following an approach similar to the one described in [78].
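A bag-of-words representation turns a variable number of frame feature vectors into a fixed-length vector that an SVM can consume. In this minimal sketch, each frame is mapped to its nearest codeword and the clip is described by the normalized histogram of codeword counts; the codebook is assumed to have been learned beforehand (e.g., with k-means), and the function names are illustrative rather than taken from [74].

```python
import math
from typing import List, Sequence


def nearest(codebook: Sequence[Sequence[float]], v: Sequence[float]) -> int:
    """Index of the codeword closest to v (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(codebook[k], v))


def bag_of_words(frames: Sequence[Sequence[float]],
                 codebook: Sequence[Sequence[float]]) -> List[float]:
    """Normalized histogram of codeword assignments over a clip's frames."""
    counts = [0] * len(codebook)
    for v in frames:
        counts[nearest(codebook, v)] += 1
    total = len(frames) or 1  # avoid division by zero for an empty clip
    return [c / total for c in counts]
```

Since every clip maps to a vector of the same length as the codebook, clips of different durations can be fed to the same classifier.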

Finally, the reader is referred to the literature derived from the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, a competition that proposes different tasks in each edition [27], including the detection of acoustic events [79] in real-life acoustic data, such as the TUT database [80]; many of the submitted proposals are based on deep neural networks (e.g., see [81, 82]). However, this machine learning approach requires a huge amount of labelled training data, a requirement that may be very hard to meet when dealing with real-life data and, in particular, when building representative datasets of anomalous acoustic events [62].

4.2. Acoustic Event Detection in Urban Sensor Networks

Next, we describe several works that have implemented preliminary AED approaches in WASNs for smart cities. In [43], AED is developed to identify and locate diverse acoustic events produced by different hazardous situations (e.g., gunshots, screams, horns, and road accidents) in the high-capacity FIWARE-based sensor network already described in Section 3. This approach was subsequently tested in a proof-of-concept WASN composed of 3 nodes equipped with ambisonic microphones, with the AED running on a centralized media server [44]. The AED uses quadratic discriminant analysis and neural networks (NN) as classifiers, with the input audio parameterized using low-level signal features, psychoacoustic parameters, and standard MFCCs.

In [78], a preliminary study on the detection of anomalous noise events for the reliable tailoring of road traffic noise maps was developed and tested on a small synthetic database mixing real-life data with audio snippets from Freesound. The AED was implemented using two classification approaches, k-nearest neighbours (k-NN) and Fisher linear discriminant (FLD), and two audio parameterization techniques, MFCC and gammatone cepstral coefficients (GTCC) [83]. This approach has recently been improved through the development of a two-class AED classifier trained with acoustic data obtained from a real-life recording campaign within the DYNAMAP project [62]. The resulting Anomalous Noise Event Detector (ANED) follows a two-stage classification process based on MFCC parameterization and GMM as the core machine learning approach. The results show that this approach outperforms its OCC counterpart trained only with road traffic noise as the majority class. The ANED is currently running on the two WASNs of 24 nodes each, one in the pilot area of Rome (suburban) and the other in Milan (urban).
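For reference, the k-NN classifier mentioned above reduces to a few lines once the audio has been parameterized. The sketch below assumes feature vectors (e.g., MFCC summaries) are already available and uses the hypothetical labels "RTN" and "ANE" for road traffic noise and anomalous noise events; it is not the implementation of [78].

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def knn_predict(train: Sequence[Tuple[Sequence[float], str]],
                x: Sequence[float], k: int = 3) -> str:
    """Label x by majority vote among its k nearest training vectors (Euclidean)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

k-NN needs no training phase, but it must store and search all labelled vectors at inference time, which matters on memory-constrained sensor nodes.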

In [84], an AED approach identifies target sounds against background noise in order to assign the measured sound levels to the different sound sources present. The AED is based on a binary classifier that discriminates the target sound from background noise (e.g., traffic, wind, rain, thunder, and birds). Again, MFCCs are selected as the feature extraction technique, and the classifier is based on GMM and NN, trained on an annotated real-life dataset. However, the proposal has not yet been tested in an urban environment, since the authors selected a rock crushing site, and only one acoustic sensor is considered (i.e., no network is deployed).

Finally, [85] describes an AED implementation on a 23-node WASN aimed at the acoustic classification of moving army vehicles. The audio input is parameterized using the FFT, and the machine learning approaches include NNs, besides GMMs and HMMs. This research pays special attention to several phenomena that directly affect the operation of this kind of network, such as sensor faults, hardware aging, and environmental changes, to name a few. The experiments validate the proposal in terms of its fault detection capacity and its ability to classify moving vehicles in the presence of sensor faults and environmental noise. However, only around 4 h of data are used in the experiments, which makes it difficult to draw long-term conclusions.

5. Discussion

Although WASNs are becoming an incipient reality in some smart cities, there is still a long way to go to make the most of this IoT-based approach for monitoring environmental noise dynamically, reliably, and pervasively. Table 1 classifies the reviewed WASN projects in chronological order according to their main characteristics. Most of these WASN-based projects have been deployed to validate the viability of their approach in a specific environment, e.g., District 9 of Milan and the A-90 highway surrounding Rome, the two pilot areas of the DYNAMAP project, each covered by a hybrid network of 24 nodes [59], or the 112 devices distributed across the city in a balanced manner in the Barcelona NMN [40]. Nevertheless, several pilot projects still involve a small number of nodes (e.g., 4) deployed for about two weeks [86], which is considered a long-term measurement yet falls far short of the 24-hour, 7-day-a-week operation needed to monitor an urban environment. To this end, the performance and computational capacity of the sensors, together with the detection of sensor faults, aging phenomena, environmental changes, etc., are of paramount importance [85]. Another element to take into account in the design of the network, although we have treated it as a commodity in this review, is the cloud connection of all the WASN nodes to a central server, which integrates the data collected at each point and displays them when convenient. The network design must account for the latency of the data coming from different points, which varies with the type of data network implemented (e.g., 3G, 4G, or Wi-Fi) [87]; although this latency is orders of magnitude smaller than the sampling time, it must be taken into account when integrating the data.
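One way to handle variable network latency when integrating the data is to bin readings by the time they were measured rather than the time they arrived, and to treat a time bin as complete only after a maximum expected latency has elapsed. The sketch below assumes synchronized node clocks and a known latency bound; the `Reading` structure and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reading:
    node_id: str
    measured_at: float  # node timestamp in s (assumes synchronized clocks)
    arrived_at: float   # server receive time in s (not used for binning)
    laeq_db: float


def bin_by_measurement_time(readings: List[Reading],
                            bin_s: float = 1.0) -> Dict[int, Dict[str, float]]:
    """Group readings by when they were measured, regardless of arrival order.

    If a node reports twice within one bin, the later-processed value wins.
    """
    bins: Dict[int, Dict[str, float]] = defaultdict(dict)
    for r in readings:
        bins[int(r.measured_at // bin_s)][r.node_id] = r.laeq_db
    return dict(bins)


def complete_bins(bins: Dict[int, Dict[str, float]], now: float,
                  max_latency_s: float, bin_s: float = 1.0) -> List[int]:
    """Bins whose end plus the allowed latency has passed can be integrated."""
    return sorted(b for b in bins if (b + 1) * bin_s + max_latency_s <= now)
```

Waiting for the latency bound trades a small, fixed delay in the integrated map for not mixing measurements from different instants.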

As discussed previously, WASNs have used both high-accuracy commercial devices and low-cost sensors designed ad hoc. The latter have evolved into Hi-Cap low-cost nodes, which can run signal processing algorithms to analyze the input acoustic data at a reasonable cost. This is highly relevant, since WASNs are expected to solve several open challenges in fully monitoring the complex acoustic environments found in urban areas, which should be considered according to the END, e.g., a highway close to an airport, a port embedded in a city neighborhood, or a train station placed close to the port. Moreover, WASNs should be capable of distinguishing between these specific noise sources if they are to address the END and CNOSSOS-EU requirements. Most WASN-based projects focus on measuring the global equivalent noise level of the monitored acoustic environment without identifying the different noise sources that compose it. Some seminal works in the literature move towards this goal; e.g., the DYNAMAP project is being developed to monitor road traffic noise only, so the WASN-based system must remove other noise sources before computing the RTN map. The hybrid WASN includes sensor nodes capable of running an ANED designed to remove nontraffic noise events from the LAeq computation. This approach opens up the possibility of tailoring noise maps for each relevant noise source, beyond computing noise levels, which will provide the competent authorities with valuable data to develop specific noise policies. For instance, a major concern in cities is the noise derived from leisure and recreational activities, such as summer festivals or neighborhoods with many pubs.
The so-called movida (nightlife) is quite common in Mediterranean cities. Since it is one of the main sources of citizens’ noise complaints after RTN [40], WASN-based methods to monitor this type of noise need to be explored, together with ad hoc strategies for creating reliable noise pollution maps and the subsequent action plans.
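Returning to the ANED mentioned above, one simple way to exclude detected events from the road traffic LAeq is to drop the flagged intervals before energy-averaging the remaining ones. The sketch assumes equal-duration intervals, each with an anomaly flag produced by some detector; the function name is hypothetical, and the actual DYNAMAP procedure may differ.

```python
import math
from typing import Sequence


def laeq_excluding(levels_db: Sequence[float],
                   is_anomalous: Sequence[bool]) -> float:
    """LAeq over equal-duration intervals, skipping those flagged as anomalous."""
    kept = [l for l, bad in zip(levels_db, is_anomalous) if not bad]
    if not kept:
        raise ValueError("all intervals were flagged")
    # Energetic (not arithmetic) average of the remaining interval levels
    return 10.0 * math.log10(sum(10.0 ** (l / 10.0) for l in kept) / len(kept))
```

The effect can be large: with levels of 60, 60, 90, and 60 dB, a single unflagged 90 dB second would raise the four-second LAeq from 60 dB to about 84 dB.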

Finally, it is worth mentioning that current studies on the effect of noise on citizens’ health, and the legislation derived from them, typically rely on static noise map values and could be improved dramatically by the permanent, ubiquitous deployment of WASNs, since most current pilots are deployed for only days or weeks [86]. From the collected acoustic information, annoyance maps could be generated to provide information not only about the objective noise level but also about the subjective impact of noise pollution (see [88] and references therein). In this regard, the recently started ANIMA (Aviation Noise Impact Management through Novel Approaches) project (http://anima-project.eu/) aims to identify and disseminate best practices for lowering the noise annoyance endured by communities around airports, involving citizens. Last but not least, although it lies outside the scope of this paper, there is an increasing trend to enroll citizens in noise monitoring. The reader is referred to [26] for a complete review of smartphone applications for crowd-sourced noise measurements.

6. Conclusions

In this work, we have reviewed the main approaches found in the literature on the design and development of wireless acoustic sensor networks for environmental noise monitoring in smart cities. Since traditional static noise mapping relied on expert-based sound level measurements, the initial WASN-based approaches built the network using commercial devices. The measured equivalent noise levels, typically LAeq, were collected and sent automatically to a central server, thus replacing the technicians’ participation in the measurements. Although these WASNs provided high-accuracy results, the cost of their nodes made them very expensive for large-scale installations. Later, several projects designed ad hoc acoustic sensors, most of them low-cost, to allow the pervasive deployment of the noise monitoring network. Within this group, we find both low- and high-capacity nodes (sometimes mixed in hybrid networks) that dynamically provide LAeq values, while some seminal WASN-based projects also include acoustic event detection techniques to obtain extra information from the measurements. In this context, it is worth noting that low-cost high-capacity sensors have started being used to monitor specific noise sources in urban environments (e.g., road traffic noise or specific events for surveillance) in order to address the requirements of the END and CNOSSOS-EU legislation. Moreover, this opens up the possibility of designing such sensors specifically to monitor leisure and recreational areas or critical places such as hospitals and schools, among others. Finally, since noise pollution is one of the main environmental sources of health problems along with air pollution according to the WHO, WASNs are envisioned to become a key IoT-based technology for addressing this problem in smart cities.
In the near future, reliable and ubiquitous WASNs will be able to provide valuable information to control and mitigate environmental noise far beyond current studies, which are mainly based on static noise maps updated every five years. Nevertheless, further research is needed to improve the performance of WASNs in real-life operating conditions, especially if the data obtained from these networks are to be used by the competent authorities to develop action plans, impose administrative penalties, etc. In short, although WASNs are becoming an incipient reality, very few projects have been deployed in smart cities around the world (most of them as pilots); thus, the complete exploitation of this technology still has a long way to go.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research presented in this work has been partially supported by the LIFE DYNAMAP project (LIFE13 ENV/IT/001254). Francesc Alías acknowledges the support from the Obra Social “La Caixa” under grant ref. 2018-URL-IR1rQ-021.