Abstract

Traffic routing is a central challenge in urban areas, with a direct impact on personal mobility, traffic congestion, and air pollution. In the last decade, the possibilities for traffic flow control have improved together with the corresponding management systems. However, the lack of real-time traffic flow information with city-wide coverage is a major limiting factor for optimum operation. Smart City concepts seek to tackle these challenges in the future by combining sensing, communications, distributed information, and actuation. This paper presents an integrated approach that combines smart street lamps with traffic sensing technology. More specifically, infrastructure-based ultrasonic sensors, which are deployed together with a street light system, are used for multilane traffic participant detection and classification. Operating these sensors in time-varying reflective environments has been an unresolved problem for many ultrasonic sensing solutions in the past and has therefore greatly limited the dissemination of this technology. We present a solution using an algorithmic approach that combines statistical standardization with clustering techniques from the field of unsupervised learning. A multilevel communication concept enables both centralized and decentralized traffic information fusion. The evaluation is based on results from automotive test track measurements and several European real-world installations.

1. Introduction

In the context of the Internet of Things (IoT), the transformation of urban infrastructure into intelligent and connected devices plays an important role for future mobility. Cities experience an increasingly high level of road congestion due to the concentrated traffic that comes along with progressing urbanization. This congestion induces high social, economic, and environmental costs. From a European Union (EU) perspective, it amounts to about one percent of GDP [1] and is predicted to increase by about 50 percent by the year 2050 [2]. From an environmental point of view, carbon dioxide emissions and air pollution, especially locally in cities, are a significant social and health concern [3]. With this unprecedented level of traffic density, offline and online traffic planning including live routing techniques becomes a necessity for larger cities, being useful both for end users and for city planners. Especially in the case of real-time traffic monitoring and the subsequent provision of information to drivers and public routing systems, proper conditioning of this traffic information in terms of accuracy, area coverage, and density is paramount. Traffic congestion can only be effectively reduced if this information base, which is used for the routing decisions, is delivered with a sufficient level of quality in these dimensions. Then, with a city-wide traffic status, large-scale end-to-end routing optimizations can be performed. The results can be delivered to the vehicle driver with provision methods such as in-car information systems or centralized traffic guidance for triggering the individual’s routing reaction. If this provision is not performed properly, either due to insufficient traffic information quality or improper consideration of group effects in the optimization, even detrimental effects can occur [4]. In this article, these requirements on traffic information quality and density are in focus, building a base for future routing decisions. Many currently used traffic sensing techniques for creating this live data basis will only have a negligible positive impact on routing decisions if they are distributed sparsely and do not achieve sufficient coverage.

In general, two main types of techniques for achieving more comprehensive traffic information have been established in recent years. The first category is the fixed or mobile infrastructural deployment of sensors, like inductive loops and traffic cameras. The second type is represented by systems relying on end-user based distributed information, for example, movement data from smartphone users or information gathered using V2X communication approaches. Details and characteristics of several techniques are discussed in this paper, considering not only metrics like traffic information quality and performance in different scenarios but also possible privacy issues which may arise with certain techniques.

The novel ultrasound-based traffic sensing technique, which is the focus of this paper, belongs to the category of infrastructural sensing. A significant difference to existing ultrasonic sensor systems is the placement of the sensors in a so-called sidefire setup. This means that the sensors are mounted at a height of at least three meters on the side of the street and face the street sideways in a downtilted orientation. Previously existing systems were mainly single-lane ultrasonic sensors measuring the height profile top-down, for example, from a special sensor gantry above the street, with simple first-reflection processing. The monitoring of multiple lanes is possible with our setup, while, at the same time, the requirements in terms of algorithmic evaluation and processing are increased in comparison to these previously existing systems. In return, our approach allows operation in the highly reflective acoustic environment present in urban areas and permits simply mounting the sensor on the side of the street. This is enabled by a new approach combining statistical signal processing, clustering, and inference algorithms for traffic participant object detection.

This sidefire ultrasonic traffic sensing technique is part of a development which focuses on intelligent infrastructure solutions, more specifically on intelligent street lights. An exemplary integrated setup is shown in Figure 1. These light modules can be used to replace inner components of street lamps, for example, by retrofitting the widespread type of high-pressure sodium-vapor lamps. Then, they can act as a flexible platform for many applications by employing the processing, communication, and light control actuation capabilities. With the transition to LED-based lighting, which is triggered by high energy costs and several EU directives [5], a large proportion of the existing street lamps will have to be replaced in the next decade, offering potential for an integration of Smart City solutions. The seamless integration into infrastructure allows an almost complete coverage of relevant urban traffic to provide the required coverage density of traffic information. Furthermore, the major problem of power supply for sensing devices is alleviated, as the lamps are connected to the mains power, which eliminates the need for energy-constrained and expensive battery-powered devices.

The rest of this article is organized as follows. Section 2 reviews related work on urban traffic sensing techniques in general and ultrasonic sensing in particular. Section 3 describes the system architecture together with the specific scenario and typical sensor setup. Section 4 presents the algorithmic approach to the sensor signal processing system and object detection inference. In Section 5, the evaluation methodology and framework for reference data are presented. Section 6 gives results of real-world measurements and provides an evaluation from the perspective of detection and classification together with a state-of-the-art technology comparison. Section 7 then concludes the article.

2. Related Work

Ultrasonic traffic participant detection falls into the field of traffic sensing, with a large number of techniques available in research and in practical application. In this section, we therefore provide a state-of-the-art overview from two perspectives. First, we give a comprehensive overview of traffic monitoring techniques in general with a discussion of their performance, strengths, and weaknesses in order to show the technological gaps and potential. In the second part, specific technical approaches in ultrasonic traffic sensing, which are rare both in concepts and in practical implementations, are presented. This application-centric perspective shows the drawbacks and unresolved problems in the current state of the art.

2.1. Traffic Monitoring Techniques from a Broad View

In the introductory section, two main categories of traffic information acquisition systems were introduced. On the one hand, there are infrastructure-based sensing systems, working in a temporarily or permanently fixed location. On the other hand, end-user assisted sensing systems incorporate distributed information acquired from the drivers, their technical devices, and the vehicles’ internal sensors themselves. In the following, a state-of-the-art overview together with a technological comparison is given for the most relevant traffic monitoring techniques of both categories in use today. A condensed overview of their properties is also given in Table 1.

The most widely used approach to traffic sensing is the use of infrastructure-based solutions, which represent a mainly centralized approach. In general, authority over the data is kept with the operator and, thus, several of the problems coming with user-centric techniques do not arise. However, the sensing systems have to be deployed over the whole city or at least at critical locations in terms of traffic. Achieving the required density for extensive urban traffic monitoring is therefore paramount, which entails high effort and cost for a centralized solution.

Camera-Based Systems. Frequently, camera-based systems are used for traffic monitoring. Here, we focus on systems with automated analysis based on Computer Vision algorithms. With camera-based systems, high detection accuracy plus vehicle classification can be achieved. During nighttime, further measures are required, for example, artificial illumination of the road and infrared or thermal imaging. Only in poor visibility conditions such as fog, snow, or smog is the performance limited [6]. The main concern with these systems, however, is the issue of privacy arising with video cameras in the public domain. On the one hand, the personal privacy of citizens is affected; on the other hand, governmental regulations might even forbid the use of cameras in public areas, especially in this high-coverage case. Also, even with only a sparse coverage of video cameras in the public space, acceptance levels amongst citizens are low [7].

In-Pavement Sensors. Sensors embedded into the pavement asphalt are often used at urban intersections for traffic light control. The main in-pavement sensor technologies are inductive loops, piezoelectric sensors, and strain gauges [8]. They can also enable high-accuracy traffic measurements together with a classification by counting the axles of vehicles [9]. As a downside, the installation of pavement-integrated sensors is highly invasive and costly and requires per-lane road work with a possible temporary disturbance of city traffic.

Infrared Sensors. Passive and active infrared sensors are nonintrusive sensor types which can be mounted on the side or directly above the street. They offer detection with additional speed and vehicle class information in some realizations. With active sensors, multilane operation is possible. However, infrared sensors are susceptible to bad weather conditions like rain, snow, or fog. Furthermore, active infrared sensors require regular cleaning which might lead to lane closure during maintenance [10, 11].

Radar Systems. In terms of the basic object detection principle, radar sensors represent the technology most comparable to ultrasonic sensors in the field of traffic sensing. High pulse repetition rates and ranges, together with the possibility of obtaining directional and velocity information, allow the monitoring of multiple traffic lanes and objects independent of the weather conditions [10, 12, 13]. As a major downside, the significant cost of the radar system, including front-end and signal processing, can compromise a city-wide deployment with sufficient sensor coverage.

Top-Down Ultrasonic and Radar. In the field of radar and ultrasonic sensors, several top-down sensor solutions are available, which require the placement above a specific street lane for single-lane car profile measurements [10, 14–18]. These techniques allow obtaining the complete height profile of vehicles for detection and classification but require a gantry or bridge over the street for mounting the sensors, which is infeasible in many situations. Moreover, only a single lane can be monitored with the existing solutions [10]. For the ultrasonic sensors, the processing is strongly simplified: with the available systems, only the distance to the first relevant reflection is evaluated, not the complete impulse response. The significantly lower propagation speed and resulting decay time of the impulse responses, together with the requirements for a sufficient pulse repetition rate (typically ≥10 Hz) of ultrasonic sensors, can lead to self-interference and also to issues with fast-moving traffic participants [18]. For ultrasonic sensors, previous studies have shown that the impact of weather effects such as wind, snow, and rain is only marginal [19, 20].

Crowdsourcing Systems. In the second category of end-user based distributed information systems for traffic monitoring, one of the most prominent techniques is the crowdsourcing of traffic monitoring to end-user smartphones and navigation devices in order to obtain real-time and high-density traffic flow information [21, 22]. This approach is used in several systems and solutions, for example, by “Google Maps” with the product “Google Traffic” or by “TomTom HD Traffic.” As a central advantage, no investments in additional sensors are required and the very high prevalence of these applications on end-user devices can be exploited by actively tracking the devices. However, the live tracking of users is highly critical in terms of privacy. Furthermore, these systems are susceptible to manipulation and fake information due to trust issues with the user base, which has been demonstrated before [23].

V2X Communications. With the emerging field of V2X (vehicle-to-everything) communications, which is currently focused on V2V (vehicle-to-vehicle) and V2I (vehicle-to-infrastructure) techniques, new possibilities for traffic information acquisition and exchange arise. Together with in-vehicle information for a direct delivery of inferred routing decisions to the driver, and in fusion with available travel plans on the car navigation device, an optimized information exchange and routing is enabled. Nevertheless, these types of systems still have to reach the mass market together with suitable application scenarios, and it will take time until they are sufficiently established amongst everyday vehicles. Moreover, reservations with regard to privacy and trust are also an important aspect for these systems. The literature shows that, with a sufficient prevalence of cooperative V2V techniques in cars together with a high relaying communication range, dense traffic information can be obtained [24]. Therefore, vehicular communication techniques are a promising approach for the future. However, as long as the density and proportion of vehicles supporting V2X communications remains too low, a diverse traffic monitoring approach, which in parallel comprises infrastructural sensing techniques, will have to be pursued, likely for several decades.

2.2. Specific Advances in Ultrasonic Traffic Detection

In the introductory Section 2.1, the basic top-down ultrasonic sensing technology has already been introduced. However, its practical use is limited, as additional mounting efforts are required if no infrastructure like bridges above the street or traffic light poles at intersections is available. Moreover, only single-lane detection is possible in this setup. Therefore, these solutions are limited in their practical applicability.

There also exist initial concepts that use ultrasonic sensors mounted in a lateral street position with a so-called sidefire orientation, facing the opposite street side either horizontally or in a downtilted position [19, 25, 26]. In the case of horizontal orientation, even a first simple multilane detection has been realized. However, all these concepts perform only an analysis of the first-reflection distance instead of the whole impulse response, which highly limits their practical use in a variety of scenarios. The urban scenario is a highly reflective, time-varying environment, where first reflections of other objects such as moving trees, parked cars, and the scattering of the road surface are always present in the distance range relevant for traffic participant detection, that is, the distance region of interest for measured object reflections. In addition, horizontally facing sidefire sensors could be obstructed by other objects and traffic participants in urban scenarios due to their low mounting height.

3. System Concept and Architecture

Sensor systems have specific requirements regarding the setup, scenario, and working environment they are operated in. In this section, the system setup and application concept, together with the generic requirements for our proposed ultrasonic traffic sensing system, are given. The scenario stays the same as initially proposed in Section 1: a Smart City street lamp system platform. Nevertheless, a wide variety of installation and application possibilities exist without any ties to this integration concept.

3.1. Sensor System Setup

An exemplary setup of the smart street lamp platform was already shown in Figure 1. It features a single ultrasonic sensor attached to the lamp head or the pole and an embedded system integrated into a customized LED-based lamp head. The digital core system contains modules for processing and communication. These enable distributed processing in a local group of lamps in a street by building a dynamic mesh network and also allow communication on the global and cloud level by employing mobile network machine-to-machine (M2M) communication functionalities. The central component for the signal processing capabilities is a Texas Instruments Sitara system-on-chip (SoC) including an ARM Cortex-A8 core. Typical examples of possible applications with this system are location-based content delivery, attachment of modules for environmental or traffic sensing applications, and energy efficiency optimizations of street lighting. Due to the combination of infrastructure with an extensible platform, the systems can be widely distributed across the city, as almost all central areas are covered by street lamps. Furthermore, many of the constraints on processing and communications typically imposed on Smart City traffic monitoring systems in the IoT context can be relaxed, for example, data and refresh rate restrictions due to battery power [5].

The typical seamless integration of the ultrasonic sensing concept into an urban street area is shown in Figure 2. In the so-called downtilted sidefire setup, the sensor is mounted at a height of up to 8 m, above the typical height of vehicles in urban areas. The sensor head is then oriented diagonally downwards towards the street at a fixed downtilt angle. The ultrasonic transducer together with its casing has a cone-shaped beam characteristic, with an opening angle chosen depending on the scenario and lane coverage requirements. In the exemplary scenario shown in Figure 2, the multitude of reflective elements in the urban environment already becomes apparent. For example, the first reflections and scattering of the street surface can arrive earlier than reflections from relevant objects on lanes distant to the sensor. Additionally, reflective objects can exhibit short- or long-term time variation, like parked cars or trees moving in the wind. In these cases, simple distance-measurement based sensor techniques would already fail.

3.2. Ultrasonic Sensor Platform Module

In order to provide the capabilities for ultrasonic sensing, the core system is extended with a custom sensor platform module, which is shown in Figures 3 and 4. It provides four ultrasonic sensor channels for real-time arbitrary waveform generation and recording in half-duplex mode. The reception quality of the ultrasonic impulse response is high, with an input-stage SNR of above 105 dB in the ultrasonic band. The analog driving circuit for the transducer is integrated into the ultrasonic transducer head itself and is adapted to the capabilities and characteristics of the transducer.

Control of the attached components, signal generation, and preprocessing capabilities are realized with an integrated Xilinx Artix-7 series FPGA. It provides communication capabilities to the core processing system over an Ethernet link, which is also used for the power supply of the sensor system extension. Together with the core system, GPS including the high-precision PPS signal is used for timing synchronization and exact coordination of multiple sensors in the vicinity of a street. Interference can therefore be mitigated by a global optimization of the sensor timing patterns if the sensor heads are transmitting in the same frequency band.

3.3. General Signal Structure

For simplicity, the signal structure is discussed for the single-sensor case. The transmit signal consists of repeated sequences of the base pulse with an overall repetition period $T_{\mathrm{rep}}$, yielding a pulse repetition rate of $f_{\mathrm{rep}} = 1/T_{\mathrm{rep}}$. This is also shown in Figure 5. $T_{\mathrm{rep}}$ is typically chosen up to 150 ms, large enough to ensure low self-interference levels between the different impulse response blocks. The repetition time should in general be larger than the RT60 reflection decay time of the acoustic channel [27, p. 98]. The RT60 time is a common measure in indoor acoustics but can also be applied in outdoor scenarios, where it typically yields much shorter values. The general pulse sequence is then defined as

$$ s(t) = \sum_{n} p_n\!\left(t - n\,T_{\mathrm{rep}}\right), $$

where $p_n(t)$ denotes the $n$-th base pulse.

The base pulses themselves consist of two specific duplexer phases. At the beginning, the transmit duplex mode is used and a transmit pulse part of length $T_{\mathrm{p}}$ is emitted. Then, the duplex mode is switched to receive mode, and the channel impulse response is recorded for the rest of the period without further transmission. For the investigations in this article, the transmit pulse is chosen to be a windowed sinusoidal signal with a carrier frequency of $f_0$ = 40 kHz and an envelope window $w_n(t)$ with length $T_{\mathrm{p}}$ = 2 ms as a tradeoff between time and frequency resolution for the narrowband signal. The $n$-th base pulse is then given by

$$ p_n(t) = w_n(t)\,\sin\!\left(2\pi f_0 t\right), \quad 0 \le t < T_{\mathrm{p}}. $$

With the use of identical repeated transmit pulses in our case, $w_n(t) = w(t)$ and thus $p_n(t) = p(t)$. A Blackman window [28] is chosen for $w(t)$ because it does not exhibit discontinuities at the beginning and end of the time domain window, while providing a high stop-band attenuation [29].
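
To make the pulse structure concrete, the following sketch generates a Blackman-windowed 40 kHz burst and places it at the start of each repetition interval. The sampling rate and the repetition interval of 100 ms are assumptions for illustration only; the actual system parameters are those described above.

```python
import numpy as np

fs = 250_000          # sampling rate in Hz (assumed value for illustration)
f0 = 40_000           # carrier frequency, 40 kHz as stated in the text
T_p = 2e-3            # envelope window length, 2 ms
T_rep = 0.1           # pulse repetition interval, assumed 100 ms for this example

n_pulse = int(T_p * fs)
t = np.arange(n_pulse) / fs

# Blackman-windowed sinusoidal burst: smooth on/off transitions, high stop-band attenuation
pulse = np.blackman(n_pulse) * np.sin(2 * np.pi * f0 * t)

# Repeat the base pulse every T_rep seconds; the remainder of each interval is receive time
n_rep = int(T_rep * fs)
n_blocks = 5
tx = np.zeros(n_blocks * n_rep)
for n in range(n_blocks):
    tx[n * n_rep : n * n_rep + n_pulse] = pulse
```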

4. Processing Concept and Algorithms

The ultrasonic traffic sensing technique presented in this paper is based on a combination of statistical analysis and clustering techniques for object candidate detection. Together with a special signal chain structure, objects representing traffic participants can be detected, together with an extraction of further characteristics and object properties. In this section, the model for the received signal is given first. Then, the characteristics of the reflective environment are modeled and empirically analyzed. Based on these findings, statistical hypothesis testing is performed during operation and the resulting statistical information is combined with a modified clustering algorithm for object boundary detection. Then, the system concept for the extraction of further characteristics is given and the reduced-complexity embedded implementation for real-time operation is discussed.

4.1. Received Signal Model and Preprocessing

With the general signal structure described in Section 3.3, repeated measurements of the acoustic channel are performed with the ultrasonic sensor system. Based on the impulse responses as the one-dimensional received signals, which consist of a large number of reflections in the acoustic environment, object detection and further analyses can be performed together with the time-of-flight distance information.

The received signal $r(t)$ from the sensor head is recorded continuously and then sliced into blocks of length $T_{\mathrm{rep}}$, yielding a series of impulse responses $h_n(\tau)$, with $n$ being the index of the $n$-th measurement interval of length $T_{\mathrm{rep}}$. The parameter $\tau$ is the time position within the specific impulse response, which is mapped to an equivalent time-of-flight distance given the speed of sound $c$ = 343 m/s in air at normal conditions. Due to the round trip, the real distance to the reflective object is half the equivalent time-of-flight distance, $d = c\,\tau/2$. By acquiring information on the current air temperature via sensor measurements or weather information, $c$ can be compensated for the current conditions. During the initial period $0 \le \tau < T_{\mathrm{p}}$, which is the transmission duplex mode, no useful signal is acquired, while the region $T_{\mathrm{p}} \le \tau < T_{\mathrm{rep}}$ represents the main signal of interest for further analysis. The single impulse response slices are then given by

$$ h_n(\tau) = r\!\left(n\,T_{\mathrm{rep}} + \tau\right), \quad 0 \le \tau < T_{\mathrm{rep}}. $$

The equivalent time-discrete representation is

$$ h_n[k] = h_n\!\left(k\,T_{\mathrm{s}}\right), \quad k = 0, \ldots, K - 1, $$

with the sampling time $T_{\mathrm{s}}$ and the discrete-time repetition interval length $K = T_{\mathrm{rep}}/T_{\mathrm{s}}$.

As a first step of preprocessing the sliced receive signal on the FPGA, each impulse response is convolved with a bandpass filter centered at the transmit carrier frequency with a bandwidth of 6 kHz, allowing the analysis of Doppler-shifted signals with typical urban vehicle velocities. Then, a time-discrete Hilbert transform [30, 31] is performed in order to acquire the complex-valued analytic signal

$$ z_n[k] = h_{n,\mathrm{BP}}[k] + \mathrm{j}\,\mathcal{H}\!\left\{ h_{n,\mathrm{BP}} \right\}[k], $$

where $h_{n,\mathrm{BP}}[k]$ is the bandpass-filtered impulse response, together with the instantaneous amplitude or real-valued envelope signal

$$ e_n[k] = \left| z_n[k] \right|. $$
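
A minimal sketch of this preprocessing step is given below, assuming an illustrative sampling rate and using SciPy's standard filtering and Hilbert transform routines (the actual system realizes these steps in dedicated FPGA logic).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 250_000                     # receive sampling rate (assumed for illustration)
f0, bw = 40_000.0, 6_000.0       # carrier frequency and analysis bandwidth from the text

def envelope(impulse_response):
    """Band-limit one sliced impulse response around f0 and return its envelope."""
    sos = butter(4, [f0 - bw / 2, f0 + bw / 2], btype='bandpass', fs=fs, output='sos')
    x = sosfiltfilt(sos, impulse_response)      # zero-phase band-pass around the carrier
    z = hilbert(x)                              # analytic signal via discrete Hilbert transform
    return np.abs(z)                            # instantaneous amplitude (real-valued envelope)

# Slicing: a continuous record of N_blocks * K samples becomes a 2D field via
# record.reshape(N_blocks, K), with rows indexed by pulse number n and columns by
# distance-equivalent time k (object distance d = 0.5 * c * k / fs, c = 343 m/s).
```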

This envelope signal is further processed to facilitate object detection in the following. Only in the case of Doppler-based motion and velocity estimation does time-frequency information such as the instantaneous phase and frequency have to be incorporated. Both of its two discrete dimensions, $n$ being the index of the impulse response (from now on called time dimension, with an equivalent sampling rate of $f_{\mathrm{rep}}$) and $k$ being the distance-equivalent impulse response time (from now on called distance dimension, with an equivalent sampling rate of $1/T_{\mathrm{s}}$), are derived from the single original time dimension of the received signal by slicing. Therefore, the distance dimension is fixed and limited to $0 \le k < K$, while the causal time dimension in real-world application keeps extending with $n$. With this two-dimensional mapping, object detection algorithms become applicable which consider a vicinity in both dimensions. This means that an object creating a peak at several neighboring distance and time points is now properly represented by neighboring points in the 2D space, in contrast to the original one-dimensional recorded sensor signal.

An example of the envelope signal is given in Figure 6, with a normalized logarithmic color mapping of the amplitude for improved visualization. The equivalent distance and impulse response time mappings are shown on the ordinate for reference purposes. In this plot, the scattering and reflections from the street surface can already be seen in a distance range of up to 6 m in a comb-like pattern. Also, around the time positions at 7 s, 14 s, 15 s, 17 s, and 19 s, strong reflections from objects passing the sensor are visible.

4.2. Signal Statistics and Standardization

Given the high complexity of the reflective patterns in the sensor data, the statistics of the signal need to be described properly in order to perform an outlier or anomaly detection, which would indicate the presence of objects in the sensor field. The goal is to incorporate all noise components (measurement noise and other acoustic emissions in the ultrasonic band) and also the typical static and time-varying reflections (e.g., moving leaves of a tree caused by wind and reflection patterns of the street) for every specific distance point of the signal into the statistics. This allows the classification of segments of newly acquired impulse responses in terms of the data following the a priori distribution, called base distribution, which was estimated in the past, or whether an outlier is present.

In contrast to typical binary decision problems, here, only this base distribution without any presence of an object can be estimated, representing the null hypothesis. The alternative hypothesis of object presence is, however, different for every object. This problem can be solved using parametric or nonparametric hypothesis testing techniques for properly representing the underlying base distributions of the signal points and testing the outlier probabilities. However, for the real-time implementation on the embedded system of the sensor setup, we present a simpler and less computationally complex approach that is based on the distance-wise standardization of the signal using the statistics calculated in a predefined time window in the past. These standardization techniques are often used in the field of data science as a preprocessing step for feature scaling. The distance-wise standardized envelope signal is calculated based on the information from the statistics memory with a window length of $W$ time dimension steps in the past (equivalent to a time range of $W\,T_{\mathrm{rep}}$ in continuous time), yielding

$$ z_n[k] = \frac{ e_n[k] - \hat{\mu}_{n,W}[k] }{ \hat{\sigma}_{n,W}[k] }, $$

where $\hat{\mu}_{n,W}[k]$ and $\hat{\sigma}_{n,W}[k]$ denote the sample mean and standard deviation of the $W$ past envelope values $e_{n-W}[k], \ldots, e_{n-1}[k]$ at distance point $k$.

Under the simplified assumption that the underlying process of $e_n[k]$ (with no objects present) for a given distance $k$ is an i.i.d. Gaussian process, the standardized values $z_n[k]$ for the specific distance ranges will follow the standard normal distribution $\mathcal{N}(0, 1)$. This assumption is found to be valid especially for direct reflections in the closer range, as shown in studies from other acoustic environments like underwater acoustics [32]. As a general approximation, it will later be used for approximately calculating the initial detection thresholds for a given false alarm rate.

For the object detection, only the right tail of the distribution is of interest for outlier detection, as shadowing effects are not considered. Therefore, the standardized data is clipped for values below zero, resulting in the final envelope with statistical standardization

$$ \tilde{z}_n[k] = \max\!\left( z_n[k],\, 0 \right). $$

This yields a variable with a standard normal distribution left-censored at zero. The resulting probability density function (PDF) can be written as

$$ f_{\tilde{z}}(z) = \frac{1}{2}\,\delta(z) + \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad z \ge 0, $$

and the cumulative distribution function (CDF) is

$$ F_{\tilde{z}}(z) = \frac{1}{2} + \frac{1}{2}\,\operatorname{erf}\!\left( \frac{z}{\sqrt{2}} \right), \quad z \ge 0, $$

where $\operatorname{erf}(\cdot)$ is the Gauss error function. The continuous parts of $f_{\tilde{z}}$ and $F_{\tilde{z}}$ are similar to a scaled half-normal distribution [33], with the addition that the clipped values are statistically censored and not truncated.

To summarize, $\tilde{z}_n[k]$ is now used for all further object detection and processing together with the clustering techniques in the following. Simply put, it expresses the positive outlier level of a newly acquired data point as its distance from the estimated mean, measured in multiples of the standard deviation. This is based on the statistical history in a specific time interval and calculated separately for each discrete distance point. By having a limited time window for the statistics, the sensor system can adapt to changing environments without the need for recalibration. Optionally, for achieving better robustness and generalization, these distance-wise statistics can be blurred with those of neighboring distance points.
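
As a compact illustration of the standardization and clipping described above, the following NumPy sketch computes the distance-wise z-scores of an envelope field against a sliding statistics window of $W$ past impulse responses; the loop-based formulation is chosen for clarity rather than efficiency.

```python
import numpy as np

def standardize_distancewise(env, W, eps=1e-12):
    """Distance-wise standardization of the envelope field env[n, k]
    (time index n, distance bin k) against the W most recent impulse
    responses, followed by clipping of negative values."""
    N, K = env.shape
    z = np.zeros_like(env)
    for n in range(W, N):
        hist = env[n - W:n, :]           # statistics memory: last W responses per distance bin
        mu = hist.mean(axis=0)
        sigma = hist.std(axis=0) + eps   # guard against zero variance in quiet distance bins
        z[n, :] = (env[n, :] - mu) / sigma
    return np.clip(z, 0.0, None)         # keep only positive outliers (left-censoring at zero)
```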

4.3. Object Boundary Detection with Density-Based Statistical Clustering (DBStaC)

As a next step, the standardized data from the previous section is used for object detection. The two-dimensional data is passed through a modified density-based clustering algorithm based on the original DBSCAN (Density-Based Spatial Clustering of Applications with Noise [34]), which is able to perform clustering on noisy data by aggregating core points of data peaks. In our case, these data peaks can be the result of statistical outliers due to physical objects passing by the sensor, in comparison with the base distribution of the signal when no objects are present. Our proposed modified algorithm in combination with the input preprocessing steps from (7) and (9), called density-based statistical clustering (DBStaC), can operate both on statistical hypothesis test results and, in our case of a simplified algorithm, on data standardized based on the a priori statistics.

4.3.1. Choice of Clustering Algorithm

The choice of a suitable clustering algorithm from the field of unsupervised learning techniques for our application was subject to several requirements. While many clustering algorithms can aggregate nearby points in a space with arbitrary dimensionality and sparsely distributed data points, it is required here to incorporate the weight of data points, coming from the nonsparse standardized and clipped data field described before in Section 4.2.

Furthermore, the algorithm should be able to detect important data peaks based on a local density, while, for the whole cluster, no penalty for an infinite extension in the time dimension should be given. This is necessary because the dwell time of an object in the sensor range is arbitrary, especially due to different movement speeds of traffic participants or even in a traffic jam situation.

As a third requirement, an anisotropy of the cluster properties and shape is desired, as we have two dimensions that are time (index of different impulse responses) and distance (specific point in the impulse response itself), which have highly different characteristics. A typical scenario where anisotropy would not be required is 2D image processing, where both image dimensions have similar properties.

4.3.2. Original Definition of DBSCAN

First, we want to provide a basic understanding of the original DBSCAN algorithm in general and the specific terms coming with the algorithmic definition. Then, we can introduce the modifications for our specific application. Readers interested in the complete formal description are referred to [34, 35], on which the following original definitions are based.

Core Point. The general principle of DBSCAN is the analysis of a set of points $P$ in a $d$-dimensional space. For any given point $p \in P$, the $\varepsilon$-neighborhood of this point is evaluated, meaning that all points $q$ with a distance $\mathrm{dist}(p, q) \le \varepsilon$ belong to the neighborhood of $p$, denoted $N_{\varepsilon}(p)$. The distance metric is typically chosen to be the $L_1$ or $L_2$ norm, and $\varepsilon$ is preset. This results in the score of a point $p$ being the number of other points falling into its $\varepsilon$-neighborhood, $\left| N_{\varepsilon}(p) \right|$. The point $p$ is then called a core point if $\left| N_{\varepsilon}(p) \right| \ge \mathrm{MinPts}$, with $\mathrm{MinPts}$ being a preset threshold for core point detection. All points in the $\varepsilon$-neighborhood of a core point are part of a cluster, and they are called border points if they are not core points themselves.

Direct Density-Reachability. A point $q$ is directly density-reachable from a point $p$ if $p$ is a core point and $q$ is in the $\varepsilon$-neighborhood of $p$. This property is symmetric if $p$ and $q$ are a pair of core points [34].

Density-Reachability. A point $q$ is density-reachable from a point $p$ if a chain of points $p_1, \ldots, p_m$ with $p_1 = p$ and $p_m = q$ exists in which every point $p_{i+1}$ is directly density-reachable from $p_i$ [34].

Density-Connectivity. A point $p$ is density-connected to another point $q$ if there is a point $o$ such that both $p$ and $q$ are density-reachable from $o$ [34].

Cluster Definition. Given the set of all points $P$, a cluster $C$ is a nonempty subset of $P$ which satisfies the following conditions [34]: (1) Maximality: for all $p, q$, if $p \in C$ and $q$ is density-reachable from $p$, then $q \in C$. (2) Connectivity: for all $p, q \in C$, $p$ is density-connected to $q$.

Any cluster is then uniquely determined by its core points.
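
For reference, the unmodified DBSCAN algorithm is available, for example, in scikit-learn. The toy example below clusters two dense point groups and marks scattered points as noise (label −1); the parameter values are arbitrary and only serve to illustrate the roles of $\varepsilon$ and $\mathrm{MinPts}$.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
pts = np.vstack([
    rng.normal(0.0, 0.1, (50, 2)),    # dense group around (0, 0)
    rng.normal(3.0, 0.1, (50, 2)),    # dense group around (3, 3)
    rng.uniform(-1.0, 4.0, (20, 2)),  # scattered noise points
])

# eps corresponds to the neighborhood radius, min_samples to the MinPts threshold
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(pts)
print(np.unique(labels))              # cluster ids plus -1 for noise
```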

4.3.3. Modified Core Point Definition for DBStaC

In our special case, the algorithm is modified to suit the needs of clustering our two-dimensional space with $d = 2$, representing the time and distance dimensions of the acquired data. Furthermore, our data points are weighted by their standardized value. Therefore, a point $p = (n, k)$ is assigned the weight $w_p = \tilde{z}_n[k]$, which is not an option in the classic DBSCAN algorithm. Instead of using the $L_1$ or $L_2$ metric for determining the $\varepsilon$-neighborhood, anisotropy is introduced by choosing the $\varepsilon$-neighborhood as a rectangular area comprising $\varepsilon_t$ points in the time dimension and $\varepsilon_d$ points in the distance dimension, symmetrically placed around the position of the candidate core point, with $\varepsilon_t$ and $\varepsilon_d$ given. Then, all points $q$ within this rectangle belong to the $\varepsilon$-neighborhood $N_{\varepsilon}(p)$ of $p$, and $p$ is a core point if

$$ \sum_{q \in N_{\varepsilon}(p)} w_q \ \ge \ w_{\min}, $$

with $w_{\min}$ being the preset core point threshold.

With this definition, all data points of our two-dimensional data set are now taken into account not just by their count, but by the sum of their assigned weights in the specific $\varepsilon$-neighborhood. The implementation of the final clustering is omitted at this point, as it is widely described in the literature [36]. With the original algorithm implemented, adaptations only have to be made in the core point definition and the neighborhood metric.
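
A minimal sketch of the modified core point test, under the assumption that the standardized and clipped field is available as a 2D array: the weighted neighborhood sum is computed with a uniform filter, and the subsequent cluster expansion would follow the original DBSCAN procedure on the resulting core points.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def dbstac_core_points(z_clipped, eps_t, eps_d, w_min):
    """Core point test of the modified definition: a point (n, k) is a core point
    if the sum of standardized weights in its rectangular eps_t x eps_d neighborhood
    (time x distance) reaches the threshold w_min."""
    # uniform_filter returns the neighborhood mean; scaling by the window area gives the sum
    neigh_sum = uniform_filter(z_clipped, size=(eps_t, eps_d), mode='nearest') * (eps_t * eps_d)
    return neigh_sum >= w_min
```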

4.3.4. False Alarm Rate

In the clustering process, every element of the data field is tested for the core point property, requiring both a compensation for multiple comparisons and an approximation of the false alarm rate for a single point itself. For the latter, we will provide measures for determining the false alarm rate under the approximative distribution assumptions from Section 4.2. With the threshold decision from (12) for a core point, given $w_{\min}$ and the $\varepsilon$-neighborhood with

$$ Q = \varepsilon_t\,\varepsilon_d $$

points in the rectangular region, we want to calculate the false alarm probability

$$ P_{\mathrm{FA}} = P\!\left( S \ge w_{\min} \mid H_0 \right), $$

where $H_0$ is the event that no relevant object reflection yielding a core point is present, meaning that all summands follow the base distribution without outliers. $S$ is the sum of weights in the $\varepsilon$-neighborhood:

$$ S = \sum_{q \in N_{\varepsilon}(p)} w_q. $$

In Section 4.2, we showed that, under the assumption that the original envelope signal comes from a process with normal distribution, $z_n[k]$ is standard-normally distributed after standardization, and $\tilde{z}_n[k]$ follows a left-censored standard normal distribution. If the data points and the resulting weights are also hypothesized to be statistically independent and to exhibit the identical PDF from (10), we can write the resulting PDF of the summation of the $Q$ points in (15) as the $Q$-fold convolution

$$ f_S(s) = \underbrace{\left( f_{\tilde{z}} * f_{\tilde{z}} * \cdots * f_{\tilde{z}} \right)}_{Q\ \text{times}}(s). $$

Please note that the statistical independence of all summands is not given in the first place, because the same data points in the two-dimensional field are covered by the sliding window at several $n$ and $k$ positions. Then, adjacent values of $S$ are correlated, and a single peak in $\tilde{z}_n[k]$ could yield multiple peaks in $S$ resulting in core points. However, as we seek to calculate the false alarm rate in general for this specific algorithm, adjacent core points would be clustered into a single cluster, raising only a single false alarm for one object.

Finally, neither for the left-censored standard normal distribution nor for the related half-normal distribution in general is the result of the $Q$-fold convolution known or given in the literature in closed form. Therefore, the final calculation is based on Monte Carlo simulations of these random processes. For the false alarm probability analysis, we are interested in the survival function (SF) $\bar{F}_S(s) = 1 - F_S(s)$, where $F_S$ is the CDF of $S$. Evaluated at $s = w_{\min}$, the survival function yields the false alarm probability per data point. The results from the Monte Carlo simulations for the survival function are shown in Figure 7 and were also verified with an additional false alarm Monte Carlo simulation.
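
Such a survival function estimate can be obtained with a straightforward Monte Carlo sketch like the following, which draws standard normal samples, censors them at zero, sums $Q$ of them, and estimates the exceedance probability for given thresholds; all parameter values in the example call are illustrative.

```python
import numpy as np

def sf_sum_censored_normal(thresholds, Q, n_trials=200_000, seed=0):
    """Monte Carlo estimate of P(S > w) for S, the sum of Q i.i.d. standard
    normal samples left-censored at zero (negative values replaced by zero)."""
    rng = np.random.default_rng(seed)
    samples = np.clip(rng.standard_normal((n_trials, Q)), 0.0, None)
    s = samples.sum(axis=1)
    return np.array([(s > w).mean() for w in np.atleast_1d(thresholds)])

# Example: per-point false alarm probability for a neighborhood of Q = 15 points
print(sf_sum_censored_normal([10.0, 12.0, 15.0], Q=15))
```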

4.3.5. Choice of Clustering Neighborhood

The neighborhood choice in the time direction, $\varepsilon_t$, affects the detection robustness and the separability of vehicles of length $l_{\mathrm{veh}}$ following each other with a gap distance $d_{\mathrm{gap}}$ at speed $v$, passing the sensor lobe which approximately covers a section of length $l_{\mathrm{lobe}}$ on the specific lane. Given the pulse repetition rate $f_{\mathrm{rep}}$, no vehicle is covered by the sensor during the gap for

$$ N_{\mathrm{gap}} = \frac{f_{\mathrm{rep}}\, d_{\mathrm{gap}}}{v} $$

pulse times. Also, considering the detection of a single vehicle, the vehicle is sampled in the lobe for

$$ N_{\mathrm{veh}} = \frac{f_{\mathrm{rep}}\left(l_{\mathrm{veh}} + l_{\mathrm{lobe}}\right)}{v} $$

pulses. Given $\varepsilon_t$, the full neighborhood extension in the time direction is $\varepsilon_t$ pulses. Therefore, $N_{\mathrm{gap}} \ge \varepsilon_t$ is recommended for a good separation of the vehicle clusters in the time direction in order to have no overlapping core points. Typical values of the time neighborhood are a few pulse intervals. With an urban area example of a vehicle length of $l_{\mathrm{veh}}$ = 4.5 m, a speed of $v$ = 40 km/h, and a lobe coverage of $l_{\mathrm{lobe}}$ = 1.5 m, the recommended gap length becomes $d_{\mathrm{gap}}$ ≥ 4.83 m and a vehicle is covered by $N_{\mathrm{veh}}$ pulse times. The second neighborhood parameter $\varepsilon_d$ in the distance direction is mostly relevant when it comes to the multilane separation of multiple vehicles. In general, our approach in this paper is not to calculate the typical vehicle speed and distance profile directly, but to learn the scenario during a short training period (e.g., one hour) and then to set the neighborhood parameters in a parameter optimization, which is described in the following section.
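
The helper below illustrates one consistent reading of these relations, computing the number of pulses during the inter-vehicle gap and during a vehicle's dwell time in the lobe; the pulse repetition rate and gap distance used in the example call are assumptions for illustration only.

```python
def pulse_counts(f_rep, v_kmh, d_gap, l_veh, l_lobe):
    """Pulses during the inter-vehicle gap (no vehicle in the lobe) and pulses
    during which a single vehicle is sampled while traversing the sensor lobe."""
    v = v_kmh / 3.6                              # km/h -> m/s
    n_gap = f_rep * d_gap / v
    n_veh = f_rep * (l_veh + l_lobe) / v
    return n_gap, n_veh

# Hypothetical urban example: 10 Hz pulse rate, 5 m gap, 4.5 m vehicle, 1.5 m lobe coverage
print(pulse_counts(f_rep=10, v_kmh=40, d_gap=5.0, l_veh=4.5, l_lobe=1.5))
```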

4.4. Signal Processing Concept

In Figure 8, the previously described statistical object detection is integrated into the complete object detection concept with feature extraction and inference. It allows detecting candidate objects representing traffic participants, which are then used to gather further information from several signal processing paths. The preprocessing and main processing stages are also divided in hardware between the sensor platform FPGA and the Sitara SoC, respectively.

4.4.1. Preprocessing

In contrast to the sensor platform structure described previously in Section 3, here, the signal flow and processing are in focus. As can be seen in the left part of Figure 8, the sensor platform is configured by the main system on the Sitara SoC and performs the whole signal transmission and reception. The preprocessing steps of filtering, Hilbert transformation, and possible sample rate adjustments are performed using a dedicated architecture on the FPGA, before slicing the blocks from the different sensors into single impulse responses, which are then transmitted to the main processing system with the Sitara SoC.

4.4.2. Object Detection and Extraction

On the right side of Figure 8, the complete object detection and feature extraction concept is shown. It is based on four main information paths going into the cluster analysis and feature extraction block at the end, whose output is then used for inference of detected traffic participants. By performing a fusion of these four paths, all necessary information for the analysis can be provided. The different components of the processing system are described in the following.

Statistical Object Detection with Resulting Object Candidate Boundaries. The basic principle of object clustering on the standardized data for boundary detection has already been described before in Sections 4.2 and 4.3. The only addition is an optional nonlinearity after standardization or the optional hypothesis testing (which is not considered here). The nonlinearity can be modified to perform further transformations on the standardized data. With the DBStaC algorithm, the envelope signal at the beginning of the statistical object detection block is analyzed by the statistics module together with its statistics memory. Only a predetermined amount of past data can be stored in this fixed-size memory for calculating the base statistics. The boundaries of resulting clusters of the algorithm are taken as boundaries for further analyses in the time-distance-field of the different data paths. Inside each cluster’s area which was determined by the object boundary information path, the cluster analysis stage performs an analysis of the statistical, envelope, and time-frequency-information path.

Statistical Feedback. The statistical feedback path coming from the inference stage is important to keep the base distribution in the statistics memory accurate, that is, mostly restricted to data without objects present. This is an optional feature and especially helpful with high fluctuations of the traffic density in front of the sensor, as the detected objects are removed from the statistics memory and the outlier analysis of the DBStaC-based object detection becomes more accurate and robust against drift.

Statistical Information and Envelope Information. The standardized data and the original envelope signal are directly fed into the cluster analysis stage. For feature extraction and accurate information on the object reflection characteristics, properties like the exact position of the object’s first reflection, the two-dimensional center of mass of the signal, and the geometry are analyzed. Especially with a priori geometric information on the system setup in a specific scenario, effects caused by the beam width of the transducer can be exploited. Even if a car moves perfectly sideways through the cone-shaped beam of the sensor, it forms a typical arc-shaped pattern, which could already be seen in the initial example signal in Figure 6. This arc-shaped reflection pattern is also known from marine sonar systems as a “fish arc” and is caused by the fact that, at the point in time when a vehicle first enters the beam, the reflection path is diagonal and only becomes straight when the car is directly in front of the sensor. This can be used to support the information on vehicle size and type.

Time-Frequency Information. The time-frequency information is mostly irrelevant as long as the sensor is oriented perpendicular to the vehicle movement on the street. However, in case that the sensor is oriented partially sideways or a second sensor with such orientation is available, velocity information can be gathered by analyzing the instantaneous frequency of the signal for Doppler shift estimation.

Inference Stage. The inference stage performs the final decision on whether a cluster detected in the DBStaC stage originated from a traffic participant in the sensor range and allows the estimation of further object parameters as well as a simple classification of the vehicle type using a supervised learning stage with a Support Vector Machine. As our analyses in this article are focused on the DBStaC object candidate detection algorithm itself, no further rejection of objects is performed by the inference stage in this case. Only objects detected outside of the specific lanes, for example, on the sidewalk, are rejected.

4.5. Implementation Details

The main parts of the processing chain shown on the right side of Figure 8 were implemented in C++, including optimized versions for embedded system operation, while parts of the postprocessing and learning techniques belonging to the cluster analysis and inference stages were implemented in Python. The core processing module can be used both for algorithmic and performance analysis purposes on general Linux/BSD systems and on the embedded Linux system running on the ARM Cortex-A8 as part of the Sitara SoC. For the algorithmic evaluation and parameter optimization in the next section, the capabilities of the core module are also used as part of an evaluation system. The numerical precision requirements for embedded system operation were relaxed from 64-bit to 32-bit floating point, largely without compromising performance, in order to exploit the NEON floating point accelerator of the Sitara SoC. For the processing chain with the DBStaC algorithm, real-time operation is then possible for two attached ultrasonic sensors. This holds for typical parameter sets, as the algorithmic runtime of DBSCAN is not fixed and largely depends on the choice of parameters (size of the $\varepsilon$-neighborhood and decision thresholds) and on the traffic situation yielding different cluster sizes. Edge cases like extremely large clusters, for example, due to a traffic jam situation with very long dwell times of an object in the sensor range, have to be taken care of in the practical implementation.

For the sensor platform with the FPGA shown on the left side of Figure 8, dedicated accelerators for filtering and processing are used together with a MicroBlaze soft core for the coordination of waveform transmission and data acquisition. The configuration is controlled by the main processing system side with the Sitara SoC. A timing reference signal is acquired using GPS together with the high-precision PPS reference, which allows global synchronization of multiple sensors attached to multiple platforms, with an optimization of the transmission and interference patterns. Exchange of resulting information, for example, for sensor fusion and distributed processing for trajectory calculation of objects passing by multiple systems, is possible on both the level of a local wireless meshing between the systems and a global communication using cellular networks.

5. Evaluation Scenarios and Methodology

Evaluation of the complete system concept for traffic participant detection requires analyses both in real-world scenarios and under isolated conditions with synthetic scenarios, such as on automotive test tracks. A variety of Europe-wide test measurements, described in Section 5.2, have been performed with the system and will be analyzed and discussed in Section 6. For the reference data acquisition, a video camera runs in parallel; the recording is afterwards used to label, annotate, and classify traffic participants on the timeline. These labels can then be utilized to evaluate the performance in terms of several metrics for quantifying detection performance, parameter estimation, and classification. In general, it is desirable to have specific correctness information for every object instead of just taking the overall numbers of detected objects for the analysis, which is supported by an exact pairing of detected objects and reference data in the following. Then, parameter sets for the system are optimized with regard to different performance metrics using a central (hyper)parameter optimizer as part of an evaluation framework, which is described in the following. This process can also be deployed to a high-performance computing (HPC) cluster.

5.1. Parameter Space Exploration and Optimization Methodology

In the following, the framework for parameter space exploration of the whole system with its corresponding algorithms is specified. The basic structure is shown in Figure 9.

Parameter Set Evaluation Concept. As a main principle, (hyper)parameter sets are given to a worker shown in the central block in the diagram. Each worker then evaluates a single parameter set (deployed with the Coordination subsystem) for real-world recorded sensor data using the C++-based main processing chain and Python-based postprocessing as part of the Processing subsystem. The processing principles were described in the implementation description of this article in Section 4.5. Then, in the Evaluation subsystem, the objects resulting from the processing stage are taken together with data from an object reference database. This allows the analysis of several performance metrics, such as the detection score and the correct lane assignment, which yields a performance score or optimization loss as a quality indicator of the parameter set. This is then reported back to the Coordination subsystem of the worker.

Reference Database. On the left side of the diagram, the sensor and reference data are shown. Reference object data is stored in a MySQL database. A reference video is recorded in parallel to a sensor data recording. This video is then manually processed using a labeling tool, which is also shown in the diagram. The tool allows labeling traffic participant objects on multiple lanes in a timeline view and adding annotations to these objects such as the vehicle type, vehicle size, movement direction, and velocity, which are stored in the database. By analyzing the raw video material itself, the tool also provides the activity level in the video material, which supports finding the next vehicle passing by in less frequented time ranges.

Cluster-Label-Pairing for Scoring. Both the clusters resulting from the processing of the raw sensor data and the objects in the reference database are events extended in the time dimension, with a specific start and end time. In the detection performance analysis of the system, each cluster would (in the case of perfect detection) be paired with the corresponding reference label object. As this exact cluster-label pairing is not given in a real scenario, there can be ambiguities or false-positive/false-negative cases. Therefore, a bipartite cluster-label graph is built up from the cluster and reference data, with edges between the nodes of specific cluster-label candidate pairs. Then, a maximum cardinality bipartite matching is performed. With the resulting cluster-label pairs and the known sets of all reference and cluster objects, all scores such as precision, recall, accuracy, and $F_1$ score can be calculated and reported back for the specific parameter set.
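
A minimal sketch of this pairing step is given below, assuming that clusters and reference labels are represented as time intervals and using SciPy's maximum bipartite matching on the candidate-pair adjacency matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def pair_clusters_with_labels(clusters, labels):
    """Maximum cardinality bipartite matching between detected clusters and
    reference labels, both given as lists of (start_time, end_time) tuples.
    An edge (candidate pair) is created whenever the two intervals overlap."""
    adj = np.zeros((len(clusters), len(labels)), dtype=int)
    for i, (cs, ce) in enumerate(clusters):
        for j, (ls, le) in enumerate(labels):
            if cs < le and ls < ce:                  # time intervals overlap
                adj[i, j] = 1
    match = maximum_bipartite_matching(csr_matrix(adj), perm_type='column')
    pairs = [(i, int(match[i])) for i in range(len(clusters)) if match[i] != -1]
    tp = len(pairs)
    fp = len(clusters) - tp                          # clusters without a reference label
    fn = len(labels) - tp                            # reference labels without a cluster
    return pairs, tp, fp, fn
```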

Coordination of (Hyper)parameter Sets. Given the possibility of using a worker process to evaluate parameter sets with the described procedure, the exploration of the (hyper)parameter space needs to be coordinated and optimized. This is achieved by using the hyperopt Python package [37], which allows the definition of several (hyper)parameters with their according distributions and properties and then automatically explores this space using simulated annealing, random search, and tree-of-Parzen-estimators techniques for optimization. The parameter sets and their according loss results are stored in a MongoDB database. New parameter sets are integrated into a base configuration file which is then assigned to the worker process. In order to speed up the optimization and training procedure, a large number of workers are deployed on an HPC cluster. Typically, good convergence is achieved after 10,000 optimization iterations. If faster convergence is desired, starting points for the parameters can be calculated with the investigations done in Section 4.3. Typical (hyper)parameters, which are part of the optimization process, are as follows (a minimal sketch of such an optimization loop is given after the list):

(i) Dimensions of the $\varepsilon$-neighborhood for the DBStaC algorithm and the core point threshold level (see Section 4.3.3)
(ii) Properties of the statistics memory, such as the statistics memory size (past information, see Section 4.4.2) and a storage subsampling factor
(iii) Optional use of statistical feedback (see Section 4.4.2) in the scenario for base distribution stabilization
(iv) Optional clipping threshold for calculated statistical norms in the standardization to stabilize against outliers
(v) Optional use of matched filtering for the received envelope signal
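
The sketch below shows how such an optimization loop could look with hyperopt; the search space bounds and the run_detection_chain worker function are hypothetical placeholders for the actual processing and evaluation chain described in Section 5.1.

```python
from hyperopt import fmin, tpe, hp, Trials

# Hypothetical search space mirroring the (hyper)parameters listed above
space = {
    'eps_t': hp.quniform('eps_t', 1, 10, 1),              # neighborhood size, time direction
    'eps_d': hp.quniform('eps_d', 1, 20, 1),              # neighborhood size, distance direction
    'w_min': hp.uniform('w_min', 5.0, 50.0),              # core point threshold
    'stats_window': hp.quniform('stats_window', 100, 2000, 50),
    'use_feedback': hp.choice('use_feedback', [False, True]),
}

def objective(params):
    # run_detection_chain is a hypothetical stand-in for the worker of Section 5.1:
    # process the recorded sensor data with the given parameters, pair the resulting
    # clusters with the reference labels, and return the F1 score.
    f1 = run_detection_chain(params)
    return 1.0 - f1                                        # hyperopt minimizes the loss

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=10_000, trials=trials)
```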

5.2. Test Scenarios

In Table 2, the different test scenarios are given with their characteristics and description, covering a variety of urban environments and synthetic measurements on an automotive test track. The scenarios are identified by a specific name and will be referenced in the following, when performance results are given.

6. Results and Discussion

Results of the field tests and evaluations are given in this section. First, the performance metrics are introduced and specified, followed by the results for the different scenarios and finally a discussion of the results.

6.1. Evaluation Metrics

In this paper, the main focus lies on the detection of traffic participants as objects and their correct assignment to the lanes of the street. For describing performance in our object detection scenario, no calculation of the true-negative count is possible, as an arbitrary number of objects can occur in the data timeline. Therefore, the widely used $F_1$ score is chosen as the main performance metric for detection, being the harmonic mean of precision and recall. The precision is defined as the number of true positives ($TP$) divided by all positive outcomes, including the false positives ($FP$):

$$ \mathrm{precision} = \frac{TP}{TP + FP}. $$

In contrast, the recall is defined as the number of true positives divided by the sum of true-positive and false-negative ($FN$) outcomes:

$$ \mathrm{recall} = \frac{TP}{TP + FN}. $$

Then, the $F_1$ score is defined as

$$ F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. $$

A further advantage is that the combination of precision and recall into the $F_1$ score makes the metric independent of requirement biases, that is, a focus on either precision or recall performance, for example, when more false positives or false negatives are acceptable in a specific scenario. The $F_1$ score calculation only requires the information on true positives, false positives, and false negatives. These numbers are calculated based on the bipartite cluster-label-graph assignment introduced in Section 5.1.

The second important metric is the assignment of objects to the correct lanes for determining the direction of the traffic flow. With multiple lanes, this becomes a multiclass problem, and the $F_1$ score cannot be used directly. As an alternative, the weighted $F_1$ score is used, which first calculates the $F_1$ score for every lane (correct or incorrect assignment to the specific lane) and then calculates a weighted mean of all lane scores, with the number of true reference object instances per lane as the weight.
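
For illustration, the weighted $F_1$ score for lane assignment can be computed, for example, with scikit-learn; the lane labels in this snippet are hypothetical.

```python
from sklearn.metrics import f1_score

# Hypothetical per-object lane assignment: reference lane vs. lane assigned by the detector
y_true = [1, 1, 2, 2, 2, 1, 2, 1]
y_pred = [1, 1, 2, 1, 2, 1, 2, 1]

# 'weighted' averages the per-lane F1 scores, weighted by the number of true
# instances per lane, matching the weighted F1 definition used here
print(f1_score(y_true, y_pred, average='weighted'))
```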

6.2. Discussion of Test Results

The performance evaluation results for the different scenarios with optimized parameters are shown in Table 3. Both detection and lane assignment scores are consistently high across all scenarios, representing a very good performance level in real scenarios and satisfying the initially stated high requirements on traffic information quality. Single-sensor ultrasonic traffic participant detection is therefore possible with high performance without exploiting any directional information of the signal. This is facilitated by the algorithms proposed in this paper, which solve several prevailing problems of ultrasonic sensing techniques.

The multilane assignment performance is also very high and of special importance. As no directional or velocity information is available with these sensors, the known driving direction of a fixed lane yields the final traffic flow information with direction. A possible problem shows up in the scenario Urban1, which has the lowest weighted $F_1$ score for lane assignment: people were driving in the middle of the two-lane street several times, which compromised the lane assignment performance.

Further long-term analyses are required to assess the stability in scenarios with a highly fluctuating traffic flow and day-and-night cycles. However, with the system structure relying on the standardized data together with the statistics memory, the sensitivity to long-term sensor drift and production variations is limited. Further investigation is also required on the impact of the statistical feedback for stabilization of the statistical base information. As the parameter optimization is based on only a few parameters, the susceptibility to overfitting effects is limited in the shown results. Still, for future investigations, the generalization performance with regard to new measurements in the same scenario is of high interest.

7. Conclusion and Future Work

In this paper, it was shown that sidefire ultrasonic sensing is a viable option for multilane traffic participant sensing, giving very good performance results in real-world urban scenarios with a single sensor. The functionality is enabled by a novel combination of standardization techniques based on windowed statistics and a modified density-based clustering algorithm, together called DBStaC. Several evaluations were performed, both in real urban environments and for special cases on an automotive test track. The proposed system is integrated into an evaluation and parameter optimization methodology, whose reference system can flexibly be extended with further object characteristics in the future.

The ultrasonic sensor system deployed together with the street lamp platform was shown to be a Smart City technology suitable for high-coverage deployment in cities, enabling traffic monitoring and further applications. With this platform, distributed processing and sensor fusion can expand the possibilities of using the available traffic participant information even further, for example, for future applications like trajectory prediction. Further future work includes the use of ultrasonic sensor arrays for acquiring directional and velocity information. In the field of algorithmic improvements, the investigation of more advanced hypothesis testing techniques and channel statistics analyses could yield further performance gains. Also, the comparison of the presented algorithmic concept with state-of-the-art supervised machine learning algorithms such as recurrent neural networks is planned.

Conflicts of Interest

The authors declare that they have no conflicts of interest.