Abstract

This work aims at classifying the road condition with data mining methods using simple acceleration sensors and gyroscopes installed in vehicles. Two classifiers are developed with a support vector machine (SVM) to distinguish between different types of road surfaces, such as asphalt and concrete, and obstacles, such as potholes or railway crossings. From the sensor signals, frequency-based features are extracted and evaluated automatically with MANOVA. The selected features and their relevance for predicting the classes are discussed. The best features are used for designing the classifiers. Finally, the methods developed and applied in this work are implemented in a Matlab toolbox with a graphical user interface. The toolbox visualizes the classification results on maps, thus enabling manual verification of the results. In cross-validation, the average accuracy is 81.0% for classifying obstacles and 96.1% for classifying road material. The results are discussed on a comprehensive exemplary data set.

1. Motivation

In 2006, bad conditions of road infrastructure were one of the causes of 50% of fatal accidents in France [1]. In 2016, four accidents in Germany were caused exclusively by road surface damage [2]. Road traffic authorities aim to improve and automate the monitoring of the road state in order to detect and repair road damage and thus enhance the safety of road traffic. Based on the detection results, specific and cost-optimized maintenance of roads can be ensured. Furthermore, suppliers of navigation systems can profit from the available information on the road state, because roads in bad condition may be avoided in route planning [3]. Automotive manufacturers can use the collected data to control adaptive vehicle suspensions and to display warnings in real time [4].

Contrary to physical modeling, data-based estimation of the road state does not require any comprehensive system characterization, such as vehicle, road, sensor, and environment. Moreover, modeling of a full vehicle requires five acceleration sensors or gyroscopes to measure vertical accelerations of unsprung masses and accelerations and rotations of the vehicle body [5]. To monitor road sections, a vehicle can be used as a mobile sensor platform that records both vehicle dynamics and the environment, such as the road state. The road state can be estimated using cameras or inertial measurement units to record rotation speeds and accelerations of the vehicle. Such sensors are already integrated into modern vehicles having an active or adaptive body control or new lighting systems. Inertial sensors are even part of the standard equipment of new vehicles and data can be fused with GPS data for more accurate positioning, an example being the new Audi A7. Previous studies revealed that measurements made by these sensors allow for the derivation of road features, such as potholes or mends of asphalt roads [6–8].

The inertial sensor is inexpensive and part of the standard equipment of many vehicles. Its data include information of minor unevenness of road surfaces that causes the vehicle to vibrate. Inertial sensors, however, only provide data on the road section just crossed. Cameras, by contrast, record the complete road section in front of the vehicle, including the neighboring lane. However, they are integrated into high-class vehicles only. Cameras currently used in vehicles are of limited accuracy and can detect potholes with a minimum depth of about 3 cm only.

Presently, the state of motorways is measured automatically using expensive and complex measurement vehicles, while that of roads in urban and rural areas is determined manually [9]. These methods are associated with a high expenditure. Due to manual evaluation, it takes a long time until the road network quality is updated. Safety-relevant damage may be detected too late. This may have severe consequences, such as traffic accidents or cost-intensive and complete renewal of the road.

For road maintenance, some countries determine the stochastic road profile depth or international roughness index (IRI), as outlined in [10]. However, the latter is often calculated for 100 m intervals only. As a result, certain obstacles, such as potholes, are not detected. In countries pursuing a systematic road maintenance scheme, not only the IRI but also individual obstacles are measured. This also is the objective of the present study.

Approaches to automatic road state monitoring using inertial sensors exist, e.g., [6, 11–13]. They only concentrate on single road features (such as potholes), do not have any representative dataset, or are based on data measured under restricted conditions, e.g., in speed limit areas or on certain sections only. Moreover, the validation phase only covers checks as to whether the road damage detected actually is damage or not (true or false positives), but not whether road damage was overlooked (false negatives).

Road construction offices also need to know the material (road surface), as repairs on different surfaces produce different results and may cause different types of damage [14]. It is also important to distinguish between safety-relevant damage that has to be repaired within 24 hours and damage that is not relevant to safety and the repair of which can be planned and postponed.

The main contribution of this paper is to evaluate the basic feasibility of automatic road surface and road damage measurement with an inertial sensor in the vehicle body. Therefore, this work is aimed at
(i) designing a processing chain to evaluate road data based on measurements of inertial sensors,
(ii) automatically recording an adequate dataset,
(iii) developing and evaluating a method to estimate road surfaces and damage, and
(iv) integrating the algorithms developed into a graphical user interface, so that datasets can also be evaluated with alternative parameterizations by nonexperts.

The methodology will be presented in Section 2. Section 3 will outline the implementation derived, while Section 4 will explain the results based on a first dataset. The result, its applicability, and open problems will be discussed in Section 5.

2. Methods

2.1. Design

Figure 1 presents an overview of the method to evaluate the road state [15]. In a first step, the road state is measured by suitable sensors. For measurement, acoustic sensors, such as the sensors described in [16], acceleration sensors and gyroscopes, cameras, and similar devices can be used. As a result, several synchronized time series are obtained. To obtain a representative reference data set, the sensor data have to cover as wide a range of boundary conditions as possible, e.g., variations of external temperature, driver, and speed. Every point of time or road section has to be assigned a label, e.g., the type of road surface, simultaneously or afterwards. In this way, a data set with correct allocations of sensor data to labels is obtained (ground truth). By means of data mining, models can be designed (offline) for retrospective evaluation (offline) or classification during driving operation (online). The results of the classification models then have to be visualized and evaluated on the basis of map material. To estimate the information on the road surface and on events or damage that is of relevance to road construction offices, two separate classification routines have to be developed.

2.2. Data Acquisition

The data measured by the sensors installed in the vehicle, e.g., GPS and inertial sensors, are encoded on the CAN bus and cannot be read without the communication matrix, which is available to the control system developer and automotive manufacturer only. Hence, an inexpensive measurement system similar to the inertial sensor incorporated in the vehicle is proposed for the easy measurement and readout of data. Measurements cover the position and dynamics of the vehicle, in particular vertical dynamics caused by unevenness [17]. In addition, the data can already be assigned labels during measurement. The measurement system (Figure 2) mainly consists of a GPS receiver (Adafruit ultimate GPS Hat) and a MEMS inertial sensor (LSM9DS1) measuring accelerations and rotation rates of the vehicle along all three axes. The sensor data are acquired using a Raspberry Pi and stored as a CSV table in fused form. As soon as the engine of the vehicle is turned off, the uninterruptible power supply (UPS) is activated and the data can be transmitted via WiFi to a central database, provided the Raspberry Pi is connected to a known WiFi network.

The GPS receiver has a sample rate of 10 Hz, a position resolution of 3 m, and a speed resolution of 0.1 m/s. As the inertial sensor is a low-cost MEMS sensor, the sample rate is not uniform. This has to be compensated by a filter in data processing. The sample rate is about 220 Hz. The accuracy of the acceleration sensor is 0.05 m/s² and that of the gyroscope 0.003 degree/s. Without data transfer, a 32 GB memory card can record data for up to 1000 h.

For allocating labels to data, different approaches are presented in literature. For example, a microphone records the (road) damage report of the passenger [13]. This method, however, is subject to several deficiencies. Among others, the labels are recorded much later than the actual road damage and the soundtrack is not synchronized with the sensor data. Reference [6] uses “loosely labeled” training data. Here, only the number of classes but not the exact position is recorded for large road segments.

The measurement system developed for this study is based on two buttons. A pressed button annotates the damage class (event) or the change of road surface (material). For every measurement drive, data with a certain material (e.g., if asphalt 1, otherwise 0) and an event (e.g., if pothole 1, otherwise 0) are recorded. After the measurement drives, binary coding of the data of the respective files is transformed into the coding given in Table 1.

The Unix time, the sensor ID, the speed, the position and time stamp of the GPS, the accelerations and rotation rates along all three axes, and the two labels for the event and material are recorded and stored in a CSV table on the measurement system (Figure 2). The measurement system is installed near the static center of mass of a BMW 116d in the console between the driver’s seat and passenger’s seat. Orientation of the sensor axes corresponds to the vehicle axes according to ISO 8855:2011. For the method to be generally applicable, measurement data are recorded on randomly selected roads in the region of Karlsruhe, Germany. The speed, road condition, and environmental conditions (e.g., measurement drives in good and rainy weather) are varied strongly. In total, reference data are recorded over a period of three months and a distance of more than 200 km. The data are recorded on eight days (three times a whole day) by three different drivers. Acquisition of reference data is a time-consuming process, as the materials and events have to be crossed under variable environmental conditions and at variable speeds. In particular, individual events of various types, such as potholes, have to be found in the road network and crossed several times with variable approach angles and vehicle tracks.

GPS data that are missing due to variable sampling rates are reconstructed by linear interpolation. As the measurement series are not recorded at a constant sample rate, resampling is required. By resampling, the data are converted from the time domain to the space domain. In [18, 19], it was shown that the response of the vehicle to the excitation of the road depends on speed and that presentation in the space domain reduces this effect. All time series are resampled with a (spatial) frequency of 100 1/m. The distance driven is calculated from the time stamp and speed with the help of the implicit Euler method. Calculation via GPS would also be possible but lacks precision.
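The conversion from the time domain to the space domain can be sketched as follows. This is a minimal illustration in Python, not the code of the toolbox; the function names are hypothetical, and plain linear interpolation is assumed for the resampling step. The driven distance is accumulated with a backward (implicit) Euler step, i.e., each increment uses the speed at the end of the time step.

```python
from bisect import bisect_right

def cumulative_distance(t, v):
    """Backward (implicit) Euler: s[k] = s[k-1] + v[k] * (t[k] - t[k-1])."""
    s = [0.0]
    for k in range(1, len(t)):
        s.append(s[-1] + v[k] * (t[k] - t[k - 1]))
    return s

def resample_to_distance(s, x, spatial_rate):
    """Linearly interpolate x, known at distances s, onto a uniform distance grid.

    spatial_rate is the number of samples per meter (100 in the text).
    """
    n = int(s[-1] * spatial_rate + 1e-9) + 1
    out = []
    for i in range(n):
        d = i / spatial_rate
        # index of the last sample at or before d, clamped to a valid interval
        j = min(bisect_right(s, d) - 1, len(s) - 2)
        span = s[j + 1] - s[j]
        w = (d - s[j]) / span if span > 0 else 0.0  # span == 0: vehicle stands still
        out.append(x[j] + w * (x[j + 1] - x[j]))
    return out
```

Each recorded channel (accelerations, rotation rates, labels) would be passed through `resample_to_distance` with the same distance vector, so that all series share one spatial grid.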

The classes of materials and events are encoded by natural numbers (Table 1). Light damages are general types of unevenness that are not safety-relevant and include minor faults and repairs. Manhole cover, railway crossing, and speed bump are construction obstacles. A speed bump is defined as an elevated construction transverse to the driving direction. The pothole represents a fault of at least 2 cm in depth. The latter event is safety-relevant and should be repaired within a maximum of 24 h. For every sample point of the reference data set, two labels are annotated, one for the material and one for the event.

2.3. Signal Processing
2.3.1. Overview

To derive information on the road surface or material and event/damage from the sensor data recorded, the data streams first have to be transferred to a feature space. Feature extraction calculates representative and useful individual features from complete or partial measurement series. Without knowing the physical model of effects of asphalt changes or road damage on the sensor, it is recommended to calculate a large set of features and to check their suitability for the classification problem based on data with the corresponding labels (ground truth). Efficient feature calculation is needed for calculation on mobile devices (e.g., microcontrollers).

2.3.2. Generation of New Time Series

To describe the road state, the acceleration in vertical direction and the rotation speeds in longitudinal and transverse direction (roll and pitch rate) are very important [6]. Furthermore, the roll and pitch accelerations as well as the jerk of the vehicle are obtained by differentiating the rotation rates and the vertical acceleration, respectively, in the time domain. The distance series of the vertical, roll, and pitch acceleration are transformed into the frequency domain with the short-time Fourier transform, which contains the short-term, distance-localized frequency content of the signal. In this way, features based on specific frequency bands can be investigated. Hence, the three distance series are extended by the following data streams, which leads to seven data streams in total:
(i) vertical acceleration,
(ii) roll acceleration,
(iii) pitch acceleration,
(iv) derivative of vertical acceleration (jerk),
(v) short-time Fourier transformed vertical acceleration,
(vi) short-time Fourier transformed pitch acceleration, and
(vii) short-time Fourier transformed roll acceleration.
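The derived series (roll acceleration, pitch acceleration, and jerk) are time derivatives of measured signals. A minimal Python sketch of such a derivative on non-uniformly sampled data is given below; the finite-difference scheme and the function name are assumptions for illustration, as the text does not state which scheme is used.

```python
def time_derivative(t, x):
    """First derivative of x(t): central differences inside, one-sided at the ends.

    Works with non-uniform time stamps; needs at least two distinct samples.
    """
    n = len(x)
    d = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n - 1, i + 1)
        d.append((x[hi] - x[lo]) / (t[hi] - t[lo]))
    return d
```

Applied to the vertical acceleration this yields the jerk; applied to the roll and pitch rates it yields the roll and pitch accelerations.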

2.3.3. Feature Extraction

The features are calculated for windows with a specific length in the distance domain and a specific overlap. A window comprises all sample indexes of one road segment and is defined by its window index and its window length. The window overlap is the share of values of a window that are also contained in the previous window. If a longer window is chosen, short-lived peaks, for example, due to potholes, have a weaker impact on features that incorporate the overall signal, such as the standard deviation. These short-lived peaks can be captured by shortening the window or by using features that calculate extrema.

For the feature extraction for material and events, we use window sizes of 50 m and 5 m, respectively, and an overlap of 20%.
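The windowing scheme can be sketched in a few lines of Python. This is an illustration under assumptions rather than the toolbox code: the function names are hypothetical, incomplete windows at the end of a series are simply dropped, and the population standard deviation is used.

```python
from statistics import pstdev

def sliding_windows(n_samples, window_len, overlap):
    """Yield (start, stop) index pairs; overlap is the shared fraction, e.g. 0.2."""
    step = max(1, round(window_len * (1.0 - overlap)))
    start = 0
    while start + window_len <= n_samples:
        yield start, start + window_len
        start += step

def window_features(x, window_len, overlap):
    """Standard deviation and peak-to-peak value of every window of x."""
    feats = []
    for a, b in sliding_windows(len(x), window_len, overlap):
        seg = x[a:b]
        feats.append({"std": pstdev(seg), "peak_to_peak": max(seg) - min(seg)})
    return feats
```

With 100 samples per meter, the 5 m event window corresponds to `window_len=500`, and a 20% overlap gives a step of 400 samples between consecutive windows.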

From the distance series data, we calculate the standard deviation as well as the peak-to-peak value. From the short-time Fourier transformed data streams, the root mean square (RMS) value, or effective value, and the spectral centroid are extracted for a set of spatial frequency bands (in 1/m); the bands referred to in Section 4 are 5 to 15 and 15 to 25.

The vehicle vibration depends strongly on the vehicle velocity. Previous research suggests performing a linear regression with each feature as the dependent variable and the velocity as the independent variable [12]. The velocity dependency is then removed by subtracting the estimated linear equation from the corresponding feature. However, the vehicle vibration and the extracted features do not depend linearly on the velocity. Therefore, the velocity is incorporated directly: the mean velocity is calculated for each window as an additional feature. To allow for nonlinear relationships, a kernel function of higher order can be applied for the classification.
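For comparison, the linear speed compensation attributed to [12] can be sketched as follows. This Python example only illustrates the regression-and-subtract idea described above; it is not the approach adopted in this work, and the function name is hypothetical.

```python
def remove_linear_speed_trend(feature, speed):
    """Fit feature = slope * speed + intercept by least squares and return residuals."""
    n = len(feature)
    mean_v = sum(speed) / n
    mean_f = sum(feature) / n
    sxx = sum((v - mean_v) ** 2 for v in speed)
    sxy = sum((v - mean_v) * (f - mean_f) for v, f in zip(speed, feature))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_f - slope * mean_v
    return [f - (slope * v + intercept) for f, v in zip(feature, speed)]
```

If the true dependency is nonlinear, the residuals still carry speed information, which is the motivation for passing the mean window speed to the classifier as a feature instead.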

Of the GPS latitude and longitude time series, the medians in every window are used for later visualization.
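The spectral features of this section (band-limited RMS and spectral centroid) can be illustrated for a single window as follows. The Python sketch below makes several assumptions that are not taken from the text: a plain discrete Fourier transform without window function, the RMS of a band computed from the amplitudes of its sinusoidal components, and an amplitude-weighted centroid. The function names are hypothetical.

```python
import cmath
import math

def amplitude_spectrum(x, sample_rate):
    """One-sided DFT amplitude spectrum of x: returns (frequencies, amplitudes)."""
    n = len(x)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        # DC and Nyquist bins appear once, all other bins twice in the full spectrum
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * sample_rate / n)
        amps.append(scale * abs(coeff) / n)
    return freqs, amps

def band_rms(freqs, amps, f_lo, f_hi):
    """RMS of the sinusoidal components with f_lo <= f < f_hi (bands excluding DC)."""
    return math.sqrt(sum(a * a / 2.0 for f, a in zip(freqs, amps) if f_lo <= f < f_hi))

def spectral_centroid(freqs, amps):
    """Amplitude-weighted mean frequency of the spectrum."""
    total = sum(amps)
    return sum(f * a for f, a in zip(freqs, amps)) / total if total > 0 else 0.0
```

For a distance series sampled at 100 1/m, `sample_rate=100.0` gives frequencies in 1/m, so `band_rms(freqs, amps, 5, 15)` corresponds to the 5 to 15 band mentioned in Section 4.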

2.3.4. Classification

Based on the extracted individual features and the corresponding labels, two classifiers are designed for material and event. For the design and application of classification, a combination of feature selection, feature aggregation, and classifier is chosen.

For the surface classification, the five best individual features are determined using the multivariate analysis of variance (MANOVA) method; for event classification, the ten best features are selected. For visualization purposes, the selected individual features are then aggregated to two features using linear discriminant analysis (DA), which can also reduce the computational effort. A support vector machine (SVM) classifier with a polynomial kernel function of order 2 is used. Validation is carried out by 5-fold cross-validation.
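Two building blocks of this step, the polynomial kernel of order 2 and the 5-fold split, can be sketched in Python. Both functions are illustrative assumptions: the kernel is written in the common form (x·z + c)^d with c = 1, and the folds are formed by interleaving sample indexes; the text does not specify the kernel offset or how samples are assigned to folds.

```python
def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree for two feature vectors."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree

def k_fold_indices(n_samples, k=5):
    """Return k (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        splits.append((train, test))
    return splits
```

The cross-validated accuracy is then the mean of the accuracies obtained on the k test folds, each with a classifier trained on the remaining folds.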

2.3.5. Performance Measures

From the correctly and falsely predicted instances, we can calculate a confusion matrix $C = (c_{ij})$ for the classes $i = 1, \dots, K$. In the confusion matrix, $c_{ii}$ represents the true positives for class $i$. The other elements in column $i$ are called false negatives, the other elements in row $i$ false positives, and all remaining elements true negatives.

From the confusion matrix, one can calculate multiple performance measures to evaluate the model, such as the recall $r_i = c_{ii} / \sum_{j} c_{ji}$ for class $i$, the overall accuracy of the classifier $a = \sum_{i} c_{ii} / \sum_{i} \sum_{j} c_{ij}$, or the precision $p_i = c_{ii} / \sum_{j} c_{ij}$. The precision represents the fraction of retrieved instances that are relevant and can be seen as the probability that an instance predicted as class $i$ actually belongs to class $i$. An overview of performance measures for different calculation problems can be found in [20].
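The performance measures can be computed directly from the confusion matrix. The Python sketch below follows the convention stated above (rows hold the predicted class, columns the true class); the function name is hypothetical and the numbers in the usage note are invented for illustration only.

```python
def per_class_measures(c):
    """c[i][j]: number of instances predicted as class i whose true class is j.

    Returns (accuracy, recall per class, precision per class).
    """
    n = len(c)
    total = sum(sum(row) for row in c)
    accuracy = sum(c[i][i] for i in range(n)) / total
    recall, precision = [], []
    for i in range(n):
        col_sum = sum(c[j][i] for j in range(n))  # all instances truly in class i
        row_sum = sum(c[i])                       # all instances predicted as class i
        recall.append(c[i][i] / col_sum if col_sum else 0.0)
        precision.append(c[i][i] / row_sum if row_sum else 0.0)
    return accuracy, recall, precision
```

For the made-up matrix `[[8, 2], [1, 9]]`, class 0 has 8 true positives, 1 false negative (column 0), and 2 false positives (row 0), so its recall is 8/9 and its precision 8/10.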

3. Implementation

To facilitate operation by non-experts, the methods are implemented in a graphical user interface called Vehicle Learner Toolbox, which is available in [21]. It is based on MATLAB and implements several machine learning operations of the freely available toolbox SciXMiner [22] (formerly, Gait-CAD [23]). The Vehicle Learner Toolbox provides the possibility to
(i) import vehicle sensor data in different file formats,
(ii) compress the imported data and automatically extract various features,
(iii) train a classifier model with a wide-ranging set of options,
(iv) test the trained classifier with a test set, and
(v) visualize the results with the help of plots and maps.

A project folder can be selected and sensor data can be imported in the corresponding frame Data (Figure 3). There is the option to assign the sensor data to specific vehicles, since they vary in suspensions, damping, and other parameters, which have an impact on the vibration behaviour. Therefore, in the following data processing, feature selection and classification can be performed for data from specific vehicles. The import accepts the .csv and .xlsx file formats with the following column headers:
(i) timestamp (unix timestamp)
(ii) x-, y-, z-accel (the acceleration values in each direction)
(iii) x-, y-, z-gyro (the gyroscope values in each direction)
(iv) gps-timestamp (format: YYYY-MM-DDThh:mm:ss,000Z)
(v) lat, lon (position in latitude and longitude)
(vi) speed (in m/s)
(vii) m, e (material and event labeling; if the data is not labeled, these columns should only contain zeros).
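Reading a file with this layout can be sketched in Python as follows. The exact header spellings (`x-accel`, `y-accel`, and so on) are assumed expansions of the abbreviated list above, the function name is hypothetical, and the sample row contains invented values for illustration only. Because the stated GPS time stamp format contains a comma, the sketch assumes that this field is quoted in the CSV file.

```python
import csv
import io

# Assumed header spellings; the text abbreviates them as "x-, y-, z-accel" etc.
COLUMNS = ["timestamp", "x-accel", "y-accel", "z-accel", "x-gyro", "y-gyro", "z-gyro",
           "gps-timestamp", "lat", "lon", "speed", "m", "e"]

def read_drive(csv_text):
    """Parse one measurement drive into a list of dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for rec in reader:
        row = {k: float(rec[k]) for k in COLUMNS if k not in ("gps-timestamp", "m", "e")}
        row["gps-timestamp"] = rec["gps-timestamp"]        # kept as text
        row["m"], row["e"] = int(rec["m"]), int(rec["e"])  # 0 means unlabeled
        rows.append(row)
    return rows

# Invented example row, for illustration only.
sample = (
    "timestamp,x-accel,y-accel,z-accel,x-gyro,y-gyro,z-gyro,gps-timestamp,lat,lon,speed,m,e\n"
    '1525168800.0,0.1,0.0,9.8,0.01,0.02,0.0,"2018-05-01T10:00:00,000Z",49.0,8.4,13.9,0,0\n'
)
```

Calling `read_drive(sample)` returns a single record; a file lacking one of the expected columns raises a `ValueError` that names the missing headers.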

Since the GPS data are acquired with a lower sample rate than the inertial sensor data, they are automatically interpolated. Furthermore, the sensor signals are subject to noise [24] and are automatically smoothed during the import process with the following filters. Although the noise of a MEMS gyroscope is visible as spikes in the signal, the sensor is well known for its good short-term accuracy [25]. A suitable filter for this purpose is the median filter, which is robust against outliers and removes noise while preserving high-frequency content. Since the data from the MEMS accelerometer do not show such spikes but contain more noise in the short term [25], a Savitzky-Golay FIR smoothing filter is applied. It fits a polynomial of a specified degree to frames of noisy data and minimizes the least-squares error [26]. Therefore, the filter outperforms standard averaging FIR filters, which might remove high-frequency content along with the noise.
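Both filters can be illustrated with short Python sketches. The kernel size of the median filter and the frame length and polynomial degree of the Savitzky-Golay filter are not given in the text; the values below (an odd median kernel and the classical 5-point, second-order smoothing coefficients) are assumptions chosen for illustration, and the function names are hypothetical.

```python
from statistics import median

def median_filter(x, kernel=5):
    """Sliding median with an odd kernel; the window is truncated at the signal edges."""
    half = kernel // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

# 5-point Savitzky-Golay smoothing coefficients for a quadratic (or cubic) fit.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def savgol5(x):
    """Savitzky-Golay smoothing of the interior samples; the two edge samples
    on each side are left unchanged."""
    y = list(x)
    for i in range(2, len(x) - 2):
        y[i] = sum(c * x[i + k - 2] for k, c in enumerate(SG5))
    return y
```

The median filter removes an isolated spike completely while leaving a step edge in place, and the Savitzky-Golay filter reproduces any quadratic signal exactly, which is why it retains more high-frequency content than a moving average of the same length.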

There is also the possibility to import tire cavity sound data along with the inertial sensor data for road roughness estimation, as presented in [16, 27], but this is not the subject of this paper. Moreover, the imported data set can be categorized as training, testing, or unlabeled data.

Furthermore, the parameters for the window profile, such as length of road segments and overlapping factor of these windows, can be determined, as well as the resampling frequency. The standard window profiles are material with a window length of 50 m and event with a window length of 5 m.

After the import and preprocess of the data, new time series data are calculated and features are automatically extracted, as proposed in Sections 2.3.2 and 2.3.3. The code to calculate new data series or features can be easily added in the corresponding MATLAB function.

The proposed data mining methods (Section 2.3.4) can be applied in the toolbox under the menu Supervised Learning (Figure 4). In the first step, a training data set must be generated. There are two different ways to accomplish this. Either an external data set containing features can be imported, or the data imported within the toolbox can be used and restricted by choosing the time interval of the data acquisitions or the area. Furthermore, data annotated with specific labels can be excluded from the classification. For our example, all data with the label 0-unknown were deleted (Table 1). Another option is to thin out classes with significantly more data points than other classes in order to obtain an approximately uniform distribution of data points among the classes and to prevent over-fitting to specific classes. Furthermore, systematic errors made during labeling of the data can be removed; e.g., if the trigger to annotate the data was activated too early or too late, the annotation can be moved or data points with the wrong annotation can be excluded. After the generation of the data set to be processed, the settings for the classifier can be determined under the tab Train Classifier (Figure 4).
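Excluding unlabeled data and thinning out over-represented classes can be sketched as below. This Python example assumes random undersampling of every class down to the size of the smallest class, which is one possible way to reach an approximately uniform distribution; the strategy used by the toolbox is not described in the text, and the function name is hypothetical.

```python
import random
from collections import defaultdict

def thin_out(samples, labels, drop_labels=(0,), seed=0):
    """Drop unwanted labels, then undersample every class to the smallest class size."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        if y not in drop_labels:       # e.g. label 0 = unknown
            by_class[y].append(s)
    target = min(len(items) for items in by_class.values())
    rng = random.Random(seed)          # fixed seed for a reproducible selection
    out = []
    for y, items in sorted(by_class.items()):
        out.extend((s, y) for s in rng.sample(items, target))
    return out
```

The result is a list of `(sample, label)` pairs in which every remaining class occurs equally often.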

In the first step, the vehicle and the training data set must be set. Afterwards, a new classifier model can be created or an existing model can be selected. The next section contains the settings of feature selection (e.g., MANOVA) and aggregation (e.g., discriminant analysis), as proposed in Section 2.3.4. Reducing the number of features strongly influences the classification result by reducing the risk of overfitting. It is possible to cross-validate the training process by setting the k-fold cross-validation value to a value higher than 1. The last section offers a variety of settings for the classifier, e.g., for an SVM, including the kernel function and penalty term. Afterwards, the classifier can be trained and the data can be plotted on OpenStreetMap maps. Furthermore, the confusion matrix and the total loss are shown in the MATLAB console.

For testing new data, a data set restricted to the time range and area to be analyzed can be generated as described for training, and a trained classifier must be selected. If the test data set is labeled, the output of the prediction is again a confusion matrix and the classification error. Moreover, the results can be visualized and plotted on OpenStreetMap maps, as presented in Section 4. The trajectories are cut into segments whose colors refer to the predicted classes.
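Cutting a trajectory into segments of constant predicted class can be sketched as follows. The Python function below is a hypothetical illustration, not the toolbox routine: it groups consecutive windows with the same prediction and collects their median GPS positions, so that each segment can be drawn in the color of its class.

```python
from itertools import groupby

def class_segments(lat, lon, predicted):
    """Group consecutive windows with the same predicted class into map segments."""
    segments = []
    idx = 0
    for cls, run in groupby(predicted):
        n = len(list(run))
        points = list(zip(lat[idx:idx + n], lon[idx:idx + n]))
        segments.append({"class": cls, "points": points})
        idx += n
    return segments
```

A class that occurs again later on the route starts a new segment, so a sequence of predictions such as 1, 1, 2, 2, 2, 1 yields three segments.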

4. Results

4.1. Event Classification

The cross-validation of the event classification yields an accuracy of 81% on average without feature aggregation. The aggregated feature space and the decision boundaries of the event classifier are shown in Figure 5(b).

The illustration of the classification shows that road segments in good condition, with light damages, speed bumps, and potholes can be separated well. This observation is confirmed by the quantitative results listed in Table 2.

The precision and recall for the mentioned classes are above 70%, whereas the performance measures for manhole cover and railway crossing are below 62% on average.

The most important features, determined with MANOVA, are
(i) peak-to-peak of pitch acceleration,
(ii) peak-to-peak of roll acceleration,
(iii) maximum of jerk in vertical direction,
(iv) root mean square (RMS) of the vertical acceleration, and
(v) speed.

A pairwise comparison of the classes shows that the peak-to-peak values of the pitch and roll acceleration are mainly responsible for separating events that
(i) act on both sides of the vehicle (railway crossing, speed bump),
(ii) act on only one side of the vehicle (manhole cover, pothole), or
(iii) have only little impact on the vehicle vibration (light damages, road segments in good condition).

In addition, the average RMS of the vertical acceleration is important to separate light damages and road segments in good condition. Furthermore, potholes and manhole covers are separable through the maximum RMS of the roll acceleration for the frequency range 15 to 25 1/m. However, the latter events are often misclassified as segments in good condition or light damages. Speed bumps and railway crossings are separable by the peak-to-peak value of the pitch rate, whereas railway crossings are also often misclassified as light damages.

To test the classifier, a data set of more than 200 km of street data is classified and plotted on OpenStreetMap maps. The results are promising and represent the actual street condition in many cases. A few examples of classified areas are shown below.

The first example shows the event classification results on two different high-speed roads (Figure 6). The upper one with Label 1 is a freshly renovated asphalt highway with close to no damage, and the lower one with Label 2 is a poorly patched asphalt road with a lot of medium and severe damage. The classification successfully predicted the upper roadway as a good street. Most parts of the lower street were predicted as light damage and some points even as potholes. The results represent the road condition very accurately. The only noticeable misclassification is a railway crossing that was predicted once (Label 3).

The second example presents data acquired in an urban area in Karlsruhe; the predictions are shown in Figure 7. The roads in this area are poorly preserved and there is a speed bump at a pedestrian crossing (Label 1). The classification model correctly predicts the speed bump (Label 1) for all passes and a pothole (Label 2) in both driving directions.

The third interesting sector is shown in Figure 8. Potholes (Labels 2 and 3), which were at the edge of the driving line, were driven over multiple times and the classifier predicts the severe damage accordingly. Sometimes the output at these road segments is not pothole but light damages or even good road condition. The reason might be that the pothole was avoided by the driver.

The railway crossing (Label 1) is more elevated than other crossings and is misclassified as a speed bump in a few cases.

4.2. Road Surface Classification

When classifying road surfaces, cross-validation yields 96.1% accuracy on average without aggregation of features. The aggregated feature space is shown in Figure 5(a). The figure indicates that the misclassifications are asphalt classified as damaged asphalt or damaged concrete and vice versa. The illustrated results are supported by Table 2, where the precision and recall for cobblestone are above 99.0%, whereas the performance measures for asphalt, damaged asphalt, and damaged concrete are between 92.0% and 97.6%.

The three best individual features for the classification of road surface according to MANOVA are
(i) average RMS of the roll acceleration for the frequency range from 5 to 15 1/m,
(ii) standard deviation of the pitch rate, and
(iii) standard deviation of the RMS of the vertical acceleration for the frequency range from 15 to 25 1/m.

The RMS values of the roll acceleration and vertical acceleration separate the classes smooth surface, damaged asphalt, and cobblestone. The values are greatest for cobblestone and lowest for smooth surfaces.

The standard deviation of the pitch rate separates the class damaged concrete from all other classes. The reason is probably poor and aged concrete joints.

The material classifier was applied to the same data set described for event classification. The classifier was able to reflect the road surface very precisely. The following figures display the performance on different surfaces. Analogous to the event classification shown in Figure 6, the material classifier could distinguish between both roads and correctly classified them as smooth surface and damaged asphalt, respectively.

In contrast, Figure 9 shows a long highway segment with aged concrete and distinctive concrete joints, which will have to be maintained soon. Except for one short segment, which was classified as smooth surface, the road state was correctly predicted.

The classification results of data acquired in the urban area of Karlsruhe (Figure 10) show two correctly predicted areas of cobblestone (Labels 1 and 2). The remaining road segments are correctly classified as segments with light damages or in good condition. In particular, the latter class was correctly predicted for a road segment that was recently renewed (Label 3). One misclassification as cobblestone can be found close to Label 1. However, this road segment is highly damaged with multiple potholes, which have a high impact on the vehicle vibration, similar to cobblestones.

5. Conclusions and Outlook

The results show that the system presented in this paper can classify both road materials and events. The features selected by MANOVA are in agreement with the theory of vehicle excitation. Material classification performs well according to the results of the cross-validation and the test data.

There are multiple misclassifications in the prediction of events, especially for structural obstacles, such as manhole covers and railway crossings. However, these events might be marked on a map and excluded from classification and investigation, as the main objective is to detect road damage.

One reason for misclassifications of the events good condition, light damages, and pothole might be incorrect manual annotation, since there is sometimes only a fine line between the degrees of damage, or the events were not fully driven over, especially in the case of potholes.

As the system is of modular design, the number and type of sensors and the sensor modality can be varied. If feature extraction is adapted, camera recordings might be useful as well. Transferability to other vehicles with different chassis and dimensions has not been examined so far. Presumably, the parameters of the algorithms are adapted to the vehicle with which the learning data set was recorded. Here, fusion of learning data sets from several vehicles and an accordingly adapted classification routine might help. It can be assumed that the results will be slightly worse.

Generally, the inertial sensor represents a very good option to collect information on the tire/road contact at low costs and over wide areas. Use of information from several vehicles can compensate for the drawback that some drivers steer around safety-relevant damage, which is then not measured by the sensor. Moreover, obstacles at the roadside are not crossed and, hence, cannot be detected.

Fusion of camera and inertial sensor data probably would be the optimum solution for a mobile determination of the state of road traffic infrastructure. For road construction offices, use of a low-cost and computationally efficient system, consisting of an inertial sensor, Raspberry Pi, and simple signal processing, is sufficient and can be recommended.

Data Availability

The raw data used to support the findings of this study have been deposited in http://doi.org/10.5281/zenodo.1461243 [28]. The data can be processed with the presented toolbox available in http://doi.org/10.5281/zenodo.1216187 [21].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by the Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety, Germany, within the Environmental Research Plan 2014 (Project no. 3714541000).