Abstract

This study establishes an artificial intelligence (AI) model for detecting potholes on asphalt pavement surfaces. Image processing methods including the Gaussian filter, the steerable filter, and integral projection are used to extract features from digital images. A data set of 200 image samples has been collected to train and validate two machine learning algorithms: the least squares support vector machine (LS-SVM) and the artificial neural network (ANN). Experimental results obtained from a repeated subsampling process with 20 runs show that both LS-SVM and ANN are capable methods for pothole detection, with classification accuracy rates above 85%. In addition, the LS-SVM has achieved the highest classification accuracy rate (roughly 89%) and the largest area under the curve (0.96). Accordingly, the proposed AI approach used with LS-SVM has strong potential to assist transportation agencies and road inspectors in the task of pavement pothole detection.

1. Introduction

Roads are essential components of the national infrastructure. Evaluating road condition is a crucial task of the transportation agencies that are responsible for establishing maintenance schedules and allocating maintenance budgets [1]. Because road deterioration is correlated with an increasing number of traffic accidents, road safety has become a common concern in many countries [2]. Asphalt road degradation has a particularly negative impact on economic development in developing countries, where financial resources for pavement maintenance are often insufficient. Therefore, there is a practical need to improve the effectiveness of the asphalt pavement maintenance process.

The process of road safety survey generally consists of the detection of the defects (e.g., cracks and potholes) existing on the road section and evaluation of the magnitude of the defects [3]. Among several forms of pavement distresses, potholes are important indicators of the road defects, and they should be detected in a timely manner for the tasks of asphalt-surfaced pavement maintenance and rehabilitation [4]. The reason is that this form of defect significantly delays traffic and brings about a hazardous condition for drivers.

A pothole is commonly defined as a bowl-shaped depression on the pavement surface with a minimum plane diameter of 150 mm [5]. Generally, structure aging, heavy traffic condition, poor drainage, thin asphalt layer substructure, and weak substructure can be the causes of pothole appearance [6]. In developing countries, pavement potholes are often detected manually by inspectors of local transportation agencies during periodic field surveys. Although this conventional method can provide an accurate evaluation of potholes, it suffers from low productivity in both data collection and data processing, because one pavement inspector can inspect less than 10 km per day [7]. With a large number of road sections that need to be inspected routinely, the automation of pothole detection becomes a pressing need for transportation agencies. Moreover, a productive pavement surveying process leads to significant economic gain: if rehabilitation is performed in a timely manner, pavement restoration cost can be reduced by up to 80% [8].

In recent years, the advancement of image processing techniques and the availability of low-cost visual sensing equipment have paved the way for various methods of automatic pothole detection. The feasibility of automatic pothole detection stems from the fact that the textures of potholes are recognizably different from the background of the pavement surface. Generally, computer-based pothole detection can be divided into 3D reconstruction-based and 2D vision-based approaches [3]. The 3D reconstruction-based methods rely on 3D point clouds that are provided by stereovision algorithms using a pair of video cameras [9]. As stated by Koch et al. [3], the stereovision-based methods require a complete 3D reconstruction of the asphalt pavement surface. These methods also necessitate expensive equipment, which is a significant hindrance for researchers in developing countries. Furthermore, the irregular texture and color of the asphalt pavement pose a significant challenge to the performance of 3D reconstruction-based approaches [3].

Accordingly, two-dimensional (2D) pavement images are widely used in practice for pavement pothole detection. Karuppuswamy et al. [10] proposed an integration of a vision and motion system to detect simulated potholes. Koch and Brilakis [11] relied on the elliptic shape, grain surface texture, and image segmentation to identify potholes in 2D images. Computer vision-based models that employ median filtering and morphological operations have been put forward by Lokeshwor et al. [12] and Radopoulou and Brilakis [13]. Koch et al. [14] established a pothole detection model that relies on the techniques of texture extraction and comparison between pothole pixels and healthy pavement pixels. Lokeshwor et al. [15] proposed an adaptive thresholding technique for segmenting distress pixels from the background pixels. The histogram shape-based thresholding and maximum entropy method have been employed in the previous works of Ryu et al. [6]. Ouma and Hahn [16] rely on wavelet transform and fuzzy c-means clustering to separate defect and nondefect pavement pixels.

Recent review works [3, 17, 18] have pointed out an increasing trend of applying image processing techniques and artificial intelligence (AI) methods to enhance the accuracy and productivity of pothole detection. Moreover, irregular background illumination and complex pavement texture and color are still major challenges that computer vision-based methods have to overcome. Hence, other advanced approaches of image processing and AI should be investigated to construct automatic pothole recognition models.

The current study is dedicated to establishing a new AI-based model for automatically recognizing pothole objects in asphalt pavement images. The steerable filter is employed to create a salient map for distress detection. In addition, the Gaussian filter is utilized for image denoising, and the integral projection is employed to exhibit the characteristics of the salient map constructed by the steerable filter. The features extracted by the aforementioned image processing techniques are used by the artificial neural network and the least squares support vector machine. A data set of image samples with two class labels (nonpothole and pothole) has been collected and used to train and validate the performance of the two supervised learning algorithms.

The rest of the paper is organized in the following way: The second section presents the research methodology. The next section describes the structure of the proposed model for pavement pothole recognition. Experimental results and performance comparison are reported in the fourth section, followed by conclusions of the study in the final section.

2. Methodology

2.1. Image Processing Techniques
2.1.1. Gaussian Filter (GF)

In the image processing field, the GF is a widely used preprocessing technique for reducing image noise and removing redundant details [19]. Particularly for the task of pothole detection, the GF helps blur the asphalt background texture and facilitates further analysis of the digital image. The GF is essentially a 2D convolution operator whose kernel follows the shape of a Gaussian function. The formula of a Gaussian function in a 2D space is given as follows:

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right),$$

where $\sigma$ denotes the standard deviation of the GF.

Since the image is stored as a collection of discrete pixels, it is necessary to employ a discrete approximation of the Gaussian function before performing the convolution operator on the image. For more details of the discrete approximation of the GF, readers are guided to the previous works of Gonzalez et al. [19]. Figure 1 illustrates the effect of image smoothing using the GF with different values of standard deviation parameters.
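As a rough illustration of the discrete approximation mentioned above, the following sketch samples the 2D Gaussian on an integer grid, normalizes the samples so they sum to one, and convolves the result with an image stored as a list of rows. The function names, the truncation of the kernel at three standard deviations, and the replicated-border handling are assumptions made for this example only; the source does not specify them.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Return a normalized (2r+1) x (2r+1) discrete Gaussian kernel."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))  # assumption: truncate at 3 sigma
    size = 2 * radius + 1
    kernel = [[math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
               for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in kernel)
    # Dividing by the sum keeps the overall image brightness unchanged.
    return [[v / total for v in row] for row in kernel]

def convolve2d(image, kernel):
    """Same-size 2D convolution; border pixels are replicated (an assumption)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    # Convolution flips the kernel, hence the minus signs.
                    yy = min(max(y - (j - ry), 0), h - 1)
                    xx = min(max(x - (i - rx), 0), w - 1)
                    acc += kernel[j][i] * image[yy][xx]
            out[y][x] = acc
    return out

def gaussian_filter(image, sigma):
    """Smooth a grayscale image (list of rows) with a Gaussian of the given sigma."""
    return convolve2d(image, gaussian_kernel(sigma))

# A single bright pixel is spread over its neighbourhood by the filter.
impulse = [[0.0] * 15 for _ in range(15)]
impulse[7][7] = 1.0
smoothed = gaussian_filter(impulse, 1.0)
print(round(smoothed[7][7], 4), round(smoothed[7][8], 4))
```

A larger standard deviation spreads the impulse over more pixels, which corresponds to the stronger blurring described for Figure 1.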

2.1.2. Steerable Filter (SF)

An SF is an orientation-selective convolution kernel put forward by Freeman and Adelson [20]. The SF is highly useful for image enhancement because it is able to distinguish the objects of interest from the surrounding background [21]. The SF has been successfully employed in various fields including object tracking, road crack detection, and many other computer vision problems [22–26]. In the current study, a linear combination of Gaussian second derivatives is used as the basic filter.

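The equation of the 2D Gaussian at coordinate $(x, y)$ in the digital image is provided as follows:

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right),$$

where $\sigma^{2}$ is the variance of the 2D Gaussian function.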

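The second derivatives of the function $G(x, y, \sigma)$ are shown as follows:

$$G_{xx} = \frac{\partial^{2} G}{\partial x^{2}} = \frac{x^{2} - \sigma^{2}}{\sigma^{4}}\, G, \qquad G_{xy} = \frac{\partial^{2} G}{\partial x\, \partial y} = \frac{xy}{\sigma^{4}}\, G, \qquad G_{yy} = \frac{\partial^{2} G}{\partial y^{2}} = \frac{y^{2} - \sigma^{2}}{\sigma^{4}}\, G.$$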

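The formulation of the steerable filter is given in the following equation:

$$F(x, y, \sigma, \theta) = G_{xx}\cos^{2}\theta + 2\, G_{xy}\cos\theta\,\sin\theta + G_{yy}\sin^{2}\theta,$$

where $\theta$ denotes the orientation of the filter.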

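The filter response for a whole digital image is graphically presented in Figure 2 for fixed values of $\sigma$ and $\theta$. It is noted that the response at coordinate $(x, y)$ is attained by the convolution operator as follows:

$$R(x, y) = F(x, y, \sigma, \theta) * I(x, y),$$

where $I(x, y)$ is the digital image and "$*$" is the image convolution operator.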
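The construction of an oriented filter from Gaussian second derivatives can be sketched as follows. This is a minimal illustration that assumes the normalized 2D Gaussian with standard deviation `sigma`, a kernel truncated at four standard deviations, and zero padding outside the image; the function names are hypothetical and none of these choices are taken from the source.

```python
import math

def second_derivative_kernels(sigma, radius):
    """Sample G_xx, G_xy, and G_yy of a 2D Gaussian on a (2r+1) x (2r+1) grid."""
    gxx, gxy, gyy = [], [], []
    for y in range(-radius, radius + 1):
        row_xx, row_xy, row_yy = [], [], []
        for x in range(-radius, radius + 1):
            g = math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            row_xx.append((x * x - sigma ** 2) / sigma ** 4 * g)
            row_xy.append(x * y / sigma ** 4 * g)
            row_yy.append((y * y - sigma ** 2) / sigma ** 4 * g)
        gxx.append(row_xx)
        gxy.append(row_xy)
        gyy.append(row_yy)
    return gxx, gxy, gyy

def steerable_kernel(sigma, theta, radius=None):
    """Combine the three basis kernels into a filter oriented at theta (radians)."""
    if radius is None:
        radius = math.ceil(4 * sigma)  # assumption: truncate the kernel at 4 sigma
    gxx, gxy, gyy = second_derivative_kernels(sigma, radius)
    c, s = math.cos(theta), math.sin(theta)
    size = 2 * radius + 1
    return [[c * c * gxx[j][i] + 2 * c * s * gxy[j][i] + s * s * gyy[j][i]
             for i in range(size)] for j in range(size)]

def response_at(image, kernel, cy, cx):
    """Filter response at pixel (cy, cx); pixels outside the image count as zero.

    The kernel is unchanged by a 180-degree rotation, so this correlation
    gives the same value as a convolution.
    """
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    total = 0.0
    for j, row in enumerate(kernel):
        for i, k in enumerate(row):
            y, x = cy + j - r, cx + i - r
            if 0 <= y < h and 0 <= x < w:
                total += k * image[y][x]
    return total

# Bright 21 x 21 image with a dark vertical line in column 10.
image = [[0.0 if x == 10 else 1.0 for x in range(21)] for y in range(21)]
across = response_at(image, steerable_kernel(1.5, 0.0), 10, 10)
along = response_at(image, steerable_kernel(1.5, math.pi / 2), 10, 10)
print(across, along)
```

In this toy image the filter oriented across the dark line responds strongly, while the filter oriented along the line responds only weakly, which is the orientation selectivity that the salient map relies on.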

2.1.3. Integral Projection (IP)

IP is a commonly used image processing technique in the field of automatic face recognition [27]. Due to its simplicity and discriminative power, this technique has strong potential for the task of pothole detection. Given a grayscale image $I(x, y)$, the horizontal and vertical IPs are defined as follows:

$$HP(y) = \sum_{i \in x_{y}} I(i, y), \qquad VP(x) = \sum_{j \in y_{x}} I(x, j),$$

where $HP$ and $VP$ are the horizontal and vertical IPs, respectively. $x_{y}$ and $y_{x}$ denote the set of horizontal pixels at the vertical pixel $y$ and the set of vertical pixels at the horizontal pixel $x$, respectively.

The results of IP analysis for several pavement images with the size of 150 × 150 pixels are presented in Figure 3. As can be observed from these images, a healthy pavement image is characterized by recognizably stable signals of both horizontal and vertical IPs. On the other hand, each IP of an image containing a pothole features a peak; moreover, the location of these two peaks should be relatively close to each other. It is noted that if an image contains a crack pattern, the IPs are not stable. In this case, there should be a peak of intensity along one axis as shown in Figure 3(b). Based on such observations, IP can be effective in characterizing images with and without the pothole.
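The behaviour described above can be reproduced on a toy example. The sketch below assumes that the horizontal IP is the sum of pixel values along each row and the vertical IP is the sum along each column; the function name and the 6 × 6 test image are illustrative only.

```python
def integral_projections(image):
    """Return (HP, VP) for a grayscale image stored as a list of rows."""
    hp = [sum(row) for row in image]        # HP(y): sum along each row
    vp = [sum(col) for col in zip(*image)]  # VP(x): sum along each column
    return hp, vp

# Toy 6 x 6 salient map with a bright 2 x 2 blob standing in for a pothole.
image = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4):
        image[y][x] = 9

hp, vp = integral_projections(image)
print(hp)  # [0, 0, 18, 18, 0, 0]
print(vp)  # [0, 0, 0, 18, 18, 0]
```

Both projections show a single peak at the position of the blob, whereas an image with a uniform background would produce flat projections.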

2.2. Artificial Intelligence Approaches
2.2.1. Artificial Neural Network (ANN)

ANN is a popular AI approach for pattern recognition. This approach stems from biological neural networks in the natural world [28]. Through the supervised training process, an ANN model is capable of making inference via a large aggregation of neural units called artificial neurons. ANN is very similar to the way a biological brain solves pattern recognition problems with a large number of connected biological neurons [29]. An ANN model consists of multiple nodes, which simulate biological neurons of the human brain.

A neuron can process information and exchange information with other neurons through axons. Each link or axon is featured by a weight value. Thus, ANN is able to learn a discrimination function by adapting the values of these weights. An ANN model typically includes an input, a hidden, and an output layer (Figure 4).

Provided that the learning task is to approximate a classification function $f: X \in R^{D} \rightarrow Y \in R^{C}$, where $D$ denotes the number of input attributes and $C$ represents the number of class labels, the ANN structure is shown in the following equation [30]:

$$f(X) = b_{2} + W_{2} \times f_{A}(b_{1} + W_{1} \times X),$$

where $W_{1}$ and $W_{2}$ denote weight matrices of the hidden layer and the output layer, respectively. $b_{1}$ represents a bias vector of the hidden layer; $b_{2}$ is a bias vector of the output layer; $f_{A}$ denotes an activation function (e.g., log-sigmoid). Generally, the weight matrices and the bias vectors of the ANN can be effectively trained using the error backpropagation framework [28, 31].
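The forward computation of such a network, with one hidden layer and a log-sigmoid activation, can be sketched as below. The weights in the example are arbitrary values chosen so the result can be checked by hand; they are not trained parameters, and the function names are assumptions for illustration.

```python
import math

def logsig(v):
    """Log-sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-v))

def ann_forward(x, w1, b1, w2, b2):
    """Output of a one-hidden-layer network: b2 + W2 * f_A(b1 + W1 * x)."""
    hidden = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

# Two inputs, one hidden neuron, two outputs (one per class label).
w1, b1 = [[1.0, 1.0]], [0.0]
w2, b2 = [[2.0], [-2.0]], [0.0, 1.0]
print(ann_forward([0.0, 0.0], w1, b1, w2, b2))  # [1.0, 0.0]
```

With a zero input the hidden neuron outputs 0.5, so the two outputs are 2 × 0.5 + 0 = 1 and −2 × 0.5 + 1 = 0. Training by backpropagation would adjust `w1`, `b1`, `w2`, and `b2`; that step is not shown here.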

2.2.2. Least Squares Support Vector Machine (LS-SVM)

LS-SVM, proposed by Suykens et al. [32], is a least squares version of the standard support vector machine (SVM) algorithm. A notable advantage of LS-SVM is that the model structure of LS-SVM is learned by solving a linear system instead of a nonlinear optimization problem in SVM. During the learning process, LS-SVM first maps the data from the original space to a high-dimensional feature space via a mapping function [33, 34]. The LS-SVM method then constructs an optimal separating hyperplane by adapting the parameters of its normal vector and bias. A typical learning process of a LS-SVM classifier is displayed in Figure 5.

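Given a training data set $\{x_{k}, y_{k}\}_{k=1}^{N}$, where $N$ represents the number of training data points, $x_{k} \in R^{n}$ with $n$ denoting the data dimension, and $y_{k} \in \{-1, +1\}$ denotes the two class labels of interest, the LS-SVM for 2-class pattern recognition can be stated as follows [35, 36]:

$$\min_{w, b, e} J_{p}(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{k=1}^{N} e_{k}^{2} \quad \text{subject to} \quad y_{k}\left(w^{T}\phi(x_{k}) + b\right) = 1 - e_{k}, \quad k = 1, \ldots, N, \tag{9}$$

where $w$ denotes the normal vector to the classification hyperplane and $b$ is the bias; $e_{k}$ represents error variables; $\gamma > 0$ denotes a regularization constant; and $\phi(\cdot)$ is the mapping function to the feature space.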

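Accordingly, the Lagrangian is given by the following equation [32]:

$$L(w, b, e; \alpha) = J_{p}(w, e) - \sum_{k=1}^{N} \alpha_{k} \left\{ y_{k}\left(w^{T}\phi(x_{k}) + b\right) - 1 + e_{k} \right\}, \tag{10}$$

where $\alpha_{k}$ is a Lagrange multiplier; $\phi(\cdot)$ denotes the mapping function that induces the kernel function.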

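After the KKT conditions for optimality are applied, the optimization described in (9) corresponds to solving the following linear system [32]:

$$\begin{bmatrix} 0 & y^{T} \\ y & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_{v} \end{bmatrix}, \tag{11}$$

where $y = [y_{1}; \ldots; y_{N}]$; $1_{v} = [1; \ldots; 1]$; and $\alpha = [\alpha_{1}; \ldots; \alpha_{N}]$. Additionally, the kernel function is applied as follows:

$$\Omega_{kl} = y_{k}\, y_{l}\, \phi(x_{k})^{T} \phi(x_{l}) = y_{k}\, y_{l}\, K(x_{k}, x_{l}), \quad k, l = 1, \ldots, N.$$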

Finally, the LS-SVM classification model can be obtained as follows:

$$y(x) = \operatorname{sign}\left( \sum_{k=1}^{N} \alpha_{k}\, y_{k}\, K(x_{k}, x) + b \right),$$

where $\alpha_{k}$ and $b$ denote the solution to the linear system shown in (11). In addition, the kernel function that is often used in LS-SVM is the radial basis function (RBF) kernel [37].
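To make the "linear system instead of nonlinear optimization" point concrete, the following sketch trains an LS-SVM classifier on six toy points by assembling the bordered kernel matrix and solving it with Gaussian elimination. It is a pure-Python illustration under stated assumptions: the RBF kernel is taken as exp(−‖a − b‖² / (2σ²)), the labels are −1 and +1, and all function names and data are hypothetical rather than the author's implementation.

```python
import math

def rbf(a, b, sigma):
    """RBF kernel; the 2*sigma^2 scaling is an assumption of this sketch."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def lssvm_train(X, y, gamma, sigma):
    """Build [[0, y^T], [y, Omega + I/gamma]] and solve for the bias and multipliers."""
    n = len(X)
    a = [[0.0] + [float(v) for v in y]]
    for k in range(n):
        row = [float(y[k])]
        for l in range(n):
            omega = y[k] * y[l] * rbf(X[k], X[l], sigma)
            row.append(omega + (1.0 / gamma if k == l else 0.0))
        a.append(row)
    sol = solve(a, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(x, X, y, alpha, b, sigma):
    """Sign of the kernel expansion plus bias."""
    s = sum(a * yk * rbf(xk, x, sigma) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1

# Toy data: label -1 near the origin, label +1 near (3.5, 3.5).
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [-1, -1, -1, 1, 1, 1]
b, alpha = lssvm_train(X, y, gamma=10.0, sigma=1.0)
print(lssvm_predict([0.2, 0.2], X, y, alpha, b, 1.0),
      lssvm_predict([3.5, 3.5], X, y, alpha, b, 1.0))
```

For the 160 training samples used in this study the system would have 161 unknowns, which a standard dense solver handles easily.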

2.3. The Collected Data Set of Pavement Images

Because ANN and LS-SVM are supervised learning algorithms, a data set of asphalt pavement images with ground truth conditions of pothole and nonpothole has to be collected for model training and validation. This study has collected images of asphalt pavement using a digital camera during field surveys. The two class labels of nonpothole and pothole are assigned by the inspector. To facilitate the speed of image processing steps, the image size of each sample is determined to be 150 × 150 pixels. Thus, image cropping and resizing are applied if necessary.

In total, 200 image samples are prepared within which each class label has 100 samples. In order to establish and verify the prediction models, the collected data set has been divided into two subsets: the training set (80%) and the testing set (20%). The training set is employed in the model construction phase. The testing set is reserved to examine the predictive capability of the trained models. The collected image data set is illustrated in Figure 6.

3. The Proposed Approach for Asphalt Pavement Pothole Detection

This section describes the overall structure of the proposed approach (Figure 7) which is named as steerable filter and artificial intelligence-based pothole detection model (SF-AI-PDM). The model includes three main modules: (1) image acquisition and feature extraction, (2) data set construction, and (3) AI model training and prediction. SF is an essential part of the first step.

Besides the SF algorithm, the GF is used as a method of image denoising and IP is employed to extract the properties of the salient map computed by the SF. It is noted that the GF is applied to denoise both the original image (GF level 1) and the SF-based salient map (GF level 2). The GF level 1 aims at removing irregular texture on the asphalt pavement background. Meanwhile, the GF level 2 is dedicated to enhance the SF-based salient map by reducing the noisy feature. In the third module, ANN and LS-SVM are employed to generalize a classification boundary used to recognize pothole patterns. It is noted that SF-AI-PDM has been programmed in MATLAB environment with the support of the MATLAB image processing toolbox [38].

At the first step of the first module, the proposed approach employs the GF to denoise the original digital image. Herein, the parameter $\sigma$, which is the standard deviation of the GF used at level 1, should be selected. Based on several trial-and-error runs with the collected image samples, an appropriate value of this parameter has been identified. In the next step, the SF is used to create a response map. For the chosen orientation $\theta$ of the filter, the parameter $\sigma$ of the SF is experimentally set to 2. This setting is found to sufficiently separate the pothole pattern from the pavement background.

To enhance the quality of the SF-based response map, the GF is applied. The value of the standard deviation of the GF ($\sigma$) at the second level is also selected to be 3. Based on the smoothed response map, the IPs of the image are calculated. As stated earlier, the image size is 150 × 150 pixels. Thus, if no simplification measure is employed, the number of IP-based features can be quite large (i.e., 300 features). To reduce the number of features, a method similar to the moving average technique is applied. In detail, the average value of 5 consecutive pixels along the horizontal and vertical axes of an image is computed to establish the contracted IP. This averaging helps to alleviate local fluctuations in the original IPs. Another benefit is that the number of features is reduced from 300 to 60. This reduction is very helpful for the machine learning algorithms since it mitigates the curse of dimensionality [28].
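The feature contraction step can be sketched as follows. Because the feature count drops by exactly a factor of five, the sketch assumes that the averages are taken over non-overlapping blocks of 5 consecutive values; this reading, the function name, and the placeholder projections are assumptions rather than details confirmed by the source.

```python
def contract_projection(projection, window=5):
    """Average consecutive, non-overlapping blocks of `window` values."""
    if len(projection) % window != 0:
        raise ValueError("projection length must be a multiple of the window size")
    return [sum(projection[i:i + window]) / window
            for i in range(0, len(projection), window)]

# A 150 x 150 image gives two projections of 150 values each (300 in total).
hp = [float(v) for v in range(150)]  # placeholder horizontal IP
vp = [1.0] * 150                     # placeholder vertical IP
features = contract_projection(hp) + contract_projection(vp)
print(len(features))  # 60
```

Each projection of 150 values is reduced to 30 averages, so concatenating the two contracted projections yields the 60 features used by the classifiers.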

The whole process of feature extraction is illustrated in Figure 8. It can be observed that the contracted IPs with 60 features (30 per axis) still preserve the distinctive characteristics of the original IPs with 300 features. Another observation is that both GF level 1 and level 2 have substantially diminished the noisy patterns in the original digital image and the SF-based response map. The second level of the GF, in particular, helps to highlight the pattern of the pothole. Consequently, the features extracted from the IPs can have high discriminative power to distinguish the pothole and nonpothole classes.

After the feature set is determined, the data set of image samples is established in the second module. Accordingly, the data set is separated into two sets: training set (80%) and testing set (20%). The training set is used for model establishment; the testing set is employed for model verification. With the separated data sets, the AI methods of ANN and LS-SVM are employed to generalize a decision boundary that classifies the instances of nonpothole and pothole classes. The training phases of ANN and LS-SVM require the setting of several model parameters. In addition, to quantify the classification performance of these two AI approaches, evaluation metrics must be employed. These two issues are going to be addressed in the next section of the study.

4. Experimental Result and Comparison

As aforementioned, the data set consists of 200 image samples with the size of 150 × 150 pixels. The data set is separated into the training set which occupies 80% of the data and the testing set which contains 20% of the data. The first set is used to establish the model; the second set is reserved for investigating the predictive performance of the models. It is noted that a single run of the experiment may not reliably reveal the model predictive performance due to the problem of randomness in data separation. Hence, the performances of the AI approaches (ANN and LS-SVM) are evaluated via a repeated subsampling process which includes 20 runs. In each run, 20% of the data set is randomly taken out to form the testing data set; the rest of the data set is used as the training set.
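The repeated subsampling procedure can be sketched as below. The sketch draws an independent random 80/20 split of the sample indices in each of the 20 runs; the fixed seed, the absence of class stratification, and the function name are assumptions, since the source does not state how the random selection was implemented.

```python
import random

def repeated_subsampling(n_samples, test_fraction=0.2, runs=20, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per run."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(runs):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test = sorted(indices[:n_test])
        train = sorted(indices[n_test:])
        splits.append((train, test))
    return splits

splits = repeated_subsampling(200)
train, test = splits[0]
print(len(splits), len(train), len(test))  # 20 160 40
```

A model would be trained and evaluated once per split, and the mean and standard deviation of each metric over the 20 runs would then be reported.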

It is also worth noting that, before the training and prediction phases, the Z-score transformation has been employed to normalize the whole data set. The data normalization prevents input variables with large magnitudes from dominating those with small values. Moreover, the implementations of the two AI methods necessitate the specification of several tuning parameters. For the purpose of setting those parameters, the original data set is divided into a training set (80%) and a verification set (20%). The tuning parameters that correspond to the best predictive performance on the verification set are selected as the optimal ones.
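A minimal sketch of the Z-score transformation is given below. It standardizes each feature column to zero mean and unit standard deviation; the use of the sample standard deviation (dividing by N − 1), the handling of constant columns, and the function names are assumptions made for illustration.

```python
import statistics

def zscore_fit(rows):
    """Compute the per-feature mean and sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds

def zscore_apply(rows, means, sds):
    """Standardize each feature; constant features are mapped to 0."""
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
            for row in rows]

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, sds = zscore_fit(data)
print(zscore_apply(data, means, sds))
```

After the transformation both columns take the values −1, 0, and 1, even though the second column was originally ten times larger. Keeping the fitting and the application in separate functions lets the statistics be computed on whichever subset of the data is chosen.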

The implementation of ANN requires determining the number of neurons in the hidden layer and the learning rate. As suggested by Heaton [39], the number of neurons is roughly selected to be $2N_{I}/3 + N_{O}$. Herein, $N_{I}$ and $N_{O}$ represent the number of neurons in the input and output layers, respectively. In addition, the number of neurons in the hidden layer should not be larger than 1.5 × NL since overfitting often happens with an ANN that has a surplus number of neurons. The second tuning parameter of ANN is the learning rate, whose value is selected from the set {0.001, 0.01, 0.1, 1}. Other parameters of ANN, including the type of the activation function and the number of training epochs, are selected to be the log-sigmoid function and 3000, respectively. LS-SVM, in turn, necessitates an appropriate determination of the penalty constant and the kernel function parameter. In this study, these two tuning parameters of LS-SVM are determined via a grid search algorithm described in the previous research work of Hoang and Tien Bui [40].

Moreover, to quantify the predictive capability of AI models used for pothole detection, the classification accuracy rate (CAR) is calculated as follows:

$$\mathrm{CAR} = \frac{N_{C}}{N_{A}} \times 100\%,$$

where $N_{C}$ and $N_{A}$ represent the number of image samples being correctly classified and the total number of image samples, respectively.

Besides CAR, the true positive rate (TPR) (the percentage of positive instances correctly classified), the false positive rate (FPR) (the percentage of negative instances misclassified), the false negative rate (FNR) (the percentage of positive instances misclassified), and the true negative rate (TNR) (the percentage of negative instances correctly classified) should also be used [41]. The four indices are computed in the following way:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{FNR} = \frac{FN}{TP + FN}, \qquad \mathrm{TNR} = \frac{TN}{TN + FP},$$

where $TP$, $TN$, $FP$, and $FN$ are the values of true positive, true negative, false positive, and false negative, respectively.
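The evaluation metrics can be computed from the predicted and true labels as sketched below. The sketch assumes that the pothole class is encoded as 1 and the nonpothole class as 0, and that both classes are present in the test set; the function name and the example labels are illustrative only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return CAR (in percent) and the TPR, FPR, FNR, and TNR as fractions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "CAR": 100.0 * (tp + tn) / len(pairs),
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
        "TNR": tn / (fp + tn),
    }

# 4 pothole samples (one missed) and 6 nonpothole samples (one false alarm).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics["CAR"], metrics["TPR"], metrics["FNR"])  # 80.0 0.75 0.25
```

Note that TPR + FNR = 1 and FPR + TNR = 1, so two of the four rates determine the other two.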

Furthermore, the four outcomes of TP, FP, FN, and TN can be graphically summarized in the form of a receiver operating characteristic (ROC) curve. The ROC curve plots the sensitivity (true positive rate) against 1 − specificity (the false positive rate) as the decision threshold varies. Using the ROC curve, an index called the area under the curve (AUC) can be calculated to express the model classification capability. It is noted that, for a classifier that performs no worse than random guessing, AUC ranges from 0.5 to 1. AUC = 1 indicates a perfect classification model, and AUC = 0.5 indicates an incapable classifier with random predictions [42].
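One way to compute the AUC without drawing the curve is sketched below. It uses the pairwise-comparison form of the AUC: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. This equals the area under the ROC curve; the function name, the label encoding, and the example scores are assumptions for illustration.

```python
def auc_from_scores(labels, scores, positive=1):
    """AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for l, s in zip(labels, scores) if l == positive]
    neg = [s for l, s in zip(labels, scores) if l != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_from_scores(labels, scores))  # 0.75
```

In the example, three of the four positive-negative pairs are ranked correctly, which gives an AUC of 0.75. The scores would be the continuous outputs of the classifier, such as the value inside the sign function of the LS-SVM model.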

Experiments with various settings of model parameters indicate that ANN with the number of neurons of 40 and the learning rate of 0.01 delivers the most desirable outcome. In addition, the regularization parameter of 500 and the kernel function parameter of 100 are found to be appropriate for the LS-SVM model. With such parameter settings, the model prediction performances obtained from 20 runs are summarized in Table 1 with the average (mean) and standard deviation (SD) values of each performance metric. It is observable that LS-SVM (CAR = 88.75% and AUC = 0.96) has achieved a better prediction accuracy than ANN (CAR = 85.25% and AUC = 0.92). The TPR and TNR of LS-SVM (0.82 and 0.96) are also superior to those of the ANN (0.81 and 0.90). The prediction capabilities of the two AI models in the form of ROCs are graphically presented in Figure 9. Experimental results indicate that LS-SVM is a more suitable AI approach for the collected image data set of asphalt pavement.

5. Conclusion

This research establishes an automatic approach for asphalt pavement pothole detection. Image processing techniques including GF, SF, and IP are used together to extract features from digital pavement images. Two levels of GF are utilized for image denoising. SF assisted by GF is used to generate a pothole salient map. IP analysis based on this map is performed to numerically represent the features of an image that are relevant to pothole recognition. A simple moving average technique is put forward to reduce the number of the extracted features from 300 to 60. Based on the image features, two AI approaches, ANN and LS-SVM, have been employed to construct classification models that predict the existence of potholes on the pavement surface. Experimental results from a repeated subsampling procedure with 20 runs confirm that ANN and LS-SVM are capable AI methods for pothole detection, because the CARs of both methods are higher than 85% and the AUC values surpass 0.9. Moreover, LS-SVM has been identified as the better approach for the task of pothole detection, with an accuracy of approximately 89%.

With good predictive accuracy, the proposed AI model has strong potential to be employed by transportation agencies and road inspectors to enhance the productivity of pavement inspection tasks, with a specific focus on potholes. A first future direction of the current study is the evaluation of other advanced AI methods and ensemble learning strategies to improve the pothole detection accuracy rate. A second future direction is to utilize advanced image processing methods for estimating the size of potholes. In addition, the integration of the current AI model with other sophisticated image analysis techniques to enhance the feature extraction stage is also worth investigating.

Data Availability

The data can be sent if requested by contacting the author at [email protected].

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this manuscript.