#### Abstract

Periodic surveys of asphalt pavement condition are crucial for road maintenance. This work presents a comparative study of machine learning approaches for automatic pavement crack recognition. Six machine learning approaches are employed: Naïve Bayes Classifier (NBC), Classification Tree (CT), Backpropagation Artificial Neural Network (BPANN), Radial Basis Function Neural Network (RBFNN), Support Vector Machine (SVM), and Least Squares Support Vector Machine (LSSVM). In addition, the Median Filter (MF), Steerable Filter (SF), and Projective Integral (PI) are used to extract useful features from pavement images. The performance comparison shows that including diagonal PIs in the input pattern significantly enhances classification performance by producing more informative features. A simple moving average method is also employed to reduce the size of the feature set, which further improves classification performance. Experimental results show that LSSVM achieves the highest classification accuracy rate. Therefore, this machine learning algorithm, combined with the feature extraction process proposed in this study, is a promising tool to assist transportation agencies in pavement condition surveys.

#### 1. Introduction

Maintaining an acceptable level of road serviceability is crucial for economic growth and passenger safety. Therefore, transportation agencies periodically survey roads and collect pavement condition data. Accurate and timely recognition of pavement distress, together with pavement health monitoring, has increasingly become an integral part of regional road maintenance systems [1]. This is because early detection of pavement distress helps agencies develop cost-effective rehabilitation methods and prevents a reduction in the service life of pavement structures [2].

Cracks are widely considered an important indicator of road surface degradation. Cracks in asphalt pavement can be caused by vehicle overloading, inclement climatic conditions, and aging of the road structure [3]. Detecting cracks on the pavement surface is highly useful for road maintenance because timely and accurate crack recognition can significantly reduce road maintenance costs [4].

In developing countries, roads are usually surveyed manually by human inspectors. This traditional approach is time-consuming and subject to variation in assessment outcomes. Therefore, automatic pavement condition inspection and evaluation have become a common goal of transportation agencies. To construct automatic pavement assessment systems, researchers and practitioners rely extensively on image processing techniques that take 2-dimensional images as input. Various intelligent methods are then employed to enhance and transform the images to highlight the objects of interest, namely pavement cracks [5]. Instead of analyzing the whole image, a set of useful features can be extracted from it to detect the presence of cracks and to distinguish crack types [2, 6].

During the last three decades, various research works have been dedicated to establishing pavement crack detection models. Kaseko et al. [7] compared a neural network classifier with the traditional Bayes and k-nearest neighbor (k-NN) classifiers; this work highlighted the potential of neural networks for pavement crack classification. Lee and Kim [8] proposed a simple feature extraction method for crack type categorization based on the concept of a Crack Type Index. Edge detection methods based on the Canny and Sobel algorithms have been used for crack detection [9]. Wang et al. [10] used wavelet transform approaches to recognize the existence of cracks and observed promising outcomes. Ying and Salari [11] applied a beamlet transform-based technique to extract linear features of crack objects; this technique can extract crack features in the presence of noise.

Gavilán et al. [12] established an adaptive road crack detection system that employed Support Vector Machine (SVM) ensembles. Ouma and Hahn [4] constructed an automatic recognition approach for linear cracks based on wavelet-morphology and circular Radon transform methods. Mokhtari et al. [13] used an artificial neural network (ANN), a decision tree, and k-nearest neighbors to classify pavement images into “Crack” and “No Crack” labels; the ANN proved superior to the decision tree and k-nearest neighbors. A tiled fuzzy Hough transform was applied to detect near-straight crack segments embedded in pavement textures [14]; this study confirms that the fuzzy Hough transform effectively diminishes the contribution of texture and noise pixels. Li et al. [3] constructed an automatic method for both detection and segmentation of pavement cracks using steerable matched filtering and an active contour model. Fujita et al. [15] proposed a linear SVM-based classification model that uses a set of hand-crafted features extracted from digital images.

Cubero-Fernandez et al. [6] extracted image characteristics using various techniques, including logarithmic transformation, a bilateral filter, the Canny algorithm, a morphological filter, and the Projective Integral method; a Classification Tree was then used to categorize the images containing cracks. Zhang et al. [16] and Gopalakrishnan et al. [1] relied on the Convolutional Neural Network (CNN) to classify crack patterns; a CNN integrates feature extraction and image classification in a single model. However, a CNN requires a large number of training samples to construct a robust classifier and therefore incurs a considerable computational cost. An SVM-based method that takes into account the information of neighboring pixels was recently introduced by Ai et al. [17]. Hoang and Nguyen [2] employed the image processing methods of Steerable Filters and Projective Integral for feature extraction and machine learning for classification. Although the method proposed by Hoang and Nguyen [2] performs well, it cannot effectively recognize diagonal crack patterns.

Recent reviews indicate that automatic methods for pavement condition assessment are increasingly adopted in the academic community, owing to the affordable cost of image acquisition equipment and rapid advancements in image processing techniques [18, 19]. Nevertheless, automatic crack detection and classification still face significant challenges, including complex pavement texture, unexpected objects, nonuniform illumination, weak crack signals, the inhomogeneity of cracks, and the diversity of crack patterns [2, 5, 15]. Therefore, more studies should be dedicated to improving the effectiveness of pavement crack classification models, either by enhancing the feature extraction phase or by identifying more suitable machine learning approaches.

Based on these motivations, this study proposes an alternative tool for automatic pavement crack classification that employs image processing and machine learning methods. The current study extends the body of knowledge in the following aspects:

(i) To deal with the complex and noisy texture of the pavement background, image processing techniques including Median Filter, Steerable Filter, and Projective Integral are used in the feature extraction phase.

(ii) Six machine learning algorithms, namely Naïve Bayes Classifier, Classification Tree, Backpropagation ANN, Radial Basis Function ANN, SVM, and Least Squares SVM, are employed to categorize pavement images into five classes: alligator crack, diagonal crack, longitudinal crack, noncrack, and transverse crack. This study compares the performances of these classifiers to identify the most appropriate one.

(iii) In addition, to specifically improve the accuracy of detecting diagonal cracks, a rotated Projective Integral method is employed.

The remainder of the article is organized as follows: Section 2 reviews the research methodology; Section 3 describes the processes of image acquisition and feature extraction; Section 4 presents the experimental results and performance comparison; Section 5 summarizes the study with several remarks.

#### 2. Research Methodology

##### 2.1. Image Processing Techniques

###### 2.1.1. Median Filter (MF)

MF, a nonlinear image filtering technique, is an effective approach to noise removal. It is widely used in image processing because it preserves edges. Basically, MF replaces each pixel in the image with the median of its neighboring pixels [20]. The neighborhood is determined by the window size (e.g., 3 × 3 or 5 × 5 pixels). As demonstrated by Arias-Castro and Donoho [21], for a fixed window size, MF can outperform Gaussian blur at both noise removal and edge preservation. Rababaah [22] experimentally compared the performances of several image denoising techniques and found that MF is the most suitable method for processing asphalt pavement images. **Figure 1** illustrates the effects of MF on denoising a pavement image with different window sizes.
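As an illustration of this filtering step, the following NumPy sketch implements a basic median filter. It is a minimal example under our own assumptions (odd square window, edge-replicated borders), not the implementation used in this study; SciPy also provides a ready-made `scipy.ndimage.median_filter`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Replace each pixel with the median of its size x size neighbourhood.

    Borders are handled by edge replication (an assumption of this sketch).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    pad = size // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    # windows has shape (H, W, size, size): one neighbourhood per output pixel
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


# Usage: an isolated bright "dot" of noise on a flat background is removed.
img = np.zeros((5, 5))
img[2, 2] = 255.0
print(median_filter(img, size=3).max())  # 0.0
```

The same call leaves a clean step edge unchanged, which is the edge-preserving behaviour that motivates using MF before crack enhancement.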

###### 2.1.2. Steerable Filter (SF)

The Steerable Filter (SF) [23] is essentially an image enhancement technique that employs orientation-selective convolution kernels. As demonstrated in the previous works of Cubero-Fernandez et al. [6] and Hoang and Nguyen [2], this image enhancement technique is particularly useful for differentiating the crack patterns and the background texture of asphalt pavement. In addition to crack detection, SF has been successfully employed in other tasks of the computer vision field such as object tracking, text classification, and distress recognition [3, 24–26].

It is noted that, in the SF algorithm, a linear combination of Gaussian second derivatives is used as the basic filter. For an image $I(x, y)$, the 2D Gaussian function at pixel coordinate $(x, y)$ is expressed as follows:

$$G(x, y) = \frac{1}{2\pi r^2} \exp\left(-\frac{x^2 + y^2}{2 r^2}\right),$$

where $r$ is a tunable parameter that controls the Gaussian function variance.

The SF with orientation $\theta$ is expressed as follows:

$$SF^{\theta}(x, y) = \cos^2\theta \, G_{xx}(x, y) + 2\cos\theta \sin\theta \, G_{xy}(x, y) + \sin^2\theta \, G_{yy}(x, y),$$

where $G_{xx}$, $G_{xy}$, and $G_{yy}$ represent the Gaussian second derivatives:

$$G_{xx} = \frac{x^2 - r^2}{r^4}\, G(x, y), \qquad G_{xy} = \frac{x y}{r^4}\, G(x, y), \qquad G_{yy} = \frac{y^2 - r^2}{r^4}\, G(x, y).$$

It is worth noting that when the Gaussian function variance parameter ($r$) is fixed, the final filter response combines the SF responses over a predefined set of orientations $\Theta$, with each $\theta$ drawn from that set. The SF response of an asphalt pavement image containing cracks is illustrated in **Figure 2** for different values of the parameter $r$. The final SF response at pixel location $(x, y)$ in the image $I$ is obtained by combining the convolutions $(SF^{\theta} * I)(x, y)$ over all $\theta \in \Theta$, where “$*$” is the convolution operator.
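The steerable filtering step can be sketched as follows. The sketch samples $G_{xx}$, $G_{xy}$, and $G_{yy}$ on a grid truncated at about three standard deviations, steers them with $\cos^2\theta$, $2\cos\theta\sin\theta$, and $\sin^2\theta$, and convolves the steered kernel with the image. The orientation set `(0, 45, 90, 135)` degrees and the pixel-wise maximum of absolute responses used to combine orientations are hypothetical choices made only for this example; $r$ = 1.5 follows the value reported in Section 4.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_second_derivatives(r, half=None):
    """Sampled G_xx, G_xy, G_yy of a 2D Gaussian with standard deviation r."""
    if half is None:
        half = int(np.ceil(3 * r))  # truncate at about 3 standard deviations
    ax = np.arange(-half, half + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)  # x varies along columns, y along rows
    g = np.exp(-(x**2 + y**2) / (2 * r**2)) / (2 * np.pi * r**2)
    gxx = (x**2 - r**2) / r**4 * g
    gyy = (y**2 - r**2) / r**4 * g
    gxy = x * y / r**4 * g
    return gxx, gxy, gyy


def convolve2d_same(image, kernel):
    """'Same'-size 2D convolution with edge padding (flip kernel, then correlate)."""
    k = kernel[::-1, ::-1]
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    windows = sliding_window_view(padded, k.shape)
    return np.einsum("ijkl,kl->ij", windows, k)


def steerable_response(image, r=1.5, thetas_deg=(0, 45, 90, 135)):
    gxx, gxy, gyy = gaussian_second_derivatives(r)
    responses = []
    for t in np.deg2rad(thetas_deg):
        c, s = np.cos(t), np.sin(t)
        kernel = c**2 * gxx + 2 * c * s * gxy + s**2 * gyy  # steered kernel
        responses.append(np.abs(convolve2d_same(image, kernel)))
    # ASSUMPTION: orientations are combined with a pixel-wise maximum
    return np.max(responses, axis=0)


# Usage: a dark horizontal "crack" on a bright, flat background.
img = np.full((21, 21), 200.0)
img[10, :] = 50.0
resp = steerable_response(img, r=1.5)
```

On this synthetic image the response along the crack row is far larger than in the flat background, which is the contrast the subsequent PI step relies on.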

###### 2.1.3. Projective Integral (PI)

In the image processing field, PI is a simple yet effective method to characterize the shape as well as the texture within an image. This method has been widely utilized in the field of face or facial emotion recognition [27]. PI has recently demonstrated its important role in pavement distress classification and recognition [2, 3, 6].

For a digital image $I(x, y)$, the horizontal and vertical PIs are commonly employed. They are computed as follows:

$$HP(y) = \frac{1}{|x_y|} \sum_{x \in x_y} I(x, y), \qquad VP(x) = \frac{1}{|y_x|} \sum_{y \in y_x} I(x, y),$$

where HP and VP represent the horizontal and vertical PIs, respectively; $x_y$ and $y_x$ denote the set of horizontal pixels at the vertical pixel $y$ and the set of vertical pixels at the horizontal pixel $x$, respectively; and $|\cdot|$ is the number of pixels in a set.
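A minimal sketch of the two PIs, treating HP and VP as row and column means of a 2D response map. The synthetic map with a single bright row is a hypothetical stand-in for the SF response of a transverse crack.

```python
import numpy as np


def projective_integrals(image):
    """Horizontal and vertical projective integrals as row / column means."""
    hp = image.mean(axis=1)  # HP(y): average over the pixels of row y
    vp = image.mean(axis=0)  # VP(x): average over the pixels of column x
    return hp, vp


# A synthetic "transverse crack": one bright row in the filter-response map.
resp = np.zeros((100, 100))
resp[40, :] = 1.0
hp, vp = projective_integrals(resp)
print(int(hp.argmax()), float(hp.max()), float(vp.max()))  # 40 1.0 0.01
```

HP shows a single peak at the crack row while VP stays flat, matching the behaviour described for transverse cracks in the next paragraph.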

As shown in the previous work of Hoang and Nguyen [2], HP and VP are highly useful for recognizing alligator cracks, longitudinal cracks, transverse cracks, and noncrack cases. The reason is that alligator crack and noncrack cases are often characterized by relatively stable PIs along both the horizontal and vertical axes, although the average PI value of an alligator crack case is higher than that of a noncrack case. On the other hand, longitudinal crack and transverse crack cases ideally feature one intensity peak in VP and HP, respectively.

However, these two PIs are not sufficient to identify diagonal cracks. An example of image analysis using PI is provided in **Figure 3**. It clearly shows that the HP and VP of an image with an alligator crack are very similar to those of an image with a diagonal crack. Therefore, to obtain more discriminative PI-based features, this study also computes the PI along the two diagonal directions, denoted as diagonal projections (DPs) 1 and 2. As can be seen in **Figure 3(b)**, an image containing a diagonal crack has relatively stable HP and VP, but one of its two DPs features an intensity peak.

##### 2.2. Machine Learning Approaches Used for Pavement Crack Classification

###### 2.2.1. Naïve Bayes Classifier (NBC)

NBC is a simple method for pattern classification. For a two-class problem, this algorithm assigns the input pattern to one of two classes $C_m$ ($m$ = 1, 2). The class label of the input pattern is computed as follows [28, 29]:

$$\hat{C} = \arg\max_{m} P(C_m \mid X), \qquad P(C_m \mid X) = \frac{P(X \mid C_m)\, P(C_m)}{P(X)},$$

where $P(C_m \mid X)$ denotes the posterior probability of the class $C_m$; $P(X \mid C_m)$ is the likelihood, that is, the class-conditional probability density function of the input pattern $X$; $P(C_m)$ is the prior probability of the class $C_m$; and $P(X)$ denotes the evidence factor.

The evidence factor is basically a scale factor that guarantees that the posterior probabilities sum to one [30]. $P(X)$ is calculated as follows:

$$P(X) = \sum_{m=1}^{2} P(X \mid C_m)\, P(C_m).$$

In addition, the input pattern $X$ is usually a $D$-dimensional vector, and each element of $X$ is denoted as $X_j$, where $j$ = 1, …, $D$. To compute $P(X \mid C_m)$, NBC relies on the assumption that the attributes $X_j$ are independent of each other within each class [29]. Hence, the class-conditional density is expressed as follows:

$$P(X \mid C_m) = \prod_{j=1}^{D} P(X_j \mid C_m),$$

where $P(X_j \mid C_m)$ is the probability distribution of the attribute $X_j$ in a particular class $C_m$. Moreover, each density $P(X_j \mid C_m)$ is assumed to be a Gaussian distribution.
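The NBC decision rule above can be sketched with Gaussian class-conditional densities. This is an illustrative implementation, not the MATLAB toolbox used in the experiments; it works in the log domain for numerical stability, and the small variance floor `var_floor` is our own assumption to avoid division by zero. Because it compares all classes directly, the same code also handles more than two classes.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with a per-class, per-feature Gaussian likelihood."""

    def fit(self, X, y, var_floor=1e-9):
        self.classes_ = np.unique(y)
        self.mu_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_floor
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def _log_joint(self, X):
        # log P(C_m) + sum_j log N(X_j | mu_jm, var_jm)  (independence assumption)
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :]
                     + (X[:, None, :] - self.mu_[None, :, :]) ** 2 / self.var_[None, :, :])
        return self.log_prior_[None, :] + ll.sum(axis=2)

    def predict(self, X):
        # The evidence P(X) is common to all classes, so the argmax can ignore it.
        return self.classes_[np.argmax(self._log_joint(X), axis=1)]


# Usage: two well-separated synthetic clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = GaussianNaiveBayes().fit(X, y)
pred = model.predict(np.array([[0.0, 0.0], [5.0, 5.0]]))
```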

###### 2.2.2. Classification Tree (CT)

CT, proposed by Breiman et al. [31], is a popular method for data classification [32, 33]. This algorithm discovers structural patterns in data and presents them in the form of a tree-like structure. In the training phase, a CT model is established by recursively splitting the collected data set into two child nodes, with all input variables considered as splitting candidates [34]. The most appropriate input variable is selected via an impurity function; the CT algorithm aims at creating data subsets that are as homogeneous as possible with respect to the class labels. The Gini impurity function is often employed to quantify this homogeneity; for a split of a data set $S$ into subsets $S_L$ and $S_R$, it is expressed as follows [35]:

$$I_G(S) = \frac{|S_L|}{|S|}\, Gini(S_L) + \frac{|S_R|}{|S|}\, Gini(S_R),$$

where the Gini impurity index of a data subset $S'$ is calculated as follows [36]:

$$Gini(S') = 1 - \sum_{k=1}^{K} p_k^2.$$

Here $K$ denotes the number of classes and $p_k$ represents the proportion of class $k$ in the subset.

After being constructed, a CT model consists of a root node, a set of internal nodes, and a set of terminal nodes (also called leaves). Each internal node represents a binary decision on one input variable that sends a data instance to one of its two child nodes, and each leaf is assigned a class label. Therefore, the data classification process is performed in a top-down manner from the root node to a terminal node.
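The split criterion can be illustrated with a small sketch that evaluates the weighted Gini impurity of every candidate threshold on every input variable and keeps the best one. It covers only a single split; a full CT repeats this search recursively on each child node. Using midpoints between consecutive distinct values as thresholds is an assumption of this example.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X, y):
    """Exhaustive search for the (feature, threshold) minimising weighted child Gini."""
    n, d = X.shape
    best = (None, None, np.inf)
    for j in range(d):
        values = np.unique(X[:, j])
        # candidate thresholds: midpoints between consecutive distinct values
        for thr in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= thr
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, thr, score)
    return best


# Usage: feature 0 separates the classes, feature 1 is uninformative.
X = np.array([[1.0, 7.0], [2.0, 3.0], [3.0, 7.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])
j, thr, score = best_split(X, y)
```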

###### 2.2.3. Backpropagation Artificial Neural Network (BPANN)

BPANN is a machine learning based classifier inspired by biological neural networks. This algorithm simulates the knowledge acquisition and inference processes of the human brain [37]. Previous studies [38–41] have shown that BPANN is highly effective in dealing with complex nonlinear data modeling problems. A BPANN consists of input, hidden, and output layers. The hidden layers contain a set of interconnected artificial neurons, which play the crucial role of identifying the structure in the data so that the class label of each data instance can be computed in the output layer.

Using BPANN, a data classification problem boils down to establishing a discrimination function $f: \mathbb{R}^{D} \rightarrow \mathbb{R}^{O}$, where $D$ is the number of input features and $O$ denotes the number of class labels.

The BPANN model used for pattern classification is given as follows [42]:

$$y = f_A\left(W_2\, f_A(W_1 x + b_1) + b_2\right),$$

where $W_1$ and $W_2$ denote the weight matrices of the hidden layer and the output layer, respectively; $N$ is the number of artificial neurons in the hidden layer; $b_1 = [b_{11}, b_{12}, \ldots, b_{1N}]$ and $b_2$ denote the bias vectors of the hidden layer and of the output layer, respectively; and $f_A$ represents an activation function (e.g., the log-sigmoid function).

The value of $N$ should be selected appropriately to ensure the predictive capability of BPANN. As suggested by Heaton [43], $N$ can be roughly selected to be $\frac{2}{3}D + O$; moreover, a BPANN model with a much larger $N$ often brings about no predictive improvement. The model parameters of BPANN, namely $W_1$, $W_2$, $b_1$, and $b_2$, are adapted via the backpropagation process [44, 45].
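The forward pass and the backpropagation updates can be sketched for a single hidden layer with log-sigmoid activations. This is an assumption-laden illustration: it uses plain batch gradient descent on a cross-entropy loss rather than the Scaled Conjugate Gradient algorithm employed in Section 4, and the learning rate, epoch count, and initialization are arbitrary. The hidden-layer size follows the $\frac{2}{3}D + O$ rule of thumb.

```python
import numpy as np


def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bpann(X, Y, n_hidden, lr=0.5, epochs=2000, seed=0):
    """y = f_A(W2 f_A(W1 x + b1) + b2), trained by batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    o = Y.shape[1]
    W1 = rng.normal(0, 0.5, (n_hidden, d)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.5, (o, n_hidden)); b2 = np.zeros(o)
    losses, eps = [], 1e-12
    for _ in range(epochs):
        H = logsig(X @ W1.T + b1)   # hidden activations, shape (n, N)
        P = logsig(H @ W2.T + b2)   # network outputs, shape (n, O)
        # cross-entropy summed over outputs, averaged over samples
        losses.append(-np.mean(np.sum(Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps), axis=1)))
        # Backpropagation: for sigmoid outputs + cross-entropy the output delta is P - Y
        dZ2 = (P - Y) / n
        dW2, db2 = dZ2.T @ H, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2) * H * (1 - H)  # chain rule through the log-sigmoid
        dW1, db1 = dZ1.T @ X, dZ1.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict_bpann(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(logsig(logsig(X @ W1.T + b1) @ W2.T + b2), axis=1)


# Usage on a tiny separable problem with one-hot targets.
X = np.array([[-2.0, -2.0], [-1.0, -2.0], [2.0, 2.0], [1.0, 2.0]])
labels = np.array([0, 0, 1, 1])
Y = np.eye(2)[labels]
n_hidden = int(round(2 * X.shape[1] / 3 + Y.shape[1]))  # Heaton-style rule: 3 neurons
params, losses = train_bpann(X, Y, n_hidden)
```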

###### 2.2.4. Radial Basis Function Neural Network (RBFNN)

An RBFNN [46] is a feedforward neural network that employs radial basis functions as activation functions. Its structure is organized into an input layer, a hidden layer, and an output layer. An RBFNN performs pattern classification by measuring the similarity between a query input and a set of prototypes stored in the model. In essence, each of the $N_{NR}$ neurons in the hidden layer represents a prototype used for classification, and each prototype is characterized by a centroid in the learning space.

The data similarity depends on the Euclidean distance between two data points and is computed via the radial basis function [47]:

$$\varphi_j(x) = \exp\left(-\frac{\lVert x - c_j \rVert^2}{2\sigma^2}\right),$$

where $c_j$ is the coordinate of the $j$th centroid, $x$ denotes an input pattern, $\lVert x - c_j \rVert$ is the Euclidean norm of the difference between the input pattern and the centroid, and $\sigma$ is the spread of the basis function.

In the final model, the RBFNN output is calculated as the weighted sum of the hidden-layer outputs [48]:

$$y(x) = \sum_{j=1}^{N_{NR}} w_j\, \varphi_j(x),$$

where $w_j$ is the network weight, which can be adapted via the orthogonal least squares learning algorithm [49] in the network training phase.
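A minimal RBFNN sketch under simplifying assumptions: centroids are a random subset of the training samples, and the output weights $w_j$ are fitted by ordinary linear least squares with `np.linalg.lstsq`, a simpler stand-in for the orthogonal least squares algorithm cited above. The spread `sigma` and the number of centroids are hypothetical values.

```python
import numpy as np


def rbf_design(X, centres, sigma):
    """phi_j(x) = exp(-||x - c_j||^2 / (2 sigma^2)) for every sample/centre pair."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))


def fit_rbfnn(X, Y, n_centres, sigma, seed=0):
    """Centres = randomly chosen training samples; weights by linear least squares."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_centres, replace=False)
    centres = X[idx]
    Phi = rbf_design(X, centres, sigma)        # hidden-layer outputs, shape (n, N_NR)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # one weight column per class
    return centres, W


def predict_rbfnn(model, X, sigma):
    centres, W = model
    return np.argmax(rbf_design(X, centres, sigma) @ W, axis=1)


# Usage: two synthetic clusters, one-hot targets.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
labels = np.repeat([0, 1], 20)
Y = np.eye(2)[labels]
model = fit_rbfnn(X, Y, n_centres=20, sigma=1.0)
pred = predict_rbfnn(model, np.array([[0.0, 0.0], [5.0, 5.0]]), sigma=1.0)
```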

###### 2.2.5. Support Vector Machine (SVM)

SVM, proposed by Vapnik [50], is a binary classification method that originates from statistical learning theory. Generally, the task at hand is to construct a decision boundary that separates the data into two classes. This decision boundary is generalized from a set of training data points $\{x_k, y_k\}_{k=1}^{M}$ with input data $x_k \in \mathbb{R}^{n}$ and class labels $y_k \in \{-1, +1\}$. The SVM first maps the data into a higher-dimensional feature space via a nonlinear mapping function, handled implicitly through the kernel trick [51]. Subsequently, it constructs a hyperplane in that space that plays the role of the decision boundary.

The SVM training phase boils down to solving the following optimization problem:

$$\min_{w,\, b,\, \xi} \; \frac{1}{2} w^{T} w + c \sum_{k=1}^{M} \xi_k \quad \text{subject to} \quad y_k\left(w^{T} \phi(x_k) + b\right) \geq 1 - \xi_k, \quad \xi_k \geq 0, \quad k = 1, \ldots, M, \tag{17}$$

where $w$ is a normal vector to the hyperplane and $b \in \mathbb{R}$ denotes the model bias; $\xi_k$ is a slack variable; $c$ represents a penalty constant; and $\phi(x)$ is a nonlinear mapping from the input space to the high-dimensional feature space.

Based on the kernel trick, it is not necessary to find an explicit expression of $\phi(x)$. It is only required to compute the dot product $\phi(x_k)^{T} \phi(x_l)$ in the feature space, which is obtained directly from the input vectors through a kernel function:

$$K(x_k, x_l) = \phi(x_k)^{T} \phi(x_l).$$

The most widely employed kernel function is the radial basis function (RBF) kernel [52, 53], $K(x_k, x_l) = \exp\left(-\lVert x_k - x_l \rVert^2 / (2\sigma^2)\right)$. To find the optimal solution, the optimization problem in **(17)** is converted into its dual form, which is essentially a quadratic programming problem [51]. Accordingly, the SVM model employed for classification is compactly expressed as follows:

$$y(x) = \operatorname{sign}\left(\sum_{k=1}^{SV} \alpha_k y_k K(x_k, x) + b\right),$$

where $\alpha_k$ denotes the solution of the dual form of the optimization problem given in **(17)**, and $SV$ is the number of support vectors, that is, the number of positive $\alpha_k$.
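The compact SVM decision function can be evaluated as follows once the dual solution is known. The support vectors, multipliers, and bias below are hand-picked for illustration (they satisfy $\sum_k \alpha_k y_k = 0$) rather than obtained by solving the quadratic program, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))


def svm_decision(x, support_vectors, sv_labels, alphas, b, sigma):
    """sign( sum_k alpha_k y_k K(x_k, x) + b ), summed over support vectors only."""
    K = rbf_kernel(x, support_vectors, sigma)  # shape (n_query, n_sv)
    return np.sign(K @ (alphas * sv_labels) + b)


# Usage with two illustrative (not trained) support vectors in 1D.
sv = np.array([[1.0], [-1.0]])
sv_y = np.array([1.0, -1.0])
alphas = np.array([0.5, 0.5])
out = svm_decision(np.array([[2.0], [-2.0]]), sv, sv_y, alphas, b=0.0, sigma=1.0)
```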

###### 2.2.6. Least Squares Support Vector Machine (LSSVM)

First described by Suykens and Vandewalle [54], LSSVM is a powerful approach for data classification. This machine learning method can be considered a least squares version of the standard SVM proposed by Vapnik [50]. A significant advantage of LSSVM is that its training phase is accomplished by solving a system of linear equations instead of the quadratic programming problem required by SVM. This considerably enhances the computational efficiency of LSSVM, and superior performance of this algorithm has been reported in various applications [55, 56].

The LSSVM learning process can be described as follows:

$$\min_{w,\, b,\, e} \; J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{k=1}^{M} e_k^2 \quad \text{subject to} \quad y_k\left(w^{T} \phi(x_k) + b\right) = 1 - e_k, \quad k = 1, \ldots, M,$$

where $w$ is the normal vector to the classification hyperplane and $b$ denotes the bias; $e_k$ is an error variable; and $\gamma$ represents a regularization constant.

To solve the above constrained optimization problem, the Lagrangian is formed as follows:

$$L(w, b, e; \alpha) = J(w, e) - \sum_{k=1}^{M} \alpha_k \left\{ y_k\left(w^{T} \phi(x_k) + b\right) - 1 + e_k \right\},$$

where $\alpha_k$ denotes a Lagrange multiplier and $\phi(x)$ is a nonlinear mapping function.

Using the KKT conditions for optimality, the previous optimization problem is equivalent to solving a linear system [57]. Accordingly, the LSSVM model for binary classification is obtained as follows:

$$y(x) = \operatorname{sign}\left(\sum_{k=1}^{M} \alpha_k y_k K(x, x_k) + b\right),$$

where $\alpha_k$ and $b$ are the solution of the aforementioned constrained optimization problem, and $K(x, x_k)$ is the kernel function, which is applied in the same manner as in the standard SVM.
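In the standard LS-SVM classifier formulation of Suykens and Vandewalle, the KKT conditions reduce to the linear system $\begin{bmatrix} 0 & y^{T} \\ y & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_M \end{bmatrix}$ with $\Omega_{kl} = y_k y_l K(x_k, x_l)$. The sketch below solves this system with `np.linalg.solve` for an RBF kernel. It is an illustration, not the LSSVM toolbox [62] used in the experiments, and the values of `gamma` and `sigma` are arbitrary.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, Lagrange multipliers alpha


def lssvm_predict(X_train, y_train, b, alpha, X, sigma):
    return np.sign(rbf_kernel(X, X_train, sigma) @ (alpha * y_train) + b)


# Usage: two separable synthetic clusters with labels -1 / +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (15, 2)), rng.normal(2, 0.5, (15, 2))])
y = np.repeat([-1.0, 1.0], 15)
b, alpha = lssvm_fit(X, y, gamma=10.0, sigma=1.0)
pred = lssvm_predict(X, y, b, alpha, X, sigma=1.0)
```

Unlike the SVM, almost every $\alpha_k$ is nonzero here, which is the usual trade-off of LSSVM: a single linear solve instead of a quadratic program, at the cost of a non-sparse model.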

#### 3. Acquisition of Pavement Images and the Feature Extraction Process

##### 3.1. Pavement Image Acquisition

Because all six machine learning methods (NBC, CT, BPANN, RBFNN, SVM, and LSSVM) are supervised algorithms, a data set of asphalt pavement images with corresponding ground truth labels must be prepared to construct and validate the machine learning based crack classification models. To establish the data set, images of asphalt pavement were collected during field surveys in Da Nang city (Vietnam). Image samples were captured using a digital camera at a distance of about 1.2 m above the road surface.

To accelerate the computational process, the images are resized to 100 × 100 pixels. There are five classes of pavement condition, namely, alligator crack (AC), diagonal crack (DC), longitudinal crack (LC), noncrack (NC), and transverse crack (TC). Each class contains 300 image samples; therefore, the collected data set contains 1500 data instances in total. In this study, medium and large cracks are the objects of interest. The image data set is illustrated in **Figure 4**. For model construction and verification, the data set is divided into two subsets: a training set (80%) and a testing set (20%). The training set is employed to construct the six machine learning models, and the testing set serves as novel input patterns to evaluate the predictive performance of the crack classification models.

##### 3.2. Image Feature Extraction

This step aims at creating the set of features used by the machine learning approaches for pavement crack classification. The acquired input image passes through a series of processing steps that enhance its representation; the whole feature extraction process is presented in **Figure 5**. The digital image is first processed by MF to remove unwanted dot noise and partially diminish the background texture of the asphalt pavement. The smoothed image is then enhanced by the SF algorithm to highlight the crack patterns. The SF response map is employed to construct four PIs, namely HP, VP, and two DPs (DP1 and DP2). DP1 and DP2 are specifically used for diagonal crack recognition. To compute these two DPs, the SF response map is rotated by +45° and -45° to create two rotated SF maps (demonstrated in **Figures 6** and **7**). DP1 and DP2 are then obtained as the HPs of the two rotated SF maps. As shown in **Figure 6**, if the angle between the crack line and the horizontal axis is +45°, the intensity of DP2 has one peak value. Conversely, DP1 features one intensity peak if the angle between the crack line and the horizontal axis is -45° (demonstrated in **Figure 7**).
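The rotated-PI step can be sketched as follows. The rotation here is a simple nearest-neighbour inverse mapping about the image centre that keeps the original frame size, and each row is averaged only over pixels that originate inside the image; both are assumptions of this example rather than details given in the text. Which of the two outputs corresponds to DP1 or DP2 depends on the rotation and axis conventions, so the names `dp_plus45` and `dp_minus45` are hypothetical.

```python
import numpy as np


def rotate_nearest(image, angle_deg):
    """Rotate about the image centre (same output size, nearest neighbour).

    Returns the rotated map and a mask of pixels that came from inside the image.
    """
    h, w = image.shape
    t = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    xr, yr = xx - cx, yy - cy
    # inverse mapping: locate the source pixel of every output pixel
    sx = np.rint(np.cos(t) * xr + np.sin(t) * yr + cx).astype(int)
    sy = np.rint(-np.sin(t) * xr + np.cos(t) * yr + cy).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros((h, w), dtype=float)
    out[valid] = image[sy[valid], sx[valid]]
    return out, valid


def diagonal_projections(sf_map):
    """HP of the +45 and -45 degree rotated maps, averaging valid pixels only."""
    dps = []
    for angle in (45.0, -45.0):
        rot, valid = rotate_nearest(sf_map, angle)
        counts = np.maximum(valid.sum(axis=1), 1)  # avoid division by zero
        dps.append(rot.sum(axis=1) / counts)
    return dps[0], dps[1]


# Usage: a synthetic 3-pixel-wide diagonal "crack" in a 100 x 100 response map.
sf_map = np.zeros((100, 100))
i, j = np.indices(sf_map.shape)
sf_map[np.abs(i - j) <= 1] = 1.0
dp_plus45, dp_minus45 = diagonal_projections(sf_map)
```

For this crack orientation, one rotation turns the crack horizontal and its HP shows a single sharp peak near the centre row, while the other rotation turns it vertical and its HP stays low and flat, which is the asymmetry that makes the DPs discriminative.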

Since the image size is 100 × 100 pixels, each PI contains 100 values, and the four PIs together yield 400 features. This number of features is relatively large and can cause difficulty for the six machine learning algorithms due to the curse of dimensionality [58]. Therefore, it can be beneficial to reduce the size of the feature set. To contract the features obtained from the PIs, a simple moving average technique is applied: the average of each group of *W* consecutive values along a PI is computed, which produces PIs with fewer data points (see **Figure 8**). For instance, if *W* = 10, the total number of features in the contracted PIs is reduced from 400 to 40. The contracted PIs still preserve the important characteristics of the original PIs. Moreover, the moving average technique helps to diminish local fluctuations in the original PIs. The reduced set of PI-based features then serves as the input pattern to characterize the four crack types (AC, DC, LC, and TC) as well as the noncrack state (NC).
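The contraction of each PI can be written as a reshape followed by a mean, assuming the PI length (100 here) is divisible by *W*, which holds for all values of *W* examined in Section 4. The function names are hypothetical.

```python
import numpy as np


def contract_pi(pi, W):
    """Average each block of W consecutive values (non-overlapping windows)."""
    pi = np.asarray(pi, dtype=float)
    if len(pi) % W != 0:
        raise ValueError("PI length must be divisible by W")
    return pi.reshape(-1, W).mean(axis=1)


def pi_feature_vector(hp, vp, dp1, dp2, W=10):
    """Concatenate the four contracted PIs into one input pattern."""
    return np.concatenate([contract_pi(p, W) for p in (hp, vp, dp1, dp2)])


# Usage: four 100-point PIs with W = 10 give 40 features.
rng = np.random.default_rng(0)
pis = [rng.random(100) for _ in range(4)]
features = pi_feature_vector(*pis, W=10)
print(contract_pi(np.arange(10), 5))  # [2. 7.]
```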

#### 4. Experimental Result and Classification Performance Comparison

As stated previously, the data set of 1500 image samples is used to construct and verify the six machine learning models. The data set is divided into a training set (80%) and a testing set (20%). The first set is employed in the model construction phase; the second set is used to demonstrate the model generalization capability. Since a single run may not reflect the true performance of each machine learning approach due to the randomness in data selection, this study repeats the training and testing processes 30 times. The model performance is then evaluated by averaging the outcomes of these 30 runs.
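The repeated hold-out protocol can be sketched as a loop over random 80/20 partitions. Whether the original partitions were stratified by class is not stated, so a plain random permutation is assumed; `fit_predict` is a hypothetical callback, and the toy nearest-centroid classifier exists only to exercise the loop. Keeping the 30 per-run accuracies also makes them available for a paired test such as the WSRT used later in this section.

```python
import numpy as np


def repeated_holdout(X, y, fit_predict, n_runs=30, test_frac=0.2, seed=0):
    """Average test accuracy over n_runs random train/test partitions.

    fit_predict(X_train, y_train, X_test) must return predicted labels.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    accs = []
    for _ in range(n_runs):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        y_pred = fit_predict(X[train], y[train], X[test])
        accs.append(np.mean(y_pred == y[test]))
    return float(np.mean(accs)), np.array(accs)


def nearest_centroid(X_train, y_train, X_test):
    # Hypothetical toy classifier used only to exercise the evaluation loop.
    classes = np.unique(y_train)
    cents = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = ((X_test[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


# Usage: five well-separated synthetic classes, 30 samples each.
rng = np.random.default_rng(0)
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [20, 20]], dtype=float)
X = np.vstack([c + rng.normal(0, 1, (30, 2)) for c in centres])
y = np.repeat(np.arange(5), 30)
mean_acc, accs = repeated_holdout(X, y, nearest_centroid)
```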

In the feature extraction phase, based on several trial-and-error runs, the most suitable window size of MF, the Gaussian function variance parameter (*r*) used in SF, and the window size (*W*) of PI-based feature dimension reduction are chosen to be , 1.5, and 10, respectively.

Six classification models (NBC, CT, BPANN, RBFNN, SVM, and LSSVM) are employed in the experiment. Whereas BPANN and RBFNN can be directly used for multiclass classification, the two-class versions of NBC, CT, SVM, and LSSVM are extended with the one-versus-one (OvO) strategy [59] to deal with the multiclass nature of the pavement crack classification problem at hand. This strategy is selected due to its good performance; the OvO strategy also helps to avoid the problems of imbalanced data sets [58, 60].
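The OvO scheme can be sketched generically: one binary model is trained for each of the $K(K-1)/2$ class pairs (10 pairs for the five classes here), and a query is assigned to the class with the most pairwise votes. The binary learner below is a hypothetical nearest-centroid rule standing in for NBC, CT, SVM, or LSSVM, and resolving ties toward the first class in sorted order is our own assumption.

```python
import numpy as np
from itertools import combinations


def ovo_fit(X, y, fit_binary):
    """Train one binary model per pair of classes (K(K-1)/2 models)."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        # binary labels: +1 for class a, -1 for class b
        models[(a, b)] = fit_binary(X[mask], np.where(y[mask] == a, 1, -1))
    return models


def ovo_predict(models, X, predict_binary, classes):
    """Majority vote over all pairwise models (ties go to the earliest class)."""
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for (a, b), m in models.items():
        s = predict_binary(m, X)  # array of +1 / -1
        votes[s > 0, index[a]] += 1
        votes[s <= 0, index[b]] += 1
    return np.asarray(classes)[votes.argmax(axis=1)]


# Hypothetical binary learner: nearest of the two class centroids.
def fit_nc(X, s):
    return X[s == 1].mean(axis=0), X[s == -1].mean(axis=0)


def predict_nc(m, X):
    pos, neg = m
    return np.where(((X - pos) ** 2).sum(axis=1) < ((X - neg) ** 2).sum(axis=1), 1, -1)


# Usage with the five pavement class labels on synthetic 2D data.
rng = np.random.default_rng(0)
names = np.array(["AC", "DC", "LC", "NC", "TC"])
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]], dtype=float)
X = np.vstack([c + rng.normal(0, 0.5, (10, 2)) for c in centres])
y = np.repeat(names, 10)
models = ovo_fit(X, y, fit_nc)
pred = ovo_predict(models, centres, predict_nc, np.unique(y))
```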

The NBC, CT, BPANN, RBFNN, and SVM models are implemented in the MATLAB environment via the Statistics and Machine Learning Toolbox [61]. The LSSVM training and prediction phases are performed with the toolbox developed by [62]. To employ the CT, BPANN, RBFNN, SVM, and LSSVM models, it is necessary to select their tuning parameters. In this section, the tuning parameters that lead to the best testing performance of the models are selected. For the CT model, the minimal number of observations per tree leaf is set to 1, as suggested by the MATLAB toolbox [61]. The crucial parameter of BPANN is *N* (the number of neurons in the hidden layer). In the experiment, as suggested by Heaton [43], this parameter is allowed to vary from (2/3)*D* + *O* to 1.5*D* (where *D* is the number of input variables and *O* is the number of output classes). In addition, the Scaled Conjugate Gradient algorithm with a maximum of 3000 training epochs is employed to train the BPANN model. In the case of RBFNN, the number of neurons in the hidden layer (*M*) and a spread parameter (*SP*) must be specified [61]. For the data set in this study, the suitable value of *M* is searched in the ranges of with an interval of 50 and the possible values of SP are within the set of . The regularization parameter and the kernel function parameter of SVM and LSSVM are set via the grid search process described in the previous work of Hoang and Bui [63].

Moreover, to quantify the predictive capability of the six machine learning models, the classification accuracy rate (CAR) for a particular class $i$ is calculated as follows:

$$CAR_i = \frac{RC_i}{TN_i} \times 100\%,$$

where $RC_i$ and $TN_i$ represent the number of data samples in the $i$th class that are correctly recognized and the total number of data samples in the $i$th class, respectively.

The overall classification accuracy rate (CAR) for all five class labels is computed as follows:

$$CAR = \frac{1}{5} \sum_{i=1}^{5} CAR_i.$$
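The two accuracy measures can be computed as follows. Averaging the per-class rates is consistent with the LSSVM results in Table 1: the mean of 95.33%, 94.00%, 87.67%, 91.17%, and 94.94% is 92.62%, the reported overall CAR. The function names are hypothetical.

```python
import numpy as np


def car_per_class(y_true, y_pred, classes):
    """CAR_i = 100 * (correctly recognised samples of class i) / (samples of class i)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.array([100.0 * np.mean(y_pred[y_true == c] == c) for c in classes])


def overall_car(y_true, y_pred, classes):
    """Overall CAR as the mean of the per-class rates."""
    return float(np.mean(car_per_class(y_true, y_pred, classes)))


# Usage on a tiny two-class example.
per_class = car_per_class([0, 0, 1, 1], [0, 1, 1, 1], classes=[0, 1])  # [50., 100.]
total = overall_car([0, 0, 1, 1], [0, 1, 1, 1], classes=[0, 1])        # 75.0
```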

The pavement crack classification results of the six machine learning algorithms obtained from the 30 repeated data-sampling runs are summarized in **Table 1**. In terms of average CAR, LSSVM obtains the highest overall CAR (92.62%), followed by SVM (91.91%), BPANN (84.79%), CT (76.84%), NBC (75.54%), and RBFNN (74.81%). The average CARs of LSSVM for the AC (95.33%), DC (94.00%), NC (91.17%), and TC (94.94%) classes are also the highest among the six models. Only for the LC class is SVM (88.06%) better than LSSVM (87.67%). LSSVM (91.17%), SVM (91.00%), RBFNN (87.94%), NBC (87.94%), and BPANN (85.44%) perform well in predicting the NC class; for this class, only the CAR of CT is lower than 80%. In addition, **Figure 9** shows the box plots of the prediction results of the six machine learning models.

Moreover, to verify the statistical difference between each pair of machine learning models used for multiclass pavement crack classification, the Wilcoxon signed-rank test (WSRT) is used. WSRT is a nonparametric statistical hypothesis test commonly employed to confirm differences in the predictive capability of models. Herein, the significance level of the test is set to 0.05. Using WSRT, the *p* values are calculated from the 30 experimental runs of each model. If the *p* value of the test is smaller than 0.05, the predictive performances of the two machine learning models can be considered statistically different. The *p* values obtained from the hypothesis test are reported in **Table 2**. They show that the performances of LSSVM and SVM are significantly better than those of BPANN, CT, NBC, and RBFNN. The result of BPANN is statistically better than those of CT, NBC, and RBFNN. In addition, the CT model is superior to NBC (*p* = 0.023) and RBFNN (*p* = 0.002). The difference between NBC and RBFNN is not statistically significant (*p* = 0.304) for the data set at hand.
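For readers without a statistics package at hand, the test can be sketched in plain Python. This version uses the large-sample normal approximation with a correction for tied ranks and no continuity correction, which is an assumption on our part; statistical packages (e.g., `scipy.stats.wilcoxon`) may use exact distributions for small samples and can therefore give slightly different *p* values. Pairs with zero difference are discarded.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test on paired samples (normal approximation).

    Returns (W_plus, p_value).
    """
    d = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    # rank |d| with average ranks for ties
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    if var <= 0:
        return w_plus, 1.0
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w_plus, p


# Usage: model A beats model B in every one of 30 paired runs.
acc_a = [k + 1.0 + 0.01 * k for k in range(30)]
acc_b = [float(k) for k in range(30)]
w, p = wilcoxon_signed_rank(acc_a, acc_b)
```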

This study also analyzes the effect of the window size parameter (*W*), used to reduce the size of the PI-based feature set, on the performance of the machine learning model. Since LSSVM has the highest overall CAR, this algorithm is selected for this analysis. The values *W* = 10, 5, 4, 2, and 1 result in feature numbers (FN) of 40, 80, 100, 200, and 400, respectively; *W* = 1 means that the data set contains all 400 original features obtained from the four PIs. The prediction results for the different feature set sizes are reported in **Table 3**. The reduction of FN by means of the moving average method does not deteriorate the prediction accuracy. In fact, as confirmed by the WSRT results reported in **Table 4**, reducing FN improves the model performance. The model using a data set of 40 features has a better CAR (92.62%) than the model employing the original 400 features (90.88%). The difference between the LSSVM models that use 40 and 80 features is not statistically significant (*p* = 0.923). However, the results of both of these models differ statistically from those of the models using 100, 200, and 400 features (*p* < 0.05).

In addition, it is useful to investigate the effect of the DPs on the model performance. Herein, the feature set that contains DPs is compared with a feature set without DPs. The latter feature set, used in the previous works of Cubero-Fernandez et al. [6] and Hoang and Nguyen [2], consists only of the horizontal and vertical PIs. The LSSVM model is also used in this analysis for result comparison. The performances of the LSSVM with and without DPs are shown in **Figure 10**. Notably, for the DC class, the CAR of the model using DPs (94.00%) is markedly higher than that of the model without DPs (87.17%). Moreover, the model using DPs also outperforms the model without DPs in classifying the other class labels (AC, LC, NC, and TC).

#### 5. Conclusion

To improve the accuracy of the pavement crack classification task, this study constructs an intelligent model that combines image processing and machine learning approaches. The image processing techniques of MF, SF, and PI are used to extract features from digital images. A data set of 1500 images with five class labels of AC, DC, NC, LC, and TC has been prepared. The six machine learning algorithms NBC, CT, BPANN, RBFNN, SVM, and LSSVM have been employed to construct pavement classification models from the collected data set. Experimental results point out that LSSVM and SVM are the most capable machine learning algorithms for classifying the current data set of pavement images. The performance of LSSVM is slightly better than that of SVM. The overall classification accuracy rates of LSSVM and SVM are 92.62% and 91.91%, respectively. In addition, experiments with LSSVM show that the inclusion of DPs can clearly improve the prediction performance of the machine learning model. Accordingly, the LSSVM using the feature extraction method proposed in this study can be a promising alternative for assisting transportation agencies in the task of pavement condition survey.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this research work.