Effective road maintenance requires adequate periodic surveys of asphalt pavement condition. The manual process of pavement assessment is labor intensive and time-consuming. This study proposes an alternative for automating the periodic surveys of pavement condition by means of image processing and machine learning. Advanced image processing techniques including fast local Laplacian filter, Sobel filter, steerable filter, and projection integral are employed for image enhancement and analysis to extract useful features from digital images. Based on the features produced by these image processing techniques, adaptive boosting classification tree is used to perform pavement crack recognition tasks. A dataset of image samples consisting of five classes (alligator crack, diagonal crack, longitudinal crack, noncrack, and transverse crack) has been collected to construct and verify the performance of the adaptive boosting classification tree. The experimental results show that the proposed approach has achieved a high crack classification accuracy which is roughly 90%. Therefore, the newly developed model is a promising alternative to help transportation agencies in pavement condition evaluation.

1. Introduction

To ensure the safety and the serviceability of the road network, periodic survey and assessment of the pavement condition is a crucial task performed by transportation agencies around the world [1]. Based on such periodic surveys, various pavement distresses can be identified and documented. This information serves as an important input for determining pavement rehabilitation methods and allocating resources among competing pavement maintenance projects.

According to recent statistics from the Central Intelligence Agency, the total length of road networks in the world amounts to 64,285,009 km; roads of such length demand an enormous cost for maintenance and upgrading [2]. In Vietnam, according to the 2010 report of the General Statistics Office of Vietnam, the total length of asphalted roads has reached 93,535 km [3]. Due to the large number of existing road sections and the rapid extension of road networks each year, the management and maintenance of asphalt pavements have become challenging tasks.

The aging and deterioration of pavements are mainly caused by surface fatigue and shear development in the subgrade, subbase, base, or surface layers [4]. The most easily observable form of pavement deterioration is cracking. Cracks are widely considered to be the most important indicator of pavement condition because this type of distress directly affects pavement serviceability and structural integrity [5, 6]. Therefore, timely detection of pavement cracks is necessary to evaluate the pavement surface condition and to develop appropriate mitigation measures in order to restore the acceptable quality of roads. It is noted that besides cracks, there are other forms of pavement deterioration including potholes (small or medium bowl-shaped depressions), depressions (depressed areas in the pavement surface), ruts (channelized depressed areas), upheaval areas, and raveling (disintegration of aggregate particles in the pavement surface). However, the detection of these forms of pavement damage is not within the scope of the current study.

In developing countries like Vietnam, the common approach to surveying road surface condition is visual inspection performed by human inspectors. This method is only effective for surveying short lengths of road. Moreover, the manual process of road inspection is notorious for its low productivity and for variations in surveying outcomes due to human subjectivity [7]. Therefore, a robust method for automatic recognition of pavement cracks can help expedite the pavement assessment process, enhance the evaluation accuracy, and guarantee consistency in the assessment results. Such a method is currently a pressing need of transportation agencies in many countries.

Since images of pavements can provide a direct representation of pavement defects related to cracks, two-dimensional digital images have been the subject of interest for many researchers and practitioners. This is because pavement cracks can be identified via the image pixel intensities and the shape of the crack objects. However, recognizing cracks from digital images is by no means an easy task. The difficulty stems from the noisy and complex background texture of the asphalt pavement, the heterogeneous pixel intensity, and the inconsistency of the illumination condition [7].

To overcome the aforementioned challenges, a large proportion of the research works have been dedicated to developing automatic pavement crack detection models using image filtering methods. In image processing and computer vision, filtering is a popular approach specifically used to alter the presentation of images and enhance certain features of images. Mahler et al. [8], Kirschke and Velinsky [9], and Cheng et al. [10] proposed algorithms based on intensity thresholding for crack detection; these models rely on the assumption that cracks have a lower intensity value than the pavement background. Lee and Kim [11] proposed a simplified method for crack category realization based on the concept of crack-type index; this method relies on the image thresholding technique and information obtained from neighboring pixels to determine the state of crack and noncrack.

An improved Canny edge detection algorithm and an edge preservation filtering algorithm for pavement edge recognition were proposed by Zhao et al. [12]. Zou et al. [13] put forward a function to measure the difference of image intensity and used this function to enhance the image thresholding outcome. Zhang et al. [14] employed a set of predesigned filters to extract cracks from the background. Salman et al. [15] and Eduardo et al. [16] proposed automatic crack detection models based on the Gabor filter. Li et al. [17] employed the two techniques of steerable matched filtering and active contour model for the task of crack detection and segmentation.

It is noted that besides image filtering methods, there are many other potential algorithms used for crack detection investigated by various scholars; these algorithms include wavelet transform [18], beamlet transform [19], wavelet-morphology-based detection [20], weighted neighborhood segmentation [21], deep learning [2, 22–24], fuzzy Hough transform [25], probabilistic generative model [7], and optimized minimal path selection [5]. However, these aforementioned algorithms are not within the focus of the current study.

In addition, recent literature reviews show an increasing trend of combining image filtering and machine learning to develop an intelligent model capable of not only detecting cracks but also categorizing the types of cracks. Rababaah [26] carried out a comparative work which investigated the performance of multilayer perceptron neural network, genetic algorithms, and self-organizing maps in pavement crack classification. Mokhtari et al. [27] recently employed neural network models to tackle the problem of interest; this study concluded that neural network models are more capable than other learning strategies of decision tree and k-nearest neighbor. Banharnsakun [28] combined the advantage of metaheuristic and neural network for pavement surface distress detection and classification; the metaheuristic of artificial bee colony was used in the phase of image segmentation, and the subsequent classification task was performed by neural network.

Recently, an efficient approach which can perform both detection and classification of pavement cracks was proposed by Cubero-Fernandez et al. [29]; this study incorporated various image processing techniques of logarithmic transformation, bilateral filter, Canny algorithm, and a morphological filter in the feature extraction phase; a classification tree is utilized to construct the crack categorization model using the extracted features. Fujita et al. [30] and Wang et al. [31] proposed crack classification models that employed the support vector machine. Hoang and Nguyen [32] relied on the steerable filter to extract useful features from pavement images and employed machine learning algorithms including support vector machine, neural network, and random forest to classify the images into categories of longitudinal, transverse, and alligator cracks, as well as the status of the intact pavement.

Based on recent review works [33, 34], there is an increasing trend of applying image filtering and machine learning approaches in pavement crack classification. However, due to the aforementioned challenges of crack detection from noisy and complex background texture of asphalt pavements, other advanced image processing techniques should be investigated to improve the accuracy of the automatic crack recognition process. This study constructs and compares the performances of four feature extraction methods that rely on image filtering techniques of fast local Laplacian filter (FLLF), Sobel filter (SBF) for edge detection, and steerable filter (STF) as well as projection integral (PI). The fast local Laplacian filter (FLLF) is applied as a preprocessing step to better smooth the image and highlight edges in the image. SBF and STF are employed to create crack prominent maps. PI finally utilizes these prominent maps to produce a vector of features. The adaptive boosting classification tree (Adaboost CTree) is selected to use the feature vectors extracted from the image for categorizing the pavement crack status. The aforementioned hybrid filtering approaches (FLLF-based SBF and FLLF-based STF) have demonstrated positive effects on the classification performance of Adaboost CTree.

The rest of the study is organized as follows: Section 2 reviews the research methodology, followed by the section that describes the collected dataset of asphalt pavement images (Section 3). Section 4 describes the proposed approach of automatic pavement crack detection, followed by the experimental results (Section 5). Section 6 summarizes the current study with several remarks.

2. Research Methodology

2.1. Image Filtering Approaches
2.1.1. Fast Local Laplacian Filter (FLLF)

FLLF is an edge-preserving image filtering technique [35]. This image processing technique is an improved version of the standard local Laplacian filter (LLF) which was developed by Paris et al. [36]. LLF is an algorithm based on the Laplacian pyramid, which is widely employed in the tasks of decomposing images into multiple scales and image analysis. In the image processing field, pyramid representation is a form of multiscale signal representation in which a digital image is processed by repeated smoothing and subsampling. In FLLF, the output image is obtained by collapsing the output pyramid. To implement FLLF, one first needs to specify a remapping function r and an intensity threshold Sr. Accordingly, the process of FLLF can be divided into three major steps [37]:
(i) First, to process the input image I, FLLF uses a point-wise nonlinearity function r(.) which depends on the Gaussian pyramid coefficient $g = G_l[I](x, y)$, where l denotes the level of the Gaussian pyramid and (x, y) represents the position of the pixel. For various values of g, this approach obtains a large number of intermediate images.
(ii) Second, FLLF integrates all of these intermediate images and computes each output coefficient $L_l[O](x, y)$ of the Laplacian pyramid of the transformed image.
(iii) Third, the method collapses the output pyramid L[O] to obtain the output image O.

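Paris et al. [36] proposed the remapping functions in the following formula:

$$
r(i) =
\begin{cases}
g + \operatorname{sign}(i - g)\, S_r \left( \dfrac{|i - g|}{S_r} \right)^{\alpha}, & \text{if } |i - g| \le S_r \\[2ex]
g + \operatorname{sign}(i - g) \left( \beta \left( |i - g| - S_r \right) + S_r \right), & \text{if } |i - g| > S_r
\end{cases}
\tag{1}
$$

where i denotes the input pixel intensity and g denotes the coefficient of the Gaussian pyramid. α determines the amount of detail increase (0 ≤ α < 1) or decrease (α > 1), β governs the dynamic range compression (0 ≤ β < 1) or expansion (β > 1), and Sr represents the intensity threshold to separate details in the image from edges.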
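The role of α, β, and Sr can be made concrete with a small numerical sketch. The code below assumes the piecewise form usually attributed to Paris et al.: a power curve with exponent α for intensities within Sr of the reference value g, and a linear map with slope β beyond it. The function name `remap` and its argument names are illustrative and are not taken from the original implementation.

```python
import numpy as np


def remap(i, g, s_r, alpha, beta):
    """Piecewise remapping r(i) around a reference intensity g (assumed form)."""
    i = np.asarray(i, dtype=float)
    d = i - g                 # signed distance to the reference value
    mag = np.abs(d)
    sign = np.sign(d)
    # Detail branch (|i - g| <= s_r): power curve controlled by alpha.
    detail = g + sign * s_r * (mag / s_r) ** alpha
    # Edge branch (|i - g| > s_r): linear map controlled by beta.
    edge = g + sign * (beta * (mag - s_r) + s_r)
    return np.where(mag <= s_r, detail, edge)
```

With α = 1 and β = 1 the function reduces to the identity, while α < 1 pushes small deviations from g further away from g, which corresponds to the detail increase described in the text.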

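Aubry et al. [35] stated a general form of the function r(.) as follows:

$$
r(i) = i - (i - g)\, f(i - g)
\tag{2}
$$

where f denotes a continuous function. In fact, Equation (1) is a special case of Equation (2) in which $f(i - g) = (i - r(i)) / (i - g)$, with r(i) taken from Equation (1).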

For the purpose of image enhancement and image smoothing, Aubry et al. [35] define the function f in terms of a Gaussian function, where mf is a parameter denoting the amplitude magnification factor.

In essence, mf affects the smoothing of details, and Sr characterizes the amplitude of edges in I [38]. The effects of FLLF on asphalt pavement images with different scenarios of the amplitude magnification factor mf and the intensity threshold Sr are illustrated in Figure 1. It is noted that before being analyzed by FLLF, the images are preprocessed by the median filter to remove the dot noise. In this study, based on several trial and error experiments, the median filter with a window size of 5 × 5 pixels has been selected for noise suppression.
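The median prefiltering step can be sketched as follows. This is a minimal NumPy version, not the MATLAB routine used in the study; the function name `median_filter` and the edge-replication padding are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(image, size=5):
    """Replace each pixel by the median of its size x size neighbourhood."""
    pad = size // 2
    # Border handling is an assumption: edge pixels are replicated outward.
    padded = np.pad(image, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))
```

An isolated bright or dark pixel occupies at most one of the 25 values in a 5 × 5 window, so it never becomes the median; this is why the filter removes dot noise while leaving larger structures such as cracks in place.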

2.1.2. Steerable Filter

The steerable filter (STF), proposed in the previous work of Freeman and Adelson [39], is a popular technique for image processing. This technique relies on orientation-selective convolution kernels to highlight edges in digital images. STF is highly helpful for the task of analyzing patterns existing on the surface of asphalt pavements; this technique has been successfully employed in crack classification [17, 29, 32] as well as other types of pavement distress [40].

To implement the STF technique, a linear combination of the Gaussian second derivatives is employed as the basic filter. A 2D Gaussian at a certain pixel coordinate (x, y) within an image I is given by the following equation:

$$
G(x, y, r) = \frac{1}{2\pi r^{2}} \exp\left( -\frac{x^{2} + y^{2}}{2 r^{2}} \right)
$$

where r is a free parameter governing the width of the Gaussian function (referred to as its variance in this study).

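Applying the STF technique with different values of the angle θ, a set of different filters can be obtained as follows:

$$
F(x, y, r, \theta) = G_{xx} \cos^{2}\theta + 2\, G_{xy} \cos\theta \sin\theta + G_{yy} \sin^{2}\theta
$$

where Gxx, Gxy, and Gyy denote the Gaussian second derivatives. The formulas of these derivatives are presented as follows [32]:

$$
G_{xx} = \frac{x^{2} - r^{2}}{r^{4}}\, G(x, y, r), \qquad
G_{xy} = \frac{x y}{r^{4}}\, G(x, y, r), \qquad
G_{yy} = \frac{y^{2} - r^{2}}{r^{4}}\, G(x, y, r)
$$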

It is noted that the value of θ is often varied from 0° to 360°. Figure 2 provides examples of STF responses of a pavement image with different selections of r. In this figure, the original image has been preprocessed by the median filter with a window size of 5 × 5 pixels. As can be seen from the examples, too small a value of r leads to a very weak signal of crack patterns. On the contrary, if r is too large (e.g., r = 2.0), the background texture of the asphalt pavement becomes more visible, and this may hinder the crack detection and classification process.

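In addition, the resulting STF response of a digital image I is calculated compactly in the following equation:

$$
R(x, y, r, \theta) = F(x, y, r, \theta) * I(x, y)
$$

where F(x, y, r, θ) denotes the steerable filter at angle θ and “∗” denotes the convolution operator.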
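A minimal sketch of the STF computation is given below. It assumes a Gaussian of the form exp(−(x² + y²)/(2r²)), builds the filter as a directional second derivative (cos²θ·Gxx + 2·cosθ·sinθ·Gxy + sin²θ·Gyy), and convolves it with the image. How the responses at different angles are merged into one map is not stated in the text; taking the maximum absolute response is an assumption, as are the kernel half-width of 3r, the edge-replication padding, and all function names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def steerable_kernel(r, theta, half_width=None):
    """Second-derivative-of-Gaussian kernel steered to angle theta (radians)."""
    if half_width is None:
        half_width = int(np.ceil(3 * r))  # assumed truncation of the Gaussian
    ax = np.arange(-half_width, half_width + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * r**2)) / (2 * np.pi * r**2)
    gxx = (x**2 - r**2) / r**4 * g
    gyy = (y**2 - r**2) / r**4 * g
    gxy = (x * y) / r**4 * g
    c, s = np.cos(theta), np.sin(theta)
    return c**2 * gxx + 2 * c * s * gxy + s**2 * gyy


def convolve_same(image, kernel):
    """2D convolution with edge-replicated borders; output has the input shape."""
    k = kernel[::-1, ::-1]  # flip the kernel so the sum below is a convolution
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    windows = sliding_window_view(padded, k.shape)
    return np.einsum("ijkl,kl->ij", windows, k)


def steerable_response(image, r=1.5, n_angles=8):
    """Largest absolute filter response over a set of orientations (assumed rule)."""
    # The second-derivative kernel repeats after 180 degrees, so [0, pi) suffices.
    thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    stack = [convolve_same(image, steerable_kernel(r, t)) for t in thetas]
    return np.max(np.abs(stack), axis=0)
```

On a bright image with a one-pixel-wide dark line, the response is largest along the line and close to zero over the flat background, which is the behaviour a crack-prominent map needs.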

2.1.3. Sobel Filter for Edge Detection

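As described in the previous work of Sobel [41], the Sobel filter (SBF) is a widely used technique for detecting edges in digital images. This technique reveals edges in an image by smoothing the image along one axis before computing the derivative along the perpendicular axis. To implement SBF, the filter h is employed to smooth the image:

$$
h = \begin{bmatrix} 1 & 2 & 1 \end{bmatrix}
$$

The derivative and the smoothing operators are both linear and can be combined into a single kernel. The filter that computes the partial derivative in the x direction is obtained in the following way:

$$
S_x = h^{T} \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
$$

In the same manner, the filter that computes the partial derivative in the y direction is computed as follows:

$$
S_y = \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}^{T} h
= \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}
$$

At each pixel within a digital image, the resulting gradient approximations $D_x = S_x * I$ and $D_y = S_y * I$ are combined to yield the gradient magnitude which is calculated in the following way:

$$
|\nabla I| = \sqrt{D_x^{2} + D_y^{2}}
$$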

It is worth noting that a threshold value Ts must be prespecified to obtain the image with detected edges. If the Sobel gradient values of pixels are smaller than the threshold value Ts, they are replaced by this threshold value [42]. Figure 3 presents the results of edge detection using the Sobel algorithm on a pavement image for which different values of the threshold Ts are tried. As can be seen from the examples, if Ts = 0.01, the resulting image is filled with edges detected from the background texture. On the contrary, if Ts > 0.1, virtually no edges are captured by the algorithm. Ts = 0.05 appears to be a suitable value because the edges of the true crack in the image are revealed.
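The Sobel gradient magnitude and the effect of the threshold can be sketched as follows. The 3 × 3 kernels are the standard Sobel kernels; the binary thresholding rule, the edge-replication padding, and the function names are assumptions. The threshold scale also depends on kernel normalization and image range, so the Ts values quoted in the text should not be assumed to transfer directly to this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Standard Sobel kernels: derivative along x (columns) and along y (rows).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def sobel_magnitude(image):
    """Gradient magnitude from the two Sobel responses."""
    padded = np.pad(image, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    # Correlation is used here; it differs from convolution only in sign,
    # which does not affect the magnitude.
    dx = np.einsum("ijkl,kl->ij", windows, SOBEL_X)
    dy = np.einsum("ijkl,kl->ij", windows, SOBEL_Y)
    return np.hypot(dx, dy)


def sobel_edges(image, ts):
    """Binary edge map: True where the gradient magnitude exceeds ts (assumed rule)."""
    return sobel_magnitude(image) > ts
```

Raising `ts` removes weak responses from the background texture first, which mirrors the trade-off described for Figure 3.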

2.1.4. Projection Integral Technique

The projection integral technique (PIT) is an effective technique used in shape and texture recognition [43]. This image analysis technique has recently been employed successfully in pavement crack classification [29, 32]. Using this method, the image is first converted from a color image to a grayscale image. The average value of gray intensity at each location of the image along an axis is computed to obtain a projection integral (PI). Therefore, a PI is always associated with a certain axis. PIs along the horizontal and vertical axes are often computed and employed in object recognition. Horizontal PI (HPI) and vertical PI (VPI) are calculated in the following way:

$$
\mathrm{HPI}(y) = \frac{1}{|x_y|} \sum_{i \in x_y} I(i, y), \qquad
\mathrm{VPI}(x) = \frac{1}{|y_x|} \sum_{j \in y_x} I(x, j)
$$

where HPI and VPI are the horizontal and vertical PIs, respectively. $x_y$ and $y_x$ represent the set of horizontal pixels at the vertical pixel y and the set of vertical pixels at the horizontal pixel x of an image I(x, y), respectively.
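For a crack map stored as a 2D array indexed as [row y, column x], the two projection integrals reduce to row and column averages. The sketch below assumes this array layout; the function name is illustrative.

```python
import numpy as np


def projection_integrals(crack_map):
    """Return (HPI, VPI) of a 2D map indexed as [row y, column x]."""
    m = np.asarray(crack_map, dtype=float)
    hpi = m.mean(axis=1)  # one value per row y: average over horizontal pixels
    vpi = m.mean(axis=0)  # one value per column x: average over vertical pixels
    return hpi, vpi
```

A vertical line of high values in the map produces a single peak in the VPI and a flat HPI, and a horizontal line does the opposite.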

Besides the two commonly employed HPI and VPI, the diagonal PI (DPI) can also be helpful in the task of pavement crack classification. It is noted that, for each pavement image, there are two DPIs. To compute these two DPIs, the map of the STF response is rotated by angles of +45° and −45° to create two rotated STF maps. The two DPIs (DPI1 and DPI2) are obtained by computing the HPIs of the two rotated STF maps. Illustrations of the PIs of the pavement images are provided in Figures 4 and 5. It is noted that, in Figure 4, the four PIs of an image are produced from the STF response. In Figure 5, the salient crack map created by SBF is employed to compute the four PIs.

2.2. Adaptive Boosting Classification Tree

Classification tree (CTree), developed by Breiman et al. [44], is an effective data mining approach widely employed for data categorization [45, 46]. The CTree algorithm automatically reveals the hidden structural patterns in data and expresses the discovered patterns as tree-like structures [47, 48]. CTree belongs to the group of supervised learning methods. Therefore, a training phase that requires a set of labeled data must be performed to construct the data categorization model. During the training phase of CTree, the training dataset is split into subsets using the predictor variables to create two child nodes in the tree-like structure [44]. It is noted that the most suitable predictor variable for each splitting operation is chosen by computing the value of an impurity function.

The data splitting process that occurs in the training phase has the purpose of putting data into subsets that are as homogeneous as possible for each data category. The Gini impurity function is widely used to quantify data homogeneity; the Gini impurity of a split that divides a data subset T of N samples into two subsets T1 and T2 of N1 and N2 samples is shown in the equation below [47]:

$$
Gini_{split}(T) = \frac{N_1}{N}\, Gini(T_1) + \frac{N_2}{N}\, Gini(T_2)
$$

where the Gini impurity index of a data subset T is computed as follows [49]:

$$
Gini(T) = 1 - \sum_{j=1}^{C} p_j^{2}
$$

where C represents the number of data categories and $p_j$ denotes the proportion of class j present in this set.
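The impurity computation can be illustrated with a short sketch. It assumes the usual CART convention in which the impurity of a binary split is the size-weighted average of the Gini indices of the two child subsets; the function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini impurity of a binary split (assumed CART convention)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A subset containing a single class has an impurity of 0, while a subset in which the five pavement classes are equally represented has an impurity of 1 − 5 × 0.2² = 0.8, the maximum for five classes.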

When the training phase is successfully accomplished, a CTree model is represented by a root node, a set of internal nodes, and a set of terminal nodes. Each nonterminal node in the tree is essentially a binary decision on a predictor variable that routes a data sample to one of its two child nodes. Thus, CTree carries out the data classification process in a top-down manner from the root node to a terminal node.

Moreover, in data mining, adaptive boosting [50], or AdaBoost for short, is a well-known ensemble learning strategy for enhancing the classification accuracy of a classifier through the process of adaptive reweighting and combining a set of individual models [51]. An AdaBoost ensemble of CTrees can be defined as a combination of multiple CTrees in which the final prediction result is obtained by combining the outputs of individual trees. Based on previous works [52–56], ensemble models have demonstrated better performance than individual models in a wide range of applications. The AdaBoost algorithm is demonstrated in Figure 6.
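The reweighting-and-combining idea can be sketched with a small two-class example. The code below is a generic discrete AdaBoost with one-level trees (decision stumps) and labels in {−1, +1}; it is not the MATLAB implementation used in this study, and the function names, the stump search, and the number of rounds are all illustrative assumptions.

```python
import numpy as np


def stump_predict(X, j, t, polarity):
    """One-level tree: +1 on one side of threshold t of feature j, else -1."""
    return np.where(polarity * (X[:, j] - t) > 0, 1, -1)


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = stump_predict(X, j, t, polarity)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Train a weighted ensemble of stumps; y must contain -1 and +1."""
    w = np.full(len(y), 1.0 / len(y))  # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, polarity = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # avoid log(0) and division by 0
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak learner
        pred = stump_predict(X, j, t, polarity)
        w = w * np.exp(-alpha * y * pred)      # raise weights of mistakes
        w /= w.sum()
        ensemble.append((alpha, j, t, polarity))
    return ensemble


def adaboost_predict(X, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

Samples misclassified in one round receive larger weights in the next, so later trees concentrate on the cases that earlier trees handled poorly.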

3. The Dataset of Asphalt Pavement Images

Since Adaboost CTree is a supervised learning approach, a dataset of pavement images with prespecified ground truth categories must be prepared for the model construction and testing phases. To achieve this goal, the current study has collected pavement images in Da Nang city (Vietnam). The images are acquired with a digital camera held at a distance of about 1.2 m above the road surface. To speed up the phases of data processing and data classification, the images are resized to 100 × 100 pixels. There are five classes of pavement conditions: alligator crack (AC), diagonal crack (DC), longitudinal crack (LC), noncrack (NC), and transverse crack (TC). Each group of images has 400 samples; hence, the total number of data samples in the collected dataset is 2000. The image dataset is illustrated in Figure 7. It is noted that each pixel represents an area of approximately 3.6 × 3.6 mm; therefore, the pavement area contained in each image sample is about 360 × 360 mm.

4. Automatic Pavement Crack Recognition Using Fast Local Laplacian-Based Steerable and Sobel Filters Integrated with Adaptive Boosting Classification Tree

This section describes the structure of the proposed automatic model for pavement crack categorization. The proposed model combines advanced image processing techniques and the machine learning method of Adaboost CTree. The image processing techniques consist of FLLF, STF, SBF, and PIs. It is noted that the original pavement images have been preprocessed by the commonly used median filter to remove dot noise from the image background. FLLF is then used to concurrently smooth the image and highlight the edges. After being processed by FLLF, the enhanced image is processed by either STF or SBF to create a salient map of cracks. Based on such a salient map, PIs of the image are computed to serve as input features used by Adaboost CTree to classify the image into the AC, DC, NC, LC, and TC categories. The overall picture of the proposed model is presented in Figure 8.

The model basically includes two modules: feature extraction based on the image processing technique and data classification based on Adaboost CTree. It is noted that the proposed model including the two modules has been constructed in the MATLAB environment with the employment of the Image Processing Toolbox [38] and the Statistics and Machine Learning Toolbox [57].

It is noted that in the feature extraction step, the maps created by the STF and SBF responses are used to compute four PIs, namely, HPI, VPI, and two DPIs (DPI1 and DPI2). The PIs of the pavement images yielded from STF and SBF are illustrated in Figures 9 and 10, respectively. Based on these figures, it can be seen that an image with a longitudinal crack generally produces a distinctive peak in its VPI. On the contrary, an image with a transverse crack yields a distinctive peak in its HPI. Moreover, the average values of PIs of images containing alligator cracks are higher than those of images containing no cracks. Notably, the two DPIs are especially useful in recognizing diagonal cracks. To compute these two DPIs, the maps of the STF and SBF responses are rotated by angles of +45° and −45° to generate two rotated crack maps. The two DPIs are attained by calculating the HPIs of the two rotated crack maps.

As mentioned earlier, with the image size of 100 × 100 pixels, the number of features generated by the four PIs is 400. With this number of features, the predictive capability of the Adaboost CTree model may be hindered by the curse of dimensionality [58]. Thus, it is useful to reduce the feature size. To do so, this study employs a simple moving average approach in which the average of each run of W consecutive values along a PI is calculated to create PIs with fewer data points. This process of feature reduction is illustrated in Figure 11. With W = 10, the total number of features is reduced from 400 to 40. Compared with the original PIs, the smoothed PIs have fewer data points and, most importantly, still present the essential features of the original PIs.
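Because the feature count drops from 400 to 40 with W = 10, the averaging is taken here over non-overlapping runs of W values. A minimal sketch under that reading follows; the function name `reduce_pi` is illustrative.

```python
import numpy as np


def reduce_pi(pi, w=10):
    """Average each non-overlapping run of w consecutive values of a PI."""
    pi = np.asarray(pi, dtype=float)
    n_blocks = len(pi) // w  # any incomplete trailing run is dropped
    return pi[: n_blocks * w].reshape(n_blocks, w).mean(axis=1)
```

Applied to each of the four 100-point PIs and concatenated, this yields the 40-element feature vector used for classification.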

Accordingly, the reduced PIs are employed to generate numerical features as input patterns which are used by Adaboost CTree to recognize the types of cracks (AC, DC, LC, and TC) as well as the condition of intact pavement (NC). It is noted that, in this study, the Adaboost CTree model has been used with the one-versus-one (OvO) strategy [59] to cope with multiclass data classification. The reason for selecting OvO is that this strategy can deliver good prediction performance and can help avoid the imbalanced data classification problem [58, 60].
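Under the OvO strategy, one binary classifier is trained for every pair of classes, and a sample is assigned to the class that wins the most pairwise contests. The sketch below shows only the pairing and voting logic; the majority-vote rule and the tie-breaking behavior (first class to reach the top count) are assumptions, since the text does not describe how the pairwise outputs are combined.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["AC", "DC", "LC", "NC", "TC"]


def ovo_pairs(classes):
    """All unordered class pairs; one binary classifier is trained per pair."""
    return list(combinations(classes, 2))


def ovo_vote(pairwise_winners):
    """Return the class that wins the most pairwise contests."""
    return Counter(pairwise_winners).most_common(1)[0][0]
```

For the five pavement classes this gives 5 × 4 / 2 = 10 binary classifiers, each trained on two classes of equal size, which is how OvO sidesteps class imbalance in the individual subproblems.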

5. Experimental Results

To construct and verify Adaboost CTree, the collected image dataset has been separated into two sets: the training set (70%) and the testing set (30%). The first set is used to establish the learning model, and the second set is employed to inspect the predictive performance of the Adaboost CTree-based crack categorization model. Moreover, because a single run of training and testing may not express the true predictive capability of the newly developed approach due to randomness in data selection, repetitive data subsampling has been carried out 20 times. The Adaboost CTree performance is assessed by averaging the prediction results attained from the 20 runs of the training and testing phases.
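The repeated subsampling protocol can be sketched as follows. The text does not say whether the 70/30 split is stratified by class, so this sketch uses a plain random shuffle; the function name, the seed, and the use of index lists are assumptions.

```python
import random


def repeated_subsampling(n_samples, n_runs=20, train_ratio=0.7, seed=0):
    """Yield (train_indices, test_indices) for each random split."""
    rng = random.Random(seed)
    n_train = round(n_samples * train_ratio)
    for _ in range(n_runs):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # plain shuffle; stratification is not assumed
        yield indices[:n_train], indices[n_train:]
```

With 2000 samples, each run trains on 1400 images and tests on the remaining 600, and the reported accuracy is the average over the 20 runs.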

Moreover, the image processing techniques used in the feature extraction phase require the specification of several tuning parameters. In this study, these parameters are selected via several trial and error experiments with the collected pavement images. The settings of the parameters of the image processing techniques are as follows:
(i) The window size of the median filter is 5 × 5 pixels
(ii) The amplitude magnification factor of FLLF is mf = 3, and the intensity threshold is Sr = 0.15
(iii) The variance of the Gaussian function used in STF is 1.5
(iv) The threshold value Ts of SBF is 0.02
(v) The window size (W) used to smooth the PIs is 10

In addition, to express the predictive capability of the Adaboost CTree-based crack recognition model, the classification accuracy rate (CAR) for a class label i is computed by the following equation:

$$
CAR_i = \frac{R_i}{N_i} \times 100\%
$$

where $R_i$ and $N_i$ denote the number of data samples in the ith class being correctly recognized and the total number of data samples in the ith class, respectively.

The overall classification accuracy rate (CAROverall) for all the five class labels is calculated by the following equation:

$$
CAR_{Overall} = \frac{1}{5} \sum_{i=1}^{5} CAR_i
$$
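Both accuracy measures can be computed from lists of true and predicted labels as sketched below. The overall rate is taken here as the mean of the per-class rates, and the sketch assumes every class occurs at least once in the true labels; the function names are illustrative.

```python
def car_per_class(y_true, y_pred, classes):
    """Percentage of samples of each class that are correctly recognized."""
    rates = {}
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        rates[c] = 100.0 * correct / total  # assumes total > 0
    return rates


def car_overall(rates):
    """Mean of the per-class accuracy rates."""
    return sum(rates.values()) / len(rates)
```

For example, if one of two AC samples and both of two LC samples are recognized correctly, the per-class rates are 50% and 100% and the overall rate is 75%.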

The prediction accuracy of the Adaboost CTree models with the two employed SBF and STF for creating the salient crack maps is reported in Table 1. These two models are denoted as Adaboost CTree-SBF and Adaboost CTree-STF. As can be seen from this table, Adaboost CTree-STF has the CARs of the AC class (93.17%), DC class (89.54%), LC class (89.58%), NC (84.92%), and TC class (91.38%). These outcomes are better than those yielded by the Adaboost CTree-SBF with the CARs of the AC class (90.50%), DC class (80.83%), LC class (83.71%), NC (79.38%), and TC class (88.08%). The overall CAR of Adaboost CTree-STF (89.72%) is also higher than that of Adaboost CTree-SBF (84.50%).
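As a consistency check on the reported figures, the overall CARs can be recomputed as the mean of the five class-wise CARs quoted above. The dictionary and function names below are illustrative; the numbers are those stated in the text.

```python
# Class-wise CARs (%) as reported in the text for the two models.
CAR_STF = {"AC": 93.17, "DC": 89.54, "LC": 89.58, "NC": 84.92, "TC": 91.38}
CAR_SBF = {"AC": 90.50, "DC": 80.83, "LC": 83.71, "NC": 79.38, "TC": 88.08}


def mean_car(cars):
    """Average of the per-class accuracy rates."""
    return sum(cars.values()) / len(cars)


if __name__ == "__main__":
    print(round(mean_car(CAR_STF), 2), round(mean_car(CAR_SBF), 2))
```

The class-wise values sum to 448.59 and 422.50, so the means agree with the reported overall CARs of 89.72% and 84.50% after rounding to two decimals.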

Furthermore, Figure 12 demonstrates the box plots of prediction results of the Adaboost CTree-SBF and Adaboost CTree-STF classification approaches. To further validate the statistical difference between Adaboost CTree-SBF and Adaboost CTree-STF, the Wilcoxon signed-rank test (WSRT) is employed in this section. WSRT is a popular nonparametric statistical hypothesis test which is often employed for result comparison [61]. In this study, the significance level of WSRT is chosen to be 0.05. If the p value computed from the test is smaller than 0.05, it can be confirmed that the pavement crack classification results of Adaboost CTree-SBF and Adaboost CTree-STF are statistically different. With a p value of 0.00008, it can be stated with confidence that Adaboost CTree-STF is significantly better than Adaboost CTree-SBF.

6. Conclusion

This study proposes an integration of image processing and machine learning approaches for automatic pavement crack recognition. Advanced image processing techniques including FLLF, SBF, STF, and PI are employed to extract numerical features from digital images. The Adaboost CTree utilizes the extracted features to perform crack recognition tasks. A dataset of 2000 image samples with five classes of asphalt pavement conditions (AC, DC, LC, NC, and TC) has been collected to train and validate the proposed integrated approach. An experiment using a random subsampling process and WSRT points out that Adaboost CTree-STF is significantly better than Adaboost CTree-SBF.

Since the current practice of pavement survey in Vietnam still heavily relies on the survey of human inspectors and manual data analysis processes, the new approach based on the Adaboost CTree classification model integrated with STF can provide a helpful tool to accelerate the periodic surveys of roads by boosting the productivity of the data acquisition and analysis processes. Thus, the newly constructed model can be highly useful for the local transportation agencies and authorities to manage their road sections effectively.

Based on the collected image samples, the smallest crack opening that the Adaboost CTree-STF model can detect is about 8 mm. Since the ability to detect small cracks can be essential for early warning of pavement deterioration, image samples with thinner crack openings should be collected in future work to enhance the applicability of the current model. Moreover, since the current stage of the study is a preliminary survey of pavement conditions, the details of crack length and opening are not yet available for analysis. Therefore, the current model can be extended by employing image thresholding and image segmentation techniques to separate the crack objects from the pavement background. Accordingly, information regarding the length and the opening of cracks can be measured. In addition, other developments of the current study may include the investigation of other novel machine learning approaches in the task of asphalt pavement crack recognition and the extension of the current dataset to include other types of cracks (e.g., reflective cracks or block cracks) as well as other forms of pavement defects (such as potholes, ruts, depression, upheaval, and raveling) to enhance the applicability of the current prediction model.

Data Availability

The dataset used in the study is provided in the supplementary file.

Conflicts of Interest

The authors confirm that there are no conflicts of interest regarding the publication of this manuscript.

Supplementary Materials

The supplementary file contains the dataset used in this study. In this file, the first 40 columns are the input features of the data (which are the projection integrals); the last column contains the class labels (1 = alligator crack, 2 = diagonal crack, 3 = longitudinal crack, 4 = noncrack, and 5 = transverse crack).