Abstract

During periodic surveys, cracks and sealed cracks on asphalt pavement surfaces should be detected accurately. Moreover, the capability of distinguishing these two defects can help reduce the false-positive rate of pavement crack detection. Because cracks and sealed cracks are both line-based defects and may resemble each other in shape, this study puts forward an innovative computer vision-based method for detecting both of them. The method integrates image processing-based feature extraction with metaheuristic-optimized machine learning. Image processing is used to compute features that characterize the visual appearance and texture of the pavement image. Subsequently, the Salp Swarm Algorithm integrated with a multiclass support vector machine is employed for pattern recognition. Based on experimental results, the newly developed method achieves the most desirable predictive performance, with an accuracy rate of 91.33% for crack detection and 92.83% for sealed crack detection.

1. Introduction

Asphalt pavement is one of the most important components of a transportation network. Nowadays, intensive traffic loads and inclement weather conditions expedite the deterioration of asphalt pavement roads. For instance, the American Society of Civil Engineers and the U.K. Institution of Civil Engineers have both graded their nation’s pavement infrastructure with a D, which indicates poor condition [1–3]. Thus, the assessment of pavement conditions as well as the maintenance of pavement safety and serviceability are crucial tasks of transportation management agencies around the world.

To reach such goals, pavement management systems are required to collect and analyze input information regarding pavement surface condition [4, 5]. One of the major outputs of these systems is to establish a proper maintenance plan that requires the shortest rehabilitation time and incurs a minimum cost [6, 7]. To do so, it is a requirement that the pavement management systems receive accurate evaluation results of asphalt pavement conditions in a timely manner.

However, such a requirement is increasingly harder to accomplish for contemporary transportation management/maintenance authorities. One typical reason is that the manual procedure of pavement condition survey is still the dominant method, especially in developing countries. Although the manual procedure performed by road maintenance technicians or transportation inspectors can help to obtain accurate assessment outcomes, it is also highly labor-intensive. Consequently, the conventional pavement condition survey is a time-consuming process that requires significant efforts in both data collection and data processing. The limited number of experienced technicians/inspectors and the sheer number of existing asphalt pavement roads make the task of timely condition survey a severe challenge [8].

To tackle the aforementioned challenge, researchers and practitioners have increasingly relied on automated pavement health monitoring systems. These systems have harnessed cutting-edge computer vision technologies as well as advanced image processing techniques/algorithms to collect, analyze, and make inference from digital asphalt pavement images. As pointed out by Dong and Catbas [9], the advantages of such state-of-the-art methods are safety for inspectors, fast data processing, low equipment cost, high degree of automation, and low interference to the daily operation of the surveyed road sections.

Hence, advanced methods based on 2D pavement analyses have been developed to recognize various pavement distresses [10–12]. Among such distresses, cracking (refer to Figure 1(a)) is observably the most widely encountered one in asphalt pavements [8, 13–15]. A crack is an early signal of the deterioration of pavement surfaces, and crack repair is required to ensure road serviceability and prevent the appearance of more intensive cracks, large craters, and potholes. Therefore, as stated in Ref. [16], accurate and timely detection of cracks is essential for the process of asphalt pavement condition survey. Such detection can significantly help to conserve maintenance budgets, establish proper plans of maintenance/rehabilitation, and assure the long-term serviceability of asphalt pavements [17].

During road maintenance, engineers often resort to crack sealing (refer to Figure 1(b)) to rehabilitate the pavement surface. In a crack sealing operation, an adhesive sealant is placed into cracks. This process helps to prevent the infiltration of moisture and harmful objects into the asphalt pavement [18]. Crack sealing is a cost-effective pavement rehabilitation method that can retard pavement deterioration and prolong pavement service life. However, a sealed crack itself is considered a form of surface defect, and the detection of its appearance is also important for pavement management systems [19]. In addition, sealed cracks and cracks are treated differently in pavement condition assessment [20]. Therefore, both forms of pavement distress should be detected accurately via computer vision-based surveying methods.

Although great achievements have been made with asphalt pavement crack detection [21], the number of research works dedicated to recognizing both the crack and sealed crack defects is still limited [7]. As pointed out in Ref. [20], the two main challenges in distinguishing both cracks and sealed cracks under the same framework are the complicated texture of the pavement background and the similar gray intensity as well as shape of cracks and sealed cracks. Thus, advanced image-processing techniques can be used for visual segmentation and isolating crack objects. Furthermore, image texture descriptors with good discriminative power can be helpful for characterizing the coarseness/fineness of the image region and thereby classifying cracks/sealed cracks.

In addition, the integration of image processing and machine learning has proven to be effective for developing automatic computer vision models in structural health monitoring [22–24]. Therefore, this study puts forward an automatic method for categorizing crack and sealed crack image samples by the utilization of image processing and image texture analysis. The employed image processing techniques include Gaussian steerable filters [25, 26] and projection integrals [27, 28]. The first technique is utilized for delineating crack objects and asphalt pavement background. The latter one is highly useful for shape description. To distinguish between a crack and a sealed crack, this study relies on image texture descriptors of statistical properties of color channels (red, green, and blue channels) [29] and attractive-and-repulsive center-symmetric local binary pattern (ARCS-LBP) [30].

The combination of the employed image analysis methods, including the Gaussian steerable filters, projection integrals, and texture descriptors, aims at tackling the aforementioned challenges pointed out by Zhang et al. [20]. The Gaussian steerable filters and projection integrals are used for distinguishing cracks/sealed cracks from pavement background. In addition, texture information computed with the statistical properties of color channels and ARCS-LBP can help identify cracks and sealed cracks. This study relies on the nonlinear classifier of support vector machine (SVM) [31] to analyze a set of numerical features extracted by the aforementioned image processing and texture descriptors. This classifier is used to classify image samples of asphalt pavements into three categories of “noncrack,” “sealed crack,” and “crack.”

The SVM is selected in this study because its capability in nonlinear and multivariate data classification has been demonstrated in various studies [32–34]. Furthermore, this work employs the Salp Swarm Algorithm (SSA) [35] to optimize the performance of the SVM-based crack/sealed crack classification model. The SSA is a novel nature-inspired metaheuristic used for global optimization. In this study, we formulate the SVM model selection coupled with the model training phase as an optimization task which can be effectively solved by the SSA metaheuristic. In order to train and validate the usefulness of the proposed framework, we have collected an image dataset including 300 samples of asphalt pavement surfaces during field trips in Danang city (Vietnam).

The rest of the article is organized as follows: The next section reviews previous studies pertinent to the current research. The research methodology including the employed image-processing techniques, image texture descriptors, machine learning, and metaheuristic is presented in the third section. The fourth section describes the proposed method for automatic detection of pavement cracks and sealed cracks. Experimental results and discussion are reported in the fifth section, followed by the section of concluding remarks.

2. Literature Review

Due to the importance of the crack detection problem in periodic survey of the pavement surface, various sophisticated methods based on image processing and machine learning have been proposed to tackle the problem of interest. Herein, we focus on previous works that employ image-processing techniques, image texture analyses, and machine learning models for automatic detection of asphalt pavement distresses, especially pavement cracks. Hu et al. [36] trained a SVM-based model based on a set of texture features and shape descriptors; the model is used for recognizing cracks in pavement surfaces. Gavilán [37] constructed an image-processing method for detecting different crack types; this method employs SVM for pattern recognition and projection integral analysis for extracting signals of pavement cracks from digital images.

A sophisticated method that relies on segment pattern analysis and linear discriminant analysis has been proposed in Ref. [38] to extract features from pavement images; the authors then employed machine learning models of neural network, SVM, and nearest-neighbor classifier. Prasanna et al. [39] constructed a crack appearance vector consisting of intensity-based, gradient-based, and scale-space features used for crack status categorization; the authors relied on SVM, AdaBoost, and random forest for carrying out pattern recognition tasks. Kamaliardakani et al. [19] developed an automatic sealed crack detection based on image-processing techniques of image segmentation with a local minimum approach and the Otsu method for image thresholding; the authors relied on a subsequent optimal threshold identification approach based on Bayes minimum error [40] for performing image segmentation; moreover, morphological operations of filling and closing [41] are used for image processing and revealing cracks.

A crack classifier named Random Structure Forest has been proposed in Ref. [42] to identify cracks in patches of pavement images; morphological operations have also been used for removing tiny redundant objects and enhancing the crack detection outcome. Zhang et al. [43, 44] relied on convolutional neural network (CNN) classifier to recognize pavement cracks; the CNN was found to be highly suited for the collected image dataset. Yokoyama and Matsumoto [45] also constructed CNN based models used for classifying crack status; the authors observed that stain in the background surface can deteriorate the performance of the CNN-based crack detection process.

Radopoulou and Brilakis [3] used the semantic texton forests algorithm as a supervised classifier for pavement crack detection. The performance of the CNN model is compared to that of edge detection algorithms in Ref. [46]; the research finding is that the employed machine learning method significantly outperformed the conventional edge detectors by a classification accuracy rate of 12%. Hoang and Nguyen [47] utilized image-processing techniques and machine learning models of the SVM, neural network, and random forest for crack pattern recognition; experimental results confirmed the superiority of the SVM over the other two approaches. A method named crack deep network is proposed in Ref. [7] for recognizing cracks and sealed cracks against a complex pavement background.

Han et al. [48] proposed an advanced Otsu method integrated with an edge detector and a tree-based classifier for crack detection in highways; Gaussian function-based spatial filtering and top-hat transform are also employed for image enhancement, and the performances of various edge detectors including Prewitt, Sobel, Gauss–Laplace (LoG), and Canny are assessed; the research finding is that the tree-based classifier is capable of recognizing crack patterns effectively. Ranjbar et al. [15] relied on transfer learning used with pretrained deep neural network models for pavement crack detection; however, the proposed method has not considered sealed cracks as objects of interest. The Hough transform technique has been used in Ref. [8] to reveal essential features of cracks; subsequently, SVM, neural network, and tree-based classifiers are used to classify crack patterns. Chen et al. [17] introduced the second-order directional derivative to characterize the crack structure; the method has achieved good detection accuracy; nevertheless, the model performance in sealed crack detection has not yet been investigated.

Based on the current literature, there is an increasing trend of applying advanced image processing and machine learning methods in pavement crack detection. This trend has also been pointed out by the reviewing works of Zakeri et al. [49]; Hsieh and Tsai [21]; and Cao et al. [50]. In addition, competent machine-learning methods such as the SVM, shallow neural network, tree-based classifier, and CNN are dominantly employed. The method of the SVM has been shown to be superior to other employed models [37, 47, 51].

Different from the SVM, the CNN, which is also a supervised learning approach, does not require handcrafted features such as image texture or shape descriptors. This is because a series of convolutional layers within a CNN structure is capable of automatic feature computation [9]. Nevertheless, a CNN model typically demands a sufficiently large amount of labeled data for effectively training its structure, which may contain a huge number of tuned parameters [15]. When the number of training samples is limited, the performance of CNN-based models can be inferior to that of models relying on handcrafted features [52, 53]. However, comparative studies regarding the predictive performance of CNN and machine learning models employing image processing-based feature extractors for crack and sealed crack detection under the circumstance of limited labeled data are rarely reported. Thus, the current study attempts to fill this gap in the literature by proposing an integrated model that employs image processing-based feature extraction and SVM-based crack/sealed crack recognition. In addition, we also put forward a novel framework that utilizes the nature-inspired metaheuristic of SSA for optimizing the crack and sealed crack detection model.

3. Research Methodology

This section of the article presents the research methodology of the current study. The research methodology includes four main parts: image acquisition, image texture computation, model optimization, and model construction. The overall research methodology is depicted in Figure 2. The subsequent parts of this section review the image texture descriptors used for feature extraction, the machine learning method, and the metaheuristic algorithm employed for model optimization. Since crack pixels generally contrast with those of the pavement surface and pavement images are often perturbed by complex background and noise [17, 54], this work relies on the methods of Gaussian steerable filters and projection integrals to isolate crack objects and analyze their visual appearance.

Furthermore, considering the fact that cracks and sealed cracks can have similar morphological features (e.g. length and orientation), using visual appearance alone may not be sufficient to deliver the desired detection performance. This study proposes to integrate image texture descriptors including the statistical properties of color channels and ARCS-LBP for detecting both crack and sealed crack objects. The novel integrated feature extraction operator is therefore crucial in our study because it helps to recognize objects of interest within cluttered scenes of asphalt pavement surfaces [55]. In addition, SVM, as one of the most capable classifiers [56], is used to construct optimal hyperplanes that divide the input features into three classes of noncrack, sealed crack, and unsealed crack. To optimize the SVM performance, this study also proposes to incorporate the SSA metaheuristic [35] into the model construction phase of the SVM model.

3.1. Image-Processing Techniques
3.1.1. Gaussian Steerable Filter (GSF)

GSF, first introduced in the previous works of Refs. [25, 26], relies on orientation-selective convolution kernels to achieve both goals of noise suppression and edge feature enhancement. An illustration of the GSF application in processing asphalt pavement images is provided in Figure 3. This image-processing technique has been effectively employed for pavement crack detection [57–59]. To construct a set of GSF, a linear combination of Gaussian second derivatives is employed.

Given an image sample in which (x, y) denotes a pixel’s coordinates, the 2-dimensional Gaussian function with a given variance, evaluated at that pixel, is given by Ref. [26]:

The 1st order derivatives employed to calculate the filters with rotation angles of 0° and 90° are expressed as follows:

For an arbitrary orientation β, a Gaussian steerable filter is expressed as Ref. [26]:

It is noted that when the value of the Gaussian function variance is fixed, the final filter response is a combination of GSF responses over a set of orientations β. The value of β is often selected from a predefined set of angles. The final GSF response at the pixel location (x, y) within an image I can be obtained via the following formula:

where “∗” denotes the convolution operator.
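
The filtering steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The Gaussian normalization, the angle set, the kernel radius, and the choice of taking the maximum absolute response over orientations are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_derivative_kernels(sigma=1.0, radius=None):
    """First-order derivative kernels of a 2D Gaussian at 0 and 90 degrees."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumed kernel support
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)  # x varies along columns, y along rows
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    g0 = -x / sigma**2 * g   # dG/dx
    g90 = -y / sigma**2 * g  # dG/dy
    return g0, g90


def steer(g0, g90, beta_deg):
    """Kernel for orientation beta as a linear combination of the basis kernels."""
    b = np.deg2rad(beta_deg)
    return np.cos(b) * g0 + np.sin(b) * g90


def convolve_same(image, kernel):
    """2D convolution with edge padding; output has the same shape as image."""
    kh, kw = kernel.shape
    padded = np.pad(np.asarray(image, dtype=float),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    windows = sliding_window_view(padded, (kh, kw))
    # Flip the kernel so that this is a true convolution, not a correlation.
    return np.einsum("ijkl,kl->ij", windows, kernel[::-1, ::-1])


def gsf_response(image, sigma=1.0, angles=(0, 30, 60, 90, 120, 150)):
    """Combine oriented responses (assumed rule: maximum absolute response)."""
    g0, g90 = gaussian_derivative_kernels(sigma)
    responses = [np.abs(convolve_same(image, steer(g0, g90, a))) for a in angles]
    return np.max(responses, axis=0)
```

On a synthetic image containing a dark vertical line, the response is large next to the line and close to zero in flat regions, which is the edge-enhancing behavior the text attributes to GSF.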

3.1.2. Projection Integral (PI)

In the image-processing field, PI [27, 60] is a technique employed for analyzing visual appearance of objects within a scene of interest. Particularly for civil engineering applications, PI has been successfully applied for structural defect detections [28, 47, 61, 62]. Given an image sample, four PIs, which include a horizontal PI (HPI), a vertical PI (VPI), and two diagonal PIs, can be computed.

The HPI and VPI (refer to Figures 4(a) and 4(b)) are computed as the summation of pixels’ intensity along a thread xy or yx as follows [63]:

To recognize diagonal cracks on the pavement surface, two diagonal PIs (refer to Figures 4(c) and 4(d)), named Diagonal Projection Integral 1 (DPI1) and Diagonal Projection Integral 2 (DPI2), are used. The two DPIs are computed as follows [63]:
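
As a rough illustration of the four projection integrals, the sketch below sums pixel intensities along rows, columns, and the two diagonal directions. The indexing convention (which axis is called horizontal, and the ordering of the diagonals) is an assumption and may differ from the convention of Ref. [63]. The function name is hypothetical.

```python
import numpy as np


def projection_integrals(img):
    """Return (HPI, VPI, DPI1, DPI2) of a 2D intensity array."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    hpi = img.sum(axis=1)  # one value per image row
    vpi = img.sum(axis=0)  # one value per image column
    offsets = range(-(h - 1), w)
    # Sums along diagonals running from top-left to bottom-right.
    dpi1 = np.array([np.trace(img, offset=k) for k in offsets])
    # Sums along diagonals running from top-right to bottom-left.
    dpi2 = np.array([np.trace(img[:, ::-1], offset=k) for k in offsets])
    return hpi, vpi, dpi1, dpi2
```

A line-shaped object aligned with one of the four directions produces a sharp peak in the corresponding projection, which is why the projections are useful for describing crack orientation.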

3.2. Image Texture Descriptors
3.2.1. Statistical Measurement of Color Channels

Since properties of pixels’ color channels are useful to delineate asphalt pavement defects [64], this study computes statistical measurements of an object’s colors and integrates these measurements into the feature extraction phase. Given an image sample I, it is necessary to construct the first-order histogram P(I) that contains the statistical distribution of color intensity [29]. Based on the constructed P(I), three statistical measurements, namely the mean, standard deviation, and skewness of each color channel c, can be computed as follows:

where NL = 256 is the number of discrete intensity values for 8-bit images, and c denotes one of the color channels of red, green, and blue.
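
A minimal sketch of these histogram-based statistics is given below, assuming the usual first-order definitions (probability-weighted mean, standard deviation, and standardized third moment). The exact normalization used in Ref. [29] may differ, and the function name is hypothetical.

```python
import numpy as np


def channel_statistics(img_rgb, n_levels=256):
    """Mean, standard deviation, and skewness per color channel (9 values)."""
    img_rgb = np.asarray(img_rgb).astype(np.int64)
    levels = np.arange(n_levels, dtype=float)
    feats = []
    for c in range(3):  # red, green, blue
        hist = np.bincount(img_rgb[..., c].ravel(), minlength=n_levels)
        p = hist / hist.sum()  # first-order histogram P(I) of channel c
        mean = np.sum(levels * p)
        std = np.sqrt(np.sum((levels - mean) ** 2 * p))
        # Skewness is set to 0 for a constant channel to avoid division by zero.
        skew = np.sum((levels - mean) ** 3 * p) / std**3 if std > 0 else 0.0
        feats.extend([mean, std, skew])
    return np.array(feats)
```

Each channel contributes three numbers, so an RGB image sample yields nine color features.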

3.2.2. Attractive-and-Repulsive Center-Symmetric Local Binary Patterns

Attractive and Repulsive Center-Symmetric Local Binary Patterns (ARCS-LBP), proposed in Ref. [30], is an improved version of the original Center-Symmetric Local Binary Patterns (CS-LBP) [65]. The ARCS-LBP method considers four triplets corresponding to the vertical, horizontal, and two diagonal directions via the inclusion of the intensity of the central pixel (refer to Figure 5). Therefore, the newly proposed approach inherits the advantages of good local structure description, tolerance against illumination changes, and fast computation from the original CS-LBP. ARCS-LBP also enhances CS-LBP’s discrimination capability in terms of gradient and texture analysis.

To define the relationship between a central pixel and its surrounding neighbors, the attractive and repulsive binary thresholding functions are employed:

where each thresholding function operates on one of the four triplets, each consisting of the central pixel and a pair of center-symmetric neighbors, that correspond to the horizontal, vertical, and two diagonal directions within an image neighborhood.

Generally, a local structure within a neighborhood is considered to be attractive if the central pixel has a lower gray intensity than an adjacent pixel. On the contrary, this structure is repulsive if the gray intensity of the central pixel is higher than that of a surrounding pixel. Accordingly, the Attractive Center-Symmetric Local Binary Patterns (CSLBPA) and Repulsive Center-Symmetric Local Binary Patterns (CSLBPR) can be defined as follows:

where the weight vector is [2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0], and the attractive and repulsive binary gradient vectors are computed by using the aforementioned attractive and repulsive binary thresholding functions. Besides these two patterns, El Merabet et al. [30] also propose to integrate indices of average local gray level and average global gray level into the texture computation process.
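
For orientation only, the sketch below implements the original CS-LBP operator [65] on a 3 × 3 neighborhood, not ARCS-LBP itself. It compares the four center-symmetric neighbor pairs and ignores the central pixel, whereas ARCS-LBP additionally involves the central pixel in each triplet. The bit ordering, the threshold value, and the function name are assumptions.

```python
import numpy as np

# (row, col) offsets of the four center-symmetric neighbor pairs:
# horizontal, diagonal, vertical, and anti-diagonal.
PAIRS = [((0, 1), (0, -1)), ((1, 1), (-1, -1)),
         ((1, 0), (-1, 0)), ((1, -1), (-1, 1))]


def cs_lbp_histogram(gray, threshold=0.0):
    """Normalized 16-bin histogram of CS-LBP codes over the interior pixels."""
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    for bit, ((dy1, dx1), (dy2, dx2)) in enumerate(PAIRS):
        a = g[1 + dy1:h - 1 + dy1, 1 + dx1:w - 1 + dx1]
        b = g[1 + dy2:h - 1 + dy2, 1 + dx2:w - 1 + dx2]
        # Set this bit where the pair difference exceeds the threshold.
        codes += (a - b > threshold).astype(int) << bit
    hist = np.bincount(codes.ravel(), minlength=16).astype(float)
    return hist / hist.sum()
```

A perfectly flat image maps every pixel to code 0, while a horizontal intensity ramp activates only the bits of the pairs that span the horizontal direction.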

3.3. Salp Swarm Algorithm (SSA)

To improve the performance of machine-learning models, metaheuristic algorithms have been employed to automatically fine-tune their hyper-parameters. This is because the hyper-parameters play an important role during the learning phase of the machine-learning model. Accordingly, previous works have resorted to metaheuristic algorithms to tackle the task of interest. Recently, various novel algorithms have been successfully utilized, including artificial bee colony [66], differential flower pollination [67], Henry Gas Solubility Optimization [68], Harris hawks optimization [69], symbiotic organisms search [70], monarch butterfly optimization [71], and social spider optimization [72]. However, the task of hyper-parameter adjustment is data-dependent and challenging because the hyper-parameters are continuous values; thus, there is an infinite number of possible candidates. Therefore, an investigation of the capability of other advanced metaheuristic algorithms in optimizing machine learning models is a pressing need.

The SSA [35] is a novel nature-inspired metaheuristic that has demonstrated its capability in solving complex optimization problems. This metaheuristic approach possesses good performance in both exploration and exploitation phases [73, 74]. The SSA metaheuristic has been employed successfully in various problems including optimization of machine learning models [75, 76]. Therefore, this study relies on SSA to optimize the SVM-based detection of crack and sealed crack.

The SSA algorithm is inspired by the swarming behavior of salps when they navigate and forage in oceans [35]. In this study, the SSA metaheuristic is employed to optimize the machine-learning model used for the task of classifying crack and sealed crack data samples. To explore the search space, the swarm is divided into two groups: the leader salp and its followers. Given an optimization problem with n decision variables and an objective function F(x), the leader position is revised according to the following equation:

where $x_j^1$ denotes the position of the leader salp in its jth dimension. $F_j$ is the position of the food source in the jth dimension. $ub_j$ and $lb_j$ represent the upper and lower boundaries of the jth dimension, respectively. $c_1$, $c_2$, and $c_3$ are three random numbers.

In addition, the parameter $c_1$ strongly affects the leader’s position updating. Therefore, $c_1$ determines the balance between exploration and exploitation processes; it is computed as follows:

$$c_1 = 2e^{-\left(\frac{4l}{L}\right)^2}$$

where l denotes the current searching iteration and L specifies the maximum number of searching iterations.

The SSA metaheuristic employs Newton’s law of motion to update the position of the followers in the swarm. The position of the followers is revised as follows:

$$x_j^i = \frac{1}{2} a t^2 + v_0 t$$

where $x_j^i$ (with $i \geq 2$) denotes the new position of the ith follower in its jth dimension; t is time; $v_0$ is the initial speed of the salp; and $a = v_{final}/v_0$.

Since the variable of time (t) is equivalent to the searching iteration (l) and the initial speed $v_0 = 0$, the above equation can be stated as

$$x_j^i = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$$

Finally, if a member of the swarm goes beyond the prespecified lower and upper boundaries, the following rules are applied to amend its position:
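
The update rules of this section can be put together in a compact sketch of the SSA loop. It is an illustration under stated assumptions, not the code of Ref. [82]: a single leader is used, the leader moves toward or away from the food source depending on whether $c_3 \geq 0.5$ (an assumed threshold), out-of-range positions are simply clipped to the boundaries, and the function and parameter names are hypothetical.

```python
import numpy as np


def salp_swarm_minimize(objective, lb, ub, n_salps=30, n_iter=200, seed=0):
    """Minimize objective(x) over the box [lb, ub] with a basic SSA loop."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    pos = rng.uniform(lb, ub, size=(n_salps, dim))
    fitness = np.array([objective(p) for p in pos])
    best = int(fitness.argmin())
    food, food_fit = pos[best].copy(), float(fitness[best])

    for l in range(1, n_iter + 1):
        # c1 decays with the iteration count, shifting exploration to exploitation.
        c1 = 2.0 * np.exp(-((4.0 * l / n_iter) ** 2))
        c2, c3 = rng.random(dim), rng.random(dim)
        step = c1 * ((ub - lb) * c2 + lb)
        # Leader update around the food source (0.5 threshold is an assumption).
        pos[0] = np.where(c3 >= 0.5, food + step, food - step)
        # Follower update: each salp moves halfway toward its predecessor.
        for i in range(1, n_salps):
            pos[i] = 0.5 * (pos[i] + pos[i - 1])
        # Boundary handling (assumed rule): clip to the feasible box.
        pos = np.clip(pos, lb, ub)
        fitness = np.array([objective(p) for p in pos])
        best = int(fitness.argmin())
        if fitness[best] < food_fit:
            food, food_fit = pos[best].copy(), float(fitness[best])
    return food, food_fit
```

For example, `salp_swarm_minimize(lambda x: float(np.sum(x**2)), [-5, -5], [5, 5])` searches for the minimum of a two-dimensional sphere function. In the hyper-parameter setting of this study, the objective would instead score an SVM trained with the candidate values.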

3.4. Support Vector Machine (SVM)

Built on a robust statistical learning framework, SVM [31] remains one of the most capable nonlinear and multivariate classifiers for complex pattern recognition tasks [32, 77]. The main advantages of SVM are its good capability of data generalization, guarantee of global convergence in model training, and handling of high-dimensional datasets. SVM is based on the structural risk minimization framework and the maximum-margin concept [78]; therefore, this machine-learning method has strong resilience against noise and data over-fitting. These facts make SVM a good candidate for constructing a data-driven model used in the detection of crack and sealed crack patterns. In this study, SVM analyzes the features extracted by the aforementioned image-processing techniques and image texture descriptors. Based on collected asphalt image samples, the trained model is then used to deliver the output class labels of noncrack, sealed crack, and unsealed crack (refer to Figure 6).

To construct the classifier, an SVM model for the two-class recognition problem must first be established. In addition, to handle nonlinearly separable datasets, the SVM method relies on kernel functions to construct a mapping from the original input space to a high-dimensional feature space. Within this high-dimensional feature space, data drawn from different class labels can be separated by a hyperplane (refer to Figure 6). For nonlinear data classification, the radial basis kernel function (RBKF) is often employed [79]; its formula is expressed as follows:

where σ is a hyper-parameter of the kernel function.
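
A radial basis kernel can be sketched as follows. The parameterization exp(−‖x − y‖² / (2σ²)) is one common convention and is an assumption here; some toolboxes scale the exponent differently, so σ in this sketch is not guaranteed to match the kernel scale of a particular library. The function name is hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, sigma=1.0):
    """Kernel matrix K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 * sigma^2))."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Guard against tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))
```

A small σ makes the kernel very local (only near-identical samples look similar), while a large σ makes distant samples look alike. This locality is why σ influences the generalization of the trained model.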

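The construction of an SVM model can be converted to the following constrained optimization task, stated here for N training samples $(x_i, y_i)$ with $y_i \in \{-1, +1\}$:

$$\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N} \xi_i$$

subject to $y_i\left(w^T\varphi(x_i) + b\right) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for $i = 1, \ldots, N$, where $w \in R^n$ and $b \in R$ denote the parameters of the hyperplane. $\xi$ is the vector of slack variables. C and $\varphi$ are the penalty coefficient and the nonlinear data mapping function, respectively.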

The standard SVM designed for a two-class recognition task can be easily extended to a multiclass recognition task via the one-versus-all framework [80]. Based on this framework, the multiclass classification task is divided into several two-class recognition tasks. Each two-class recognition task is handled by a separate SVM model. The final class output is determined by computing the confidence scores of each data sample corresponding to each class label [81]. In this study, the performance of the SVM-based classifier is further enhanced by using the SSA metaheuristic. The hybrid machine learning metaheuristic framework is described in the subsequent section of this paper.
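
The one-versus-all decision rule can be illustrated with a few lines of NumPy. The sketch assumes that each of the three binary SVMs has already produced a confidence score per sample and that the class with the highest score wins. The score values, the function name, and the label ordering (0, 1, 2 as in the labeling scheme of this study) are illustrative.

```python
import numpy as np

CLASS_NAMES = ("noncrack", "sealed crack", "crack")  # class labels 0, 1, 2


def one_versus_all_predict(scores):
    """Pick, for each sample, the class whose binary model is most confident.

    scores has shape (n_samples, n_classes); column k holds the confidence
    score of the binary model trained as class k versus the rest.
    """
    scores = np.asarray(scores, dtype=float)
    return scores.argmax(axis=1)
```

For instance, a sample with scores (-0.5, 0.3, 0.9) is assigned label 2, that is, "crack".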

4. The Proposed Computer Vision Approach for the Detection of Pavement Crack and Sealed Crack

This section of the article presents the proposed approach for recognizing pavement crack and sealed crack. This method is an integration of image-processing techniques, image texture description, and metaheuristic-optimized machine learning model. An overall view of the newly constructed framework is illustrated in Figure 7. The proposed method, named as computer vision-based salp swarm algorithm optimized support vector machine (CV-SSA-SVM), can be divided into three main steps: (i) image sample collection and labeling, (ii) feature extraction based on image processing and texture computation, and (iii) machine learning model optimization and prediction.

It is noted that the feature extraction module has been programmed in Microsoft Visual Studio with Visual C# .NET framework 4.7.2. The SSA metaheuristic is adopted from the source code provided by Mirjalili [82]. The SVM model optimized by the SSA metaheuristic is constructed in the MATLAB environment with the assistance of the Statistics and Machine Learning Toolbox [83]. In addition, the optimized computer vision-based model, which employs the module of texture computation and the SSA-optimized SVM model, has been coded and compiled in Visual C# .NET framework 4.7.2 with the help of built-in functions in the Accord.NET Framework [84].

4.1. Image Sample Collection and Labeling

The first step of this study is to collect image samples of pavement surfaces that contain sealed cracks and unsealed cracks. To fulfill such a step, field surveys in Danang city (Vietnam) have been carried out to gather image samples of asphalt pavement surfaces. The image samples consist of three categories: “noncrack” (class label = 0), “sealed crack” (class label = 1), and “crack” (class label = 2). The number of collected image samples is 300, with 100 samples in each category to guarantee a balanced classification problem. To expedite the image processing and texture computation processes, the size of image samples has been fixed to be 64 × 64 pixels. Figure 8 demonstrates the collected image samples. Notably, the digital images in this study have been captured by the 18-megapixel resolution Canon EOS M10 and the 16.2-megapixel resolution Nikon D5100. Moreover, the labels of the image samples have been assigned by human inspectors.

4.2. Feature Extraction Based on Image Processing and Texture Computation

An overview of the feature extraction process is provided in Figure 9. Given an image sample, GSF is employed for edge amplification. This image-processing technique relies on a set of orientation-selective convolution kernels to generate a salient crack map. Based on this map, projection integrals including VPI, HPI, and two diagonal PIs are computed (refer to Figure 10). For each PI, four statistical measurements including maximum value, average value, standard deviation, and skewness are calculated. It is noted that the minimum value of each PI is always 0 and therefore ignored from the calculation. Since each PI yields four statistical indices, the number of features extracted from VPI, HPI, and two diagonal PIs is 4 × 4 = 16.
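
The 16 shape features can be sketched as follows, assuming each projection integral is available as a one-dimensional array. The population standard deviation and the standardized third moment are used here; the exact estimators in the authors' implementation are not stated, so these are assumptions, as are the function names.

```python
import numpy as np


def pi_statistics(pi):
    """Maximum, average, standard deviation, and skewness of one PI."""
    pi = np.asarray(pi, dtype=float)
    mean, std = pi.mean(), pi.std()
    # Skewness is set to 0 for a constant projection to avoid division by zero.
    skew = np.mean((pi - mean) ** 3) / std**3 if std > 0 else 0.0
    return [pi.max(), mean, std, skew]


def shape_features(pis):
    """Concatenate the four statistics of each PI (4 PIs give 16 features)."""
    return np.concatenate([pi_statistics(p) for p in pis])
```

Passing the four projections of a salient crack map, for example `shape_features([hpi, vpi, dpi1, dpi2])`, returns a vector of length 16.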

In addition, to isolate the object of interest, image-processing techniques including median filter, Otsu binarization, morphological operation, and connected component labeling are utilized. The median filter is particularly suitable for removing dot noise in image samples [85, 86]; this technique with a window size parameter of 8 has been used to pre-process the collected image samples. Subsequently, the Otsu binarization [66, 87] coupled with morphological operation is used for image thresholding and revealing the objects of interest within an image scene. The morphological operation [88, 89] includes the algorithms for removing small and noncrack objects. Finally, connected component labeling [90, 91] is used to separate crack objects for subsequent tasks of texture computation.
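
Of the steps listed above, the thresholding stage is the easiest to illustrate in isolation. The sketch below computes an Otsu threshold by maximizing the between-class variance of the gray-level histogram. It covers only this stage (median filtering, morphology, and component labeling are omitted), it assumes an 8-bit grayscale input, and the function name is hypothetical.

```python
import numpy as np


def otsu_threshold(gray, n_levels=256):
    """Gray level t that maximizes the between-class variance (classes: <= t, > t)."""
    gray = np.asarray(gray).astype(np.int64)
    hist = np.bincount(gray.ravel(), minlength=n_levels).astype(float)
    p = hist / hist.sum()
    levels = np.arange(n_levels)
    w0 = np.cumsum(p)              # probability of the class with levels <= t
    w1 = 1.0 - w0                  # probability of the class with levels > t
    cum_mean = np.cumsum(p * levels)
    total_mean = cum_mean[-1]
    between = np.zeros(n_levels)
    valid = (w0 > 0) & (w1 > 0)    # skip thresholds that leave one class empty
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(between))
```

Because cracks and sealants are typically darker than the surrounding pavement, a candidate object mask can then be taken as `gray <= otsu_threshold(gray)`. This polarity is an assumption about the imagery rather than something stated in the text.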

When the object of interest has been extracted (refer to the lower branch of Figure 9), image texture descriptors are used to characterize the surface properties of the image region that bounds the object. This study employs the statistical measurements of color channels and the ARCS-LBP histogram as texture description methods. The first texture descriptor includes statistical indices of the three color channels, i.e., red, green, and blue. For each channel, three representative statistical measurement indices of mean, standard deviation, and skewness are computed. Therefore, this texture descriptor yields 3 × 3 = 9 features. The second texture descriptor is the ARCS-LBP. To compute this descriptor, an analysis of attractive and repulsive relationships is performed within a local structure of 3 × 3 pixels for all of the pixels in an image sample. Accordingly, the ARCS-LBP method constructs a histogram that represents the texture of the image (refer to Figure 11). Based on this histogram, statistical measurements including mean, standard deviation, and skewness indices are calculated to serve as numerical features for the subsequent pattern recognition task.

4.3. Machine Learning Model Optimization and Prediction

The dataset including the extracted features and their corresponding class labels (noncrack, sealed crack, and crack) is used to construct the machine-learning model based on an integration of the SVM pattern classifier and the SSA optimizer. To train the proposed CV-SSA-SVM, the aforementioned dataset has been randomly divided into a training set (90%) and a testing set (10%). The training set is used for model construction; the testing set is reserved for quantifying the model's predictive capability when classifying novel data samples. Moreover, the Z-score equation is used to standardize the 28 features extracted by the aforementioned image processing and texture description techniques. The Z-score equation is expressed as follows:

$$X_Z = \frac{X_D - M_X}{STD_X}$$

where $X_Z$ and $X_D$ are the normalized and the original feature, respectively. $M_X$ and $STD_X$ denote the mean value and the standard deviation of the original feature, respectively.
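The Z-score standardization can be sketched as follows. The sketch assumes that the mean and standard deviation are estimated from the training rows only and then reused for the testing rows, and that a constant feature is mapped to 0; the text does not state either choice, so both are assumptions, as are the function names.

```python
import statistics

def fit_zscore(train_rows):
    """Column-wise mean and (population) standard deviation of the training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds

def apply_zscore(rows, means, stds):
    """Standardize each feature as (x - mean) / std; constant features map to 0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

In use, `fit_zscore` would be called once on the 28-column training set and `apply_zscore` on both the training and testing sets with the same statistics.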

As described in the previous section, the construction of the SVM model used for the classification of crack and sealed crack requires a suitable determination of the penalty coefficient C and the parameter σ of the RBKF, the kernel function that governs the data mapping process. The penalty coefficient C essentially controls the amount of penalty suffered by misclassified data samples during the model training phase. Meanwhile, the parameter σ of the RBKF affects the locality of the kernel function, which directly influences the generalization of the SVM model [92]. It is worth noting that the task of determining those hyper-parameters can be formulated as a global optimization problem. Furthermore, because C and σ are both searched in continuous space, the number of parameter combinations is infinitely large. This fact means that an exhaustive search for the most appropriate hyper-parameters is infeasible. Hence, this work proposes to utilize the SSA metaheuristic to optimize the SVM model used for crack and sealed crack detection based on computer vision.
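The effect of σ on the locality of the kernel can be illustrated with a short sketch. It assumes the common Gaussian parameterization exp(−‖x − y‖² / (2σ²)); the exact parameterization used in this study is defined in an earlier section and may differ, and the function name is hypothetical.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian radial basis kernel under the assumed 2 * sigma^2 parameterization."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

With a small σ, the kernel value decays quickly with distance, so each training sample influences only its close neighborhood; with a large σ, distant samples remain similar and the decision boundary becomes smoother.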

For the SSA algorithm, the number of members within the swarm is problem-dependent and often selected via experiments [74,93]. Based on suggestions from previous works [94–96] and trial-and-error experiments, the number of members used in SSA has been selected to be 20. Accordingly, the SSA metaheuristic with 20 members and 100 iterations is used to optimize the training phase of the pattern classification model. This swarm intelligence method aims at fine-tuning the model selection of the SVM employed for crack and sealed crack detection by minimizing the classification error. The SSA relies on the swarming motion of salps when they forage, as well as the influence of the leader salp, to gradually explore and exploit the search space. Herein, the lower and upper boundaries of the SVM's hyper-parameters are 0.01 and 100, respectively. After the allowable number of iterations, it is expected that the SSA is able to locate a good solution that represents the penalty coefficient and the RBKF parameter. With the optimized hyper-parameters, the CV-SSA-SVM can carry out the prediction process and assign the label for image samples in the testing set automatically.
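The search procedure can be sketched as follows. This is a minimal version of the commonly published SSA update rules with a single leader salp; the function name, the clipping of positions to the search bounds, and the random seed are illustrative assumptions and do not reproduce the C# implementation used in this study.

```python
import math
import random

def salp_swarm_minimize(cost, lower, upper, n_salps=20, n_iter=100, seed=0):
    """Minimize cost(position) within box bounds using a basic Salp Swarm Algorithm."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(v, j):
        return min(max(v, lower[j]), upper[j])

    # Random initial swarm; the best member becomes the food source.
    swarm = [[rng.uniform(lower[j], upper[j]) for j in range(dim)]
             for _ in range(n_salps)]
    costs = [cost(s) for s in swarm]
    best = min(range(n_salps), key=costs.__getitem__)
    food, food_cost = list(swarm[best]), costs[best]
    history = [food_cost]

    for it in range(1, n_iter + 1):
        # c1 decays over the iterations, shifting from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * it / n_iter) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader: move around the food source.
                for j in range(dim):
                    step = c1 * ((upper[j] - lower[j]) * rng.random() + lower[j])
                    swarm[i][j] = food[j] + step if rng.random() >= 0.5 else food[j] - step
            else:
                # Follower: move to the midpoint between itself and its predecessor.
                for j in range(dim):
                    swarm[i][j] = 0.5 * (swarm[i][j] + swarm[i - 1][j])
            swarm[i] = [clip(v, j) for j, v in enumerate(swarm[i])]
            c = cost(swarm[i])
            if c < food_cost:
                food, food_cost = list(swarm[i]), c
        history.append(food_cost)
    return food, food_cost, history
```

In the setting described above, `cost` would take a position `[C, sigma]`, train the SVM with those hyper-parameters, and return the cross-validated classification error; `lower` and `upper` would both be two-element lists holding the search boundaries.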

In addition, to optimize the machine-learning model based on the SSA metaheuristic, an objective function that relies on a fivefold cross validation has been used. This objective function, which is minimized by the SSA metaheuristic, is computed from the false-negative rate (FNR) and the false-positive rate (FPR) obtained for the cth class in the kth data fold, aggregated over all folds and all classes. Here, K = 5 denotes the number of data folds and CN = 3 denotes the number of class labels.

The FNR and FPR metrics are given by

$$FNR = \frac{FN}{FN + TP}, \qquad FPR = \frac{FP}{FP + TN}$$

where FN, FP, TP, and TN refer to the false-negative, false-positive, true-positive, and true-negative data samples, respectively.
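A cross-validation cost of this kind can be sketched as follows. The per-class FNR and FPR follow the definitions above in a one-vs-rest manner; the aggregation, a plain average of FNR + FPR over all folds and classes, is an assumption made for illustration and may differ from the exact objective function of this study. The class codes 0, 1, and 2 follow the labeling of the published dataset.

```python
def fnr_fpr(y_true, y_pred, label):
    """One-vs-rest false-negative and false-positive rates for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    tn = sum(1 for t, p in pairs if t != label and p != label)
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fnr, fpr

def cv_objective(folds, labels=(0, 1, 2)):
    """Average FNR + FPR over folds and classes (assumed aggregation).

    folds: list of (y_true, y_pred) pairs, one pair per validation fold.
    """
    total = 0.0
    for y_true, y_pred in folds:
        for label in labels:
            fnr, fpr = fnr_fpr(y_true, y_pred, label)
            total += fnr + fpr
    return total / (len(folds) * len(labels))
```

A value of 0 corresponds to perfect classification in every fold, so a lower value indicates a better pair of hyper-parameters.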

5. Experimental Results and Discussion

As stated earlier, the CV-SSA-SVM model has been coded and compiled in Visual C# .NET Framework 4.7.2. In addition, experiments with the compiled computer program have been carried out on the ASUS FX705GE-EW165T (Core i7 8750H and 8 GB RAM) platform. The SSA optimization process is carried out to assist the SVM-based detection of crack and sealed crack. After 100 iterations, the SSA metaheuristic has identified the best values of the searched parameters as follows: the penalty coefficient = 704.94 and the RBKF parameter = 7.46. The best-found cost function value is 0.0925. The record of the best-found objective function value during the optimization process of the SSA is shown in Figure 12.

As stated in the previous section, the collected dataset which includes 300 data samples and three class labels has been randomly separated into a training set (90%) and a testing set (10%). Furthermore, to reliably evaluate the predictive capability of the proposed CV-SSA-SVM, the model training and testing processes have been repeated 20 times. Accordingly, the statistical measurements attained from the 20 independent runs are used to assess the model predictive capability in the task of detecting pavement crack and sealed crack. This repetitive process of experiment aims at diminishing the variation caused by the randomness in the data sampling phase.
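The repeated random subsampling protocol can be sketched as follows. With 300 samples and a 10% testing ratio, each run holds out 30 samples and trains on the remaining 270; the function name and the seed are illustrative assumptions.

```python
import random

def repeated_splits(n_samples=300, test_ratio=0.10, n_runs=20, seed=0):
    """Yield (train_indices, test_indices) for each independent run."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_ratio)
    for _ in range(n_runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random permutation for every run
        yield idx[n_test:], idx[:n_test]
```

The mean and standard deviation of each performance metric would then be computed over the 20 testing sets produced by this generator.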

Moreover, to demonstrate the capability of the newly constructed CV-SSA-SVM, the Random Forest Classification (RFC) model [97], the Backpropagation Artificial Neural Network (BPANN) [98,99], and the Convolutional Neural Network (CNN) model [100] have been selected as benchmark approaches. The RFC, BPANN, and CNN are capable classifiers and have been widely employed in pattern recognition, particularly in data-driven or computer vision-based structural health monitoring [101–112].

In this study, the RFC has been implemented in Microsoft Visual Studio with the assistance of the Accord.NET library [84]. Adaptive Moment Estimation (Adam) [113] is a state-of-the-art approach for training neural networks. Therefore, the Adam optimizer is used to train both the CNN and BPANN models. Notably, the deep CNN model trained by Adam, denoted as Adam-DCNN, is established with the help of built-in functions provided in the MATLAB deep learning toolbox [114]. The BPANN model trained by Adam, denoted as Adam-BPANN, is coded and compiled in Visual C# .NET. Trial-and-error runs have been used to identify the parameters of the benchmark models. The number of classification trees used in the RFC model is 50. Based on the suggestion in Ref. [115], the number of neurons in the hidden layer of the Adam-BPANN is set as a function of DX and CN, where DX = 28 is the number of input features and CN = 3 is the number of output classes. The number of training epochs and the number of samples in one batch of the BPANN model are 1000 and 8, respectively. The structure of the CNN model is provided in Table 1; this model has been trained with 3000 epochs and a batch size of 8.

Moreover, to evaluate the predictive capability of the proposed CV-SSA-SVM and the benchmark methods, a set of performance measurement metrics including classification accuracy rate (CAR), precision, recall, negative predictive value (NPV), and F1 score is used [116]. The equations used to compute CAR, precision, recall, NPV, and F1 score are as follows [116,117]:

$$CAR = \frac{N_C}{N_A} \times 100\%$$

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$

$$NPV = \frac{TN}{TN + FN}, \qquad F1 = \frac{2TP}{2TP + FP + FN}$$

where $N_C$ and $N_A$ are the number of correctly predicted samples and the total number of samples, respectively. As stated in the previous section, FN, FP, TP, and TN denote the false-negative, false-positive, true-positive, and true-negative samples, respectively.
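These metrics can be computed directly from the four confusion counts of one class, as in the following minimal sketch. The function names are hypothetical, and returning 0 when a denominator is empty is a convention assumed here rather than one stated in the text.

```python
def car(n_correct, n_total):
    """Classification accuracy rate in percent."""
    return 100.0 * n_correct / n_total

def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, NPV, and F1 score from one-vs-rest confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "npv": npv, "f1": f1}
```

For example, a testing set of 30 samples with TP = 8, FP = 2, FN = 2, and TN = 18 for a given class yields precision = recall = F1 = 0.8 and NPV = 0.9.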

The prediction performances of the newly developed CV-SSA-SVM and the employed benchmark models are reported in Table 2, which contains the mean and standard deviation (Std) of the performance measurement metrics. This table summarizes the predictive performances of the models when they are employed to categorize novel image samples in the testing sets. The reported results are obtained from 20 independent runs of each model. The proposed hybridization of the SSA metaheuristic and SVM machine learning has attained the most desirable performance in the testing phase.

For detecting cracks from image samples, the CV-SSA-SVM achieves CAR = 91.33% and F1 score = 0.87. The RFC is the second best model (with CAR = 88.67% and F1 score = 0.79), followed by the Adam-DCNN (CAR = 86.17% and F1 score = 0.78) and Adam-BPANN (CAR = 79.33% and F1 score = 0.71). Considering the accuracy of sealed crack detection, the performance of the CV-SSA-SVM (with CAR = 92.83% and F1 score = 0.89) is also higher than those of the Adam-DCNN (with CAR = 88.33% and F1 score = 0.83), Adam-BPANN (with CAR = 82.00% and F1 score = 0.73), and RFC (with CAR = 75.17% and F1 score = 0.73). When predicting samples of the 1st class (noncrack), the proposed method also achieves the highest accuracy rate of 90.50%; the Adam-DCNN is the second best model with CAR = 89.50%. The performances of the RFC (CAR = 86.50%) and Adam-BPANN (CAR = 79.33%) in classifying samples in the 1st class are inferior to those of the two aforementioned methods.

It is observed that the RFC shows good performance in detecting cracks from image samples (with CAR = 88.67%); however, it performs poorly in detecting sealed cracks, with a CAR of only 75.17%. This result of the RFC is worse than those of the CV-SSA-SVM, Adam-BPANN, and Adam-DCNN. The Adam-DCNN shows relatively good detection accuracy for all three class labels, with accuracy rates higher than 86%, whereas the accuracy rates of the Adam-BPANN range from 79.33% to 82.00%. Notably, the proposed CV-SSA-SVM demonstrates superior detection performance, with detection accuracy rates exceeding 90% for all three class labels of interest. The boxplots illustrating the statistical distributions of the models' performance in terms of CAR computed from 20 independent runs are provided in Figures 13–15.

Moreover, to reliably evaluate the superiority of the proposed CV-SSA-SVM model used for detecting cracks and sealed cracks appearing on the pavement surface, this study has utilized the Wilcoxon signed-rank test [118] with a significance level (p-value) = 0.05. The important performance measurement metrics of CAR and F1 score are subject to the Wilcoxon signed-rank hypothesis test. In detail, the null hypothesis is that the means of the prediction performances of two models are actually equal. The Wilcoxon signed-rank test results are reported in Table 3.

The Wilcoxon signed-rank tests with p-value = 0.05 show that the CV-SSA-SVM is significantly superior to the RFC, Adam-BPANN, and Adam-DCNN in detecting cracks and sealed cracks (p < 0.05). However, we cannot reject the null hypothesis when the CV-SSA-SVM and Adam-DCNN are used in detecting samples belonging to the 1st class of noncrack. This outcome indicates that these two models are both well suited for predicting samples in this class of interest.

The experimental results have pointed out the superiority of the proposed method of the CV-SSA-SVM over RFC, Adam-BPANN, and the deep neural computing approach of CNN. Notably, the SSA-optimized SVM, RFC, and BPANN require the feature extraction phase to compute the numerical inputs. These inputs contain information of the visual appearance of the object and the surface property of the object within the scene. The superiority of the CV-SSA-SVM over RFC and Adam-BPANN confirms the advantage of the hybrid computing model that combines metaheuristic optimization and machine learning-based pattern classification.

Thus, this finding agrees with results reported in recent studies that utilize hybrid computing models in civil engineering [119–125]. Although SSA has been demonstrated to be an effective swarm intelligence method for solving various complex optimization tasks (Houssein et al. 2020) [76, 126, 127], its application in constructing sophisticated computer vision-based systems is still rare. Therefore, the integration of SSA into the proposed framework of CV-SSA-SVM can be considered an attempt to fill this gap in the literature and to point out the potential of this swarm-based stochastic search in optimizing other similar computer vision models.

Unlike the SSA-optimized SVM, RFC, and Adam-BPANN, the CNN model is capable of performing feature extraction automatically. Therefore, the CNN model does not require the feature computation process that extracts numerical input data from the image samples. Instead, the Adam-DCNN model constructs the feature representation via convolutional operators and its hierarchical architecture for learning high-level features from raw image samples [128].

Although the CNN is a highly capable image classification method [129–133], its performance depends considerably on the number of collected training samples [53]. The main advantages of a CNN model, which are automatic feature representation and a deep hierarchical architecture, can only be realized with a large training dataset with correct ground truth labels. This fact also presents a major difficulty in implementing CNN models because a large number of data samples for a certain class label (e.g., sealed crack) is not always readily available. In addition, great effort must be dedicated to data labeling to construct a reliable deep-learning method with a huge number of model parameters to be tuned [30, 134].

Therefore, the fact that the CV-SSA-SVM can outperform the Adam-DCNN method in classifying the dataset at hand is understandable. As pointed out by Chen et al. [52], under certain circumstances, especially when training data cannot be gathered in large quantities, the performance of deep neural networks can be worse than that of methods relying on handcrafted features. This is because, with a limited number of training instances, the Adam-DCNN faces difficulty in adapting its internal structure, which contains a huge number of parameters in the hidden layers. Moreover, unlike the model construction phase of the SVM, which is based on quadratic programming, the training process of CNN models relies on gradient descent algorithms. When the number of trained parameters greatly exceeds the number of training instances, these gradient descent-based optimizers cannot effectively adapt the network's parameters.

6. Conclusion

This study has put forward and verified a method based on computer vision for the automatic classification of crack and sealed crack appearing in asphalt pavement surfaces. The new approach, named CV-SSA-SVM, is an integration of feature computation based on image processing, metaheuristic optimization, and data classification based on machine learning. The feature extractor used in this study relies on GSF coupled with PI to analyze the visual appearance of objects within an image scene.

Image-processing techniques including image thresholding and morphological operators are used to extract the region of interest. Subsequently, the surface properties of the extracted region are analyzed by the texture descriptors of statistical measurement of color channels and ARCS-LBP. Provided with the extracted features, the SVM machine learning optimized by the SSA metaheuristic is utilized to establish a decision boundary that divides the learning space into three regions corresponding to three class labels of “noncrack,” “sealed crack,” and “crack.” Experiments with image samples of asphalt pavement surfaces demonstrate that the proposed CV-SSA-SVM is highly suitable for the task of interest with CAR = 90.50% for the class of “noncrack,” CAR = 92.83% for the class of “sealed crack,” and CAR = 91.33% for the class of “crack.” This outcome is significantly better than those obtained from the benchmark methods (RFC, Adam-BPANN, and Adam-DCNN) for the crack and sealed crack classes. Hence, the newly developed CV-SSA-SVM can be a potential tool to help pavement maintenance agencies in the task of periodic pavement condition survey. Future extensions of the current study may include the investigation of the CV-SSA-SVM performance with respect to different swarm sizes and the extension of the current image dataset to enhance the generalization of the computer vision model.

Data Availability

The dataset used to support the findings of this study has been deposited in the repository of GitHub at https://github.com/NDHoangDTU/CV_SSA_SVM_Crack_SealedCrack. The first 28 columns of the data are features extracted from image samples. The last column is the label of the data instances with 0 = “noncrack,” 1 = “sealed crack,” and 2 = “crack.”

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this article.

Supplementary Materials

The first 28 columns of the data are features extracted from image samples. The last column is the label of the data instances with 0 = “noncrack,” 1 = “sealed crack,” and 2 = “crack.”