Abstract

This study proposes a computer vision model for the automatic recognition of localized spall objects appearing on the surfaces of reinforced concrete elements. The new model integrates image processing techniques and machine learning approaches. The Gabor filter, supported by principal component analysis and k-means clustering, is used to identify the region of interest within an image sample. The binary gradient contour, gray level co-occurrence matrix, and color channels’ statistical measurements are employed to characterize the texture of the extracted region of interest. Based on the computed texture-based features, a logistic regression model trained by the state-of-the-art adaptive moment estimation (Adam) optimizer is utilized to establish a decision boundary that delivers predictions of the two statuses of “nonlocalized spall” and “localized spall.” Experimental results demonstrate that the newly developed model achieves good detection accuracy, with a classification accuracy rate of 85.32%, precision of 0.86, recall of 0.79, negative predictive value of 0.85, and F1 score of 0.82. Thus, the proposed computer vision model can assist decision makers in the task of periodic surveys of structural health condition.

1. Introduction

Public safety is a major concern of civil engineers who design and maintain high-rise buildings. Despite considerable efforts in design and advanced knowledge of building structures, accidents can still happen in the built environment due to excessive usage, structural aging, and inclement weather conditions [1]. Among the hazards occurring in high-rise buildings, objects falling from overhead because of concrete spalling can be particularly dangerous and pose a high potential severity to occupants’ health [2]. The effect of concrete debris can be devastating for human lives if it breaks off from the surfaces of exterior wall systems of high-rise buildings [3].

A concrete spall (Figure 1) is regarded as flakes of concrete/mortar broken off from a concrete element (e.g., beam, wall, or ceiling) [4]. Spalling is typically caused by stresses brought about by differential movement of materials. Most often, spalling in concrete is due to corrosion of the steel reinforcement embedded in the structure. To prevent such accidents and to ensure the safety and serviceability of the built environment, periodic visual surveys of structural health condition and proper maintenance activities are crucial [5].

In developing countries, including Vietnam, manual visual inspection is still the principal method for evaluating structural health conditions. This activity is performed at regular intervals to identify potential damage and guarantee the service/safety requirements of high-rise buildings. Provided that well-trained technicians experienced in structural health assessment are available, manual visual inspection can deliver accurate surveying outcomes. Nevertheless, due to the increasing number of buildings that need to be inspected periodically and the limited number of experienced technicians, timely evaluation of building elements becomes infeasible and inspection deficiencies become a major concern of property owners. Therefore, there is a practical need to substitute the unproductive manual visual survey with a more effective approach.

Recently, owing to the ease of access to low-cost and high-quality visual sensing equipment, including digital cameras, computer vision-based models have been increasingly used for automatic structural health monitoring [6]. These advanced approaches have proved to be viable alternatives to the labor-intensive and subjective methods that rely on manual surveys. With the use of advanced image processing techniques operating on image samples collected from digital cameras, the physical condition of civil structures can be continuously surveyed and reported to maintenance agencies. This evaluation outcome can be effectively used to support the decision-making process regarding maintenance prioritization and funding allocation.

For such reasons, a large number of computer vision-based approaches have been proposed to detect various forms of structural defects such as cracking and spalling. Abdel-Qader et al. [7] employs a principal component analysis-based model to recognize cracking defects appearing on bridge surfaces; the principal component analysis is utilized to support data cluster identification within a large database of bridge images. O’Byrne et al. [8] utilizes texture analysis for detecting damage appearing on infrastructural elements; the texture-based image segmentation relies on pixel intensity values and the gray level co-occurrence matrix. A support vector machine model is then employed for the data classification task. Lattanzi and Miller [9] rely on a data clustering approach for image segmentation based on the Canny and k-means algorithms; the study finds that the combined algorithms can deliver good accuracy of crack recognition under different environmental circumstances.

As can be seen from the literature, a large number of previous studies have been dedicated to crack detection for concrete structures [10–20]. Only recently has there been an increasing focus on detecting other forms of damage, including concrete spalling [21–24]. German et al. [25] constructs a combination of segmentation, template matching, and morphological preprocessing for detecting spalls appearing on the surfaces of concrete columns. Machine learning models, including support vector machines, the Naïve Bayesian classifier, and random forest, have been employed to identify concrete defects [8, 26]. A model for localization and quantification of concrete spalling defects based on terrestrial laser scanning has been proposed in [27]. Dawood et al. [21] presented a computer vision-based model for spalling detection used in the environment of subway networks.

Hoang [28] relies on a steerable filter and machine learning to recognize wall defects such as cracks and spalls. A concrete spalling detection model for metro tunnels based on point cloud data, which employs a roughness descriptor, has been developed by Wu et al. [24]. Hoang [29] presents an image processing approach for automatic detection of concrete spalling using machine learning algorithms and image texture analysis. Nevertheless, this model focused on machine learning-based texture discrimination and was not capable of isolating entire individual spall objects.

Yao et al. [30] establishes a convolutional neural network-based model for concrete bughole detection; a large number (about 10,000) of image examples have been used as the training dataset. Li et al. [31] proposed a model for detecting exposed aggregate appearing on stilling basin slabs using the attention U-Net network. Chow et al. [32] employs deep learning with a convolutional autoencoder for anomaly detection of defects existing on concrete structures. A model for recognizing damaged ceiling areas in large-span structures has been proposed by Wang et al. [33]; this model employs a convolutional neural network for pattern recognition. Although deep learning-based models are capable of performing the feature extraction phase automatically, these supervised learning models generally demand a large training dataset and a meticulous process of data labeling [34–36]. This data labeling process itself can be time-consuming as well as error prone [5]. In addition, deep learning models also require experience and a trial-and-error process to adjust a significant number of the model’s free parameters.

An effort to combine unsupervised learning and machine learning-based data classification has recently been introduced in [37]. The k-means clustering algorithm and a machine learning classifier have been integrated to form an automatic approach for estimating the stripping of asphalt coating. The k-means clustering algorithm is utilized to separate pixels with similar values on the surface of aggregates; subsequently, machine learning models are used to categorize the identified clusters into groups of asphalt-coated and uncoated areas.

As pointed out by previous studies, the current challenges faced by computer vision-based concrete damage detection, including spall recognition, are complex environmental conditions (e.g., noisy image backgrounds) [5] and the difficulty of the image labeling process [32]. More effort should be dedicated to automatically identifying the damage’s region of interest (ROI) via unsupervised learning methods. Capable machine learning methods with few free parameters should be investigated as viable alternatives to sophisticated models used for data classification. This is because simple and manageable models significantly facilitate the development and application of hybrid computer vision-machine learning approaches for concrete spalling detection.

Based on this motivation, this study proposes and verifies an automated method for recognizing localized spall objects based on an integration of the Gabor filter, k-means clustering, image texture analysis, and a logistic regression pattern classification model. The Gabor filter coupled with principal component analysis and k-means clustering is used synergistically for automatic identification of ROIs on concrete surfaces. The image texture analysis combines the powerful texture discriminators of binary gradient contours, color channels’ properties, and the gray level co-occurrence matrix. The logistic regression model trained by the state-of-the-art adaptive moment estimation (Adam) optimizer is employed for data classification.

The subsequent sections of the study are organized as follows: the second section reviews the research methodology. The third section presents the image data collection process. The proposed integrated model used for concrete spall detection is described in the next section, followed by the experimental results and discussion. The final section summarizes the research findings with several concluding remarks.

2. Research Methodology

2.1. Gabor Filter (GF)

Image segmentation is the process of separating an image into distinctive regions [38, 39]. The GF is a widely applied approach for segmenting images [40, 41]. This approach is inspired by the multichannel operation of the human visual system used for visual interpretation in real-world circumstances [42–44]. Experimentation has shown that the GF resembles the simple cells in the mammalian visual system. Thus, this filter can be a reasonable model of how humans recognize and discriminate areas characterized by different textures [45].

The GF consists of two-dimensional Gabor filters which can be described as complex sinusoidal waves modulated by Gaussian envelopes [43]. This filter carries out a localized and oriented frequency analysis of a two-dimensional signal. The GF yields a response that can be mathematically given as follows [45]:

h(x, y) = exp{−(1/2)·[x²/σx² + y²/σy²]}·cos(2π·u0·x)

where u0 denotes the frequency of a sinusoidal plane wave along the x axis; σx and σy represent the space constants of the Gaussian envelope along the x and y axes, respectively. It is noted that GFs with arbitrary orientations can be obtained via a rigid rotation of the x-y coordinate system.

The frequency domain representation of the GF is given by [45]:

H(u, v) = A·(exp{−(1/2)·[(u − u0)²/σu² + v²/σv²]} + exp{−(1/2)·[(u + u0)²/σu² + v²/σv²]})

where σu = 1/(2πσx), σv = 1/(2πσy), and A = 2πσxσy.

It is worth noting that it is necessary to specify the tuning parameters of the GF, including the orientation angles and the radial frequency. Based on the suggestions of Jain and Farrokhnia [45], four orientation values are often employed: 0°, 45°, 90°, and 135°. Given an image with a width of Nc pixels, the following values of the radial frequency u0 can be considered: 1·√2, 2·√2, 4·√2, …, (Nc/4)·√2 cycles per image width.
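To make the filter bank concrete, the following Python/NumPy sketch builds an even-symmetric Gabor kernel from the spatial-domain expression above and convolves a grayscale image with every (frequency, orientation) pair. It is an illustration only (the study’s implementation relies on the built-in GF of Accord.NET in C#.NET); the kernel size, the space constants sigma_x and sigma_y, and the helper names are assumptions of this sketch.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, u0, theta, sigma_x, sigma_y):
    """Even-symmetric Gabor kernel following the spatial-domain form given above.

    size    : kernel width/height in pixels (odd number)
    u0      : radial frequency of the sinusoidal plane wave (cycles/pixel)
    theta   : orientation angle in radians (0, 45, 90, 135 degrees in the paper)
    sigma_x, sigma_y : space constants of the Gaussian envelope
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rigid rotation of the x-y coordinate system to obtain arbitrary orientations
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((x_r / sigma_x) ** 2 + (y_r / sigma_y) ** 2))
    return envelope * np.cos(2.0 * np.pi * u0 * x_r)

def gabor_filter_bank(image, frequencies, thetas, sigma_x=2.0, sigma_y=2.0, size=15):
    """Convolve a grayscale image with every (frequency, orientation) pair."""
    responses = []
    for u0 in frequencies:
        for theta in thetas:
            k = gabor_kernel(size, u0, theta, sigma_x, sigma_y)
            responses.append(convolve2d(image, k, mode="same", boundary="symm"))
    return np.stack(responses, axis=-1)  # H x W x (len(frequencies) * len(thetas))
```

The stacked responses form a per-pixel feature vector that is subsequently reduced by PCA and clustered by k-means, as described in the following subsections.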

2.2. The K-Means Clustering Algorithm

In this study, the unsupervised machine learning approach of k-means clustering [46] is employed to divide an image into different regions based on the analysis results obtained from the GF. This unsupervised machine learning method is a simple yet powerful algorithm for automatic data grouping [47]. With this method, image pixels that have similar properties can be grouped into one cluster. Accordingly, data samples belonging to one cluster feature the smallest degree of variation. The iterative algorithm used to compute the cluster centers is presented in Algorithm 1.

Determine the number of clusters k
Randomly assign k centers from the data samples
(1)Loop
(2)  Assign each data point to the cluster with the nearest mean
(3)  Recalculate the mean of the data points assigned to each cluster
(4)Until the data assignments are unchanged.
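A minimal NumPy sketch of Algorithm 1 is given below for illustration (the paper’s implementation is in C#.NET). The rows of the feature matrix would be the per-pixel, PCA-transformed Gabor responses, and k = 3 is the cluster number reported later in the study; the function name is hypothetical.

```python
import numpy as np

def kmeans(features, k, max_iter=100, seed=0):
    """Minimal k-means clustering sketch following Algorithm 1.

    features : (n_pixels, n_features) array, e.g., PCA-transformed Gabor responses
    k        : number of clusters (the paper uses k = 3 for the collected dataset)
    Returns the cluster label of every row and the final cluster centers.
    """
    rng = np.random.default_rng(seed)
    # Randomly assign k centers from the data samples
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    labels = np.full(len(features), -1)
    for _ in range(max_iter):
        # Assign each data point to the cluster with the nearest mean
        distances = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # data assignments are unchanged -> converged
        labels = new_labels
        # Recalculate the mean of the data points assigned to each cluster
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels, centers
```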
2.3. Image Texture Analysis
2.3.1. Binary Gradient Contours (BGC)

The BGC, proposed by Fernández et al. [48], is a group of computationally simple texture descriptors. Given a 3 × 3 grayscale image patch, these texture descriptors employ a set of eight binary gradients between pairs of pixels along a closed path around the central pixel [49]. The BGC includes three versions: the single-loop, double-loop, and triple-loop descriptors. Via experimentation, the BGC operator has been found to achieve good texture discrimination outcomes.

The pixel intensities of a 3 × 3 image patch can be arranged in a matrix S, where Ic denotes the central pixel and I0, I1, …, I7 are its eight neighboring pixels.

The schematic representations of the BGC with the three versions of single, double, and triple loops are presented in Figure 2. In addition, to facilitate the mathematical formulation of these texture descriptors, a square crop Sm,n is defined, where Im,n represents the pixel at the mth row and nth column.

Accordingly, the formulations of the single-, double-, and triple-loop versions are given in [48]; the three versions differ in the closed loop(s) of neighboring pixels along which the binarized gradients are evaluated, as depicted schematically in Figure 2.
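As an illustration of the idea, the sketch below computes a single-loop BGC code map and its histogram, following the general description above (eight binary gradients between consecutive pixels on a closed path around the center). The clockwise neighbor ordering and the raw 8-bit coding used here are assumptions for illustration and may differ in detail from the exact formulation of [48] and Figure 2.

```python
import numpy as np

def bgc_single_loop(gray):
    """Single-loop binary gradient contour sketch (assumed clockwise neighbor order).

    gray : 2D grayscale array (an extracted ROI)
    Returns the histogram of per-pixel codes, which serves as the texture signature.
    """
    # Offsets of the eight neighbors, listed along one closed path around the center
    path = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    neighbors = [gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] for dy, dx in path]
    for i in range(8):
        # Binary gradient between consecutive neighbors on the closed path
        grad = (neighbors[i] >= neighbors[(i + 1) % 8]).astype(int)
        codes += grad << i
    hist, _ = np.histogram(codes, bins=np.arange(257))
    return hist
```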

2.3.2. RGB Channels’ Properties

Since the color properties of spall and nonspall objects are expected to be dissimilar, this study employs the statistical measurements of three color channels, red (R), green (G), and blue (B), as a means of texture description. Given an image sample I, the first-order histogram P(I) can be computed. Accordingly, the mean (μc), standard deviation (σc), skewness (Sc), kurtosis (Kc), entropy (Ec), and range (Rc) of the three color channels (R, G, and B) can be calculated [29, 50].
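A sketch of these per-channel statistics is given below (illustrative only; the histogram bin width and the use of SciPy’s skew/kurtosis helpers are implementation choices assumed here).

```python
import numpy as np
from scipy.stats import skew, kurtosis

def color_channel_features(rgb):
    """First-order statistics of the R, G, and B channels.

    rgb : (H, W, 3) uint8 image (the extracted ROI)
    Returns 6 statistics per channel -> 18 features in total.
    """
    features = []
    for c in range(3):  # R, G, B
        channel = rgb[:, :, c].astype(float).ravel()
        hist, _ = np.histogram(channel, bins=256, range=(0, 256), density=True)
        hist = hist[hist > 0]
        entropy = -np.sum(hist * np.log2(hist))  # first-order histogram entropy
        features.extend([
            channel.mean(),                  # mean
            channel.std(),                   # standard deviation
            skew(channel),                   # skewness
            kurtosis(channel),               # kurtosis
            entropy,                         # entropy
            channel.max() - channel.min(),   # range
        ])
    return np.asarray(features)
```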

2.3.3. Gray Level Co-Occurrence Matrix (GLCM)

The GLCM [51, 52] is also an extensively employed approach for characterizing image texture. This approach focuses on capturing the repeated occurrence of certain gray-level patterns [53]. Therefore, indices extracted from a GLCM can be effectively utilized to evaluate the coarseness/fineness of an image region. Let r and θ represent a distance and a rotation relationship between two individual pixels. The GLCM, denoted as P(i, j), expresses the probability of the two gray levels i and j having the relationship specified by r and θ [54]. Based on the recommendations of Haralick et al. [51], the GLCM can be constructed with r = 1 and θ = 0°, 45°, 90°, and 135°. Accordingly, for each matrix, four indices of angular second moment (AM), contrast (CO), correlation (CR), and entropy (ET) can be computed as follows [29, 55]:

AM = Σi Σj [P(i, j)]²

CO = Σn n²·Σ{|i − j| = n} P(i, j)

CR = [Σi Σj (i·j)·P(i, j) − μx·μy]/(σx·σy)

ET = −Σi Σj P(i, j)·log(P(i, j))

where the summations run over the Ng gray level values; μx, μy and σx, σy are the means and standard deviations of the marginal distributions px and py, respectively.
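The following sketch builds a GLCM for one (r = 1, θ) offset and computes the four indices from the formulas above; quantizing the intensities to a small number of gray levels is an assumption made for illustration, and the function name is hypothetical.

```python
import numpy as np

def glcm_features(gray, dy, dx, levels=8):
    """GLCM and its four indices (AM, CO, CR, ET) for one r = 1 offset.

    Offsets such as (0, 1), (-1, 1), (-1, 0), (-1, -1) correspond to
    0, 45, 90, and 135 degrees. gray : 2D uint8 grayscale ROI.
    """
    q = (gray.astype(int) * levels // 256).clip(0, levels - 1)
    h, w = q.shape
    glcm = np.zeros((levels, levels), dtype=float)
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                glcm[q[y, x], q[y2, x2]] += 1.0
    glcm /= glcm.sum()  # normalize counts into joint probabilities P(i, j)

    i, j = np.indices((levels, levels))
    px, py = glcm.sum(axis=1), glcm.sum(axis=0)            # marginal distributions
    mu_x, mu_y = (np.arange(levels) * px).sum(), (np.arange(levels) * py).sum()
    sd_x = np.sqrt(((np.arange(levels) - mu_x) ** 2 * px).sum())
    sd_y = np.sqrt(((np.arange(levels) - mu_y) ** 2 * py).sum())

    am = (glcm ** 2).sum()                                  # angular second moment
    co = ((i - j) ** 2 * glcm).sum()                        # contrast
    cr = ((i * j * glcm).sum() - mu_x * mu_y) / (sd_x * sd_y + 1e-12)  # correlation
    nz = glcm[glcm > 0]
    et = -(nz * np.log(nz)).sum()                           # entropy
    return np.array([am, co, cr, et])
```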

2.4. Logistic Regression Model (LRM)

The LRM is a capable method for solving binary classification problems [29, 56–58]. The task at hand is to construct a decision boundary that categorizes the input data into two distinctive regions. Therefore, given a vector of input data x ∈ R^D, where D is the number of features used for classification, the model is able to derive the class output y, with either y = 0 (for the negative class of nonspall) or y = 1 (for the positive class of spall).

The probability of the positive class derived by an LRM is given by [59]:

P(y = 1 | x) = 1/(1 + exp(−z))

where z = θ^T·x. The vector θ is the model parameter.

As a supervised learning approach, a set of training examples needs to be prepared so that the vector θ can be adapted during the model training phase. An LRM can be trained by either minimizing the least square loss function or maximizing the log likelihood function.

The least square loss function is given by [60]:

L(θ) = (1/(2M))·Σ(m = 1..M) (ym − pm)²

where M is the number of training data samples and pm = P(y = 1 | xm) is the model output for the mth sample.

The log likelihood function is described as follows [61, 62]:

LL(θ) = Σ(m = 1..M) [ym·log(pm) + (1 − ym)·log(1 − pm)]

An LRM can be trained via the stochastic gradient descent framework [29]. If the least square loss function is used, the update rule for adapting the model parameters is given by [60]:

θ(t+1) = θ(t) − Lr·∇L(θ(t))

where Lr denotes the learning rate parameter and ∇L(θ(t)) is the gradient of the loss function with respect to θ at iteration t.

Meanwhile, if the log likelihood function is selected, the update rule used to compute θ is given by [61, 62]:

θ(t+1) = θ(t) + Lr·∇LL(θ(t))

since the log likelihood is maximized rather than minimized.
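For illustration, the sketch below trains an LRM by stochastic gradient ascent on the log likelihood using the per-sample form of the update rule above; handling the bias term as an appended constant feature and the function names are assumptions of this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr_sgd(X, y, lr=0.01, epochs=1000, seed=0):
    """Logistic regression trained by stochastic gradient ascent on the log likelihood.

    X : (M, D) matrix of normalized texture features, y : (M,) array of 0/1 labels.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])      # augment with a bias term
    theta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for m in rng.permutation(len(Xb)):         # visit samples in random order
            p = sigmoid(Xb[m] @ theta)             # P(y = 1 | x_m)
            theta += lr * (y[m] - p) * Xb[m]       # per-sample log likelihood gradient
    return theta

def predict(theta, X, threshold=0.5):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (sigmoid(Xb @ theta) >= threshold).astype(int)
```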

2.5. Adaptive Moment Estimation (Adam) Optimizer

Adam, proposed in [63], is an algorithm for first-order gradient-based optimization of stochastic objective functions. This algorithm relies on adaptive estimates of lower-order moments. Adam can be considered an extension of stochastic gradient descent employed to train machine learning models via an iterative weight updating process [64]. It is noted that conventional stochastic gradient descent employs a constant learning rate (Lr) for all weight updates. Adam seeks to improve the model training phase by adaptively fine-tuning the Lr parameter.

Adam harnesses information obtained from averages of the first and second moments of the gradients. In detail, this optimization algorithm computes an exponential moving average of the gradient and of the squared gradient. Moreover, a pair of parameters (β1 and β2) is used to dictate the decay rates of these moving averages [64]. Via experimentation, it can be shown that the advantages of this optimizer include efficient computation, straightforward implementation, little memory requirement, and the capability of dealing with a large number of optimized parameters [63].

In order to implement Adam to optimize an LRM, it is necessary to compute the gradient g = ∇θL(θ). In the case of the least square loss function, the gradient is given by [60]:

g = ∇θL(θ) = (1/M)·Σ(m = 1..M) (pm − ym)·pm·(1 − pm)·xm

If the log likelihood function is employed, the gradient is given by [22, 61, 62]:

g = ∇θLL(θ) = Σ(m = 1..M) (ym − pm)·xm

Accordingly, the Adam procedure (illustrated in Algorithm 2) used for training an LRM is performed iteratively with the following steps:
(i) Compute the gradient
(ii) Update the biased first and second raw moment estimates
(iii) Compute the bias-corrected moment estimates
(iv) Adapt the optimized parameters

Define step size a = 0.001
Define exponential decay rates β1 = 0.9 and β2 = 0.9999
Define the objective function f(θ)
Randomly initialize the searched variable θ
Assign m0 = 0, v0 = 0, and t = 0
(1)While (θt not converged)
(2)  t = t + 1
(3)  Compute gradient: gt = ∇θf(θt−1)
(4)  Update biased 1st moment estimate
(5)    mt = β1·mt−1 + (1 − β1)·gt
(6)  Update biased 2nd raw moment estimate
(7)    vt = β2·vt−1 + (1 − β2)·gt²
(8)  Calculate bias-corrected first moment estimate
(9)    m̂t = mt/(1 − β1^t)
(10)  Calculate bias-corrected 2nd raw moment estimate
(11)    v̂t = vt/(1 − β2^t)
(12)  Update the searched parameter
(13)    θt = θt−1 − a·m̂t/(√v̂t + ε), where ε is a small constant (e.g., 10^−8)
(14)End While
(15)Return θt
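As a worked example of Algorithm 2 applied to the LRM, the following sketch minimizes the negative log likelihood with Adam. It is an illustration only (the study’s implementation is in C#.NET); full-batch gradients and the bias-as-extra-feature convention are simplifications assumed here, while the step size and decay rates follow the constants listed in Algorithm 2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr_adam(X, y, step=0.001, beta1=0.9, beta2=0.9999, eps=1e-8, iters=1000):
    """Logistic regression trained with Adam on the (negative) log likelihood.

    X : (M, D) normalized feature matrix, y : (M,) array of 0/1 labels.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])   # bias handled as an extra feature
    theta = np.zeros(Xb.shape[1])
    m = np.zeros_like(theta)                    # biased 1st moment estimate
    v = np.zeros_like(theta)                    # biased 2nd raw moment estimate
    for t in range(1, iters + 1):
        p = sigmoid(Xb @ theta)
        g = Xb.T @ (p - y) / len(y)             # gradient of the negative log likelihood
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)            # bias-corrected 1st moment
        v_hat = v / (1 - beta2 ** t)            # bias-corrected 2nd raw moment
        theta -= step * m_hat / (np.sqrt(v_hat) + eps)
    return theta
```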

3. Collection of Image Samples

The LRM used in this study belongs to the category of supervised machine learning methods. To train the LRM with the aforementioned Adam optimizer, it is requisite to prepare a set of training image samples as well as a set of testing image samples to verify the model construction phase. Therefore, this study has carried out field surveys at several high-rise buildings in Danang city (Vietnam) to collect a set of 600 image samples. Among them, 300 samples contain localized spall objects and 300 samples consist of nonlocalized spall objects. Notably, the class labels of nonspall (class label = 0) and spall (class label = 1) have been assigned by a human inspector for the purposes of model training and testing. A Canon EOS M10 (CMOS 18.0 MP) and a Nikon D5100 (CMOS 16.2 MP) have been employed to collect the image samples. In addition, the image size has been standardized to 64 × 64 pixels to facilitate the computation process. The image set has been collected so that diverse backgrounds (e.g., cracks and stains) are included. The collected image set is demonstrated in Figure 3.

4. The Proposed Hybrid Image Processing and Machine Learning Approach for Automatic Detection of Concrete Spall

This section describes the structure of the proposed hybrid approach of image processing and machine learning used for recognizing localized spall objects. The overall structure of the proposed approach is presented in Figure 4. It is noted that the hybrid model used for automatic concrete spall detection has been developed in the Visual C#.NET environment (Framework 4.6.2) and run on an ASUS FX705GE-EW165T (Core i7 8750H, 8 GB RAM, 256 GB solid-state drive).

The model can be divided into several operational steps:
(i) Automatic ROI identification
(ii) Image texture computation
(iii) Machine learning-based pattern classification

4.1. Automatic Region of Interest (ROI) Identification

To deal with the diverse shapes of localized spall objects, this study relies on the GF to automatically identify ROIs that contain the potential defects of interest. It is noted that each image sample is first denoised by a median filter with a window size of 4 pixels and converted to grayscale. After the GFs with different orientations and radial frequencies are computed, principal component analysis (PCA) is performed to transform the set of GF responses and reduce the data dimensionality (Figure 5). The number of retained principal components is selected so that 99% of the cumulative variance is explained. It is noted that the GF and PCA operations have been implemented via built-in functions provided by the Accord.NET Framework [65].
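A sketch of this dimensionality-reduction step is shown below; it uses scikit-learn’s PCA, which accepts a fractional n_components to retain the components explaining 99% of the cumulative variance (the paper itself uses the Accord.NET implementation, and the function name here is hypothetical).

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_gabor_responses(responses, variance=0.99):
    """Project per-pixel Gabor responses onto the principal components that
    explain 99% of the cumulative variance.

    responses : (H, W, F) stack of filter responses, one channel per (u0, theta) pair.
    Returns an (H*W, K) matrix of transformed per-pixel features for clustering.
    """
    h, w, f = responses.shape
    pixel_features = responses.reshape(h * w, f)
    pca = PCA(n_components=variance)   # float in (0, 1) = fraction of variance to keep
    return pca.fit_transform(pixel_features)
```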

Based on the PCA result, the k-means clustering algorithm is used to segment the image sample. Via experimentation, the suitable number of clusters for the collected dataset is found to be 3. Subsequently, the morphological operations of filling holes and removing small objects are utilized to process the segmented image. Moreover, a background removal operation is performed to discard redundant objects. In this study, an object within an image sample is considered to be background if its width or height is equal to that of the image sample.

Accordingly, each image cluster or segment is represented as a binary image. The connected component labeling algorithm [66] is then used to analyze the positions of the binary-1 pixels and separate them into distinctive component regions. Essentially, all pixels that have the value of binary 1 and are connected to each other are grouped into one object [38]. To remove crack objects, an object slenderness index (OSI) is computed for each group of pixels obtained from the connected component labeling analysis; the OSI is computed from LOX and LOY, the object lengths along the X and Y axes, and TOX and TOY, the mean thicknesses of the object along the X and Y axes, respectively.

If the OSI of an object is greater than a certain threshold (TOSI), the object is classified as a crack. Via several trial-and-error experiments with the collected image samples, a suitable value for the threshold TOSI is found to be 5. After the ROIs have been identified, operations of image convolution and cropping are employed to isolate the areas of interest within the image sample. The processes of ROI identification and isolation are demonstrated in Figures 6 and 7.
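The sketch below illustrates the post-processing chain of this subsection: connected component labeling, removal of background objects spanning the full image width or height, and removal of slender (crack-like) objects. Since the exact OSI formula is not reproduced above, a plausible slenderness measure (object length divided by mean thickness) is assumed here purely for illustration, and the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage

def candidate_rois(segment, osi_threshold=5.0):
    """Filter one binary segment into candidate spall ROIs.

    segment : 2D boolean array (one k-means cluster rendered as a binary image)
    """
    labeled, n = ndimage.label(segment)                   # connected component labeling
    h, w = segment.shape
    rois = []
    for obj_id in range(1, n + 1):
        mask = labeled == obj_id
        rows, cols = np.nonzero(mask)
        l_oy, l_ox = rows.ptp() + 1, cols.ptp() + 1       # object lengths along Y and X
        # Background removal: drop objects spanning the full image width or height
        if l_ox >= w or l_oy >= h:
            continue
        t_ox = mask.sum(axis=1)[mask.any(axis=1)].mean()  # mean thickness along X
        t_oy = mask.sum(axis=0)[mask.any(axis=0)].mean()  # mean thickness along Y
        osi = max(l_ox / t_ox, l_oy / t_oy)               # assumed slenderness index
        if osi <= osi_threshold:                          # OSI > threshold -> crack
            rois.append(mask)
    return rois
```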

4.2. Image Texture Computation

Based on the ROIs obtained from the previous step, image texture analysis consisting of statistical measurements of the BGC, RGB channels, and GLCM is carried out. The BGC texture descriptor includes all three variants of single, double, and triple loops. Each of the variants yields a histogram which describes the texture property of an image sample. Accordingly, statistical indices including the mean, standard deviation, skewness, kurtosis, and entropy can be computed for each histogram. Therefore, the BGC results in 15 numerical features. As mentioned earlier, the mean (μc), standard deviation (σc), skewness (Sc), kurtosis (Kc), entropy (Ec), and range (Rc) of the three color channels (R, G, and B) are used to represent the color features of the image samples. Thus, there are 6 × 3 = 18 additional numerical features. Moreover, properties of the GLCM, including the four indices of angular second moment (AM), contrast (CO), correlation (CR), and entropy (ET), are used. It is noted that for each image sample, four GLCMs are established. Thus, the GLCM texture descriptor yields 4 × 4 = 16 features. In total, there are 15 + 18 + 16 = 49 numerical features extracted from the image texture computation process.
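The assembly of the 49-dimensional feature vector can be sketched as follows, reusing the hypothetical helper functions introduced in Section 2; because only the single-loop BGC variant was sketched there, it stands in for all three BGC variants in this illustration.

```python
import numpy as np

def texture_feature_vector(roi_rgb, roi_gray):
    """Assemble the 49-dimensional texture vector for one ROI (illustrative sketch)."""
    def hist_stats(hist):
        # mean, standard deviation, skewness, kurtosis, and entropy of a histogram
        p = hist / hist.sum()
        bins = np.arange(len(hist))
        mu = (bins * p).sum()
        sd = np.sqrt(((bins - mu) ** 2 * p).sum()) + 1e-12
        sk = (((bins - mu) / sd) ** 3 * p).sum()
        ku = (((bins - mu) / sd) ** 4 * p).sum()
        nz = p[p > 0]
        ent = -(nz * np.log2(nz)).sum()
        return [mu, sd, sk, ku, ent]

    features = []
    # 3 BGC variants x 5 histogram statistics = 15 features
    # (bgc_single_loop is used three times here only as a placeholder for the
    #  single-, double-, and triple-loop variants)
    for bgc_variant in (bgc_single_loop, bgc_single_loop, bgc_single_loop):
        features += hist_stats(bgc_variant(roi_gray))
    features += list(color_channel_features(roi_rgb))           # 6 x 3 = 18 features
    for dy, dx in [(0, 1), (-1, 1), (-1, 0), (-1, -1)]:          # 0, 45, 90, 135 degrees
        features += list(glcm_features(roi_gray, dy, dx))        # 4 x 4 = 16 features
    return np.asarray(features)                                  # 49 features in total
```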

4.3. Pattern Classification Using LRM Trained by the Adam Optimizer

Using the extracted ROIs and the aforementioned texture descriptors, a dataset with 790 samples and 49 features can be constructed. This dataset contains 465 nonlocalized spall samples and 325 localized spall samples. As stated earlier, the output class is either 0 for the negative class or 1 for the positive class. Moreover, in order to standardize the input features’ ranges, the numerical texture descriptors have been normalized by the Z-score equation as follows:

XZN = (Xo − mX)/sX

where Xo and XZN represent the original and normalized input data, respectively; mX and sX are the mean and the standard deviation of the original input data, respectively.

Based on the aforementioned dataset, the LRM is trained with the Adam optimizer using either the least square or the log likelihood loss function. These two LRMs are denoted as Adam-LS-LR and Adam-LL-LR, respectively. It is noted that 90% of the collected dataset has been employed to construct the LRM. Meanwhile, the rest of the dataset is reserved to verify the generalization capability of the model.
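A sketch of the data preparation and training call is given below. Computing the Z-score statistics on the training portion only is a common convention assumed here, as the text does not state which portion supplies mX and sX; the 90/10 split ratio and 1000 training iterations follow the description above, and the function names are hypothetical.

```python
import numpy as np

def zscore_normalize(X_train, X_test):
    """Z-score normalization using statistics of the training portion only."""
    m_x, s_x = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    return (X_train - m_x) / s_x, (X_test - m_x) / s_x

def random_split(X, y, test_ratio=0.1, seed=0):
    """Random 90/10 split of the 790-sample texture dataset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_ratio * len(X)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

# Usage sketch: normalize, then train the Adam-LL-LR variant from the earlier sketch.
# X_tr, y_tr, X_te, y_te = random_split(X, y)
# X_tr, X_te = zscore_normalize(X_tr, X_te)
# theta = train_lr_adam(X_tr, y_tr, iters=1000)
```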

5. Research Findings and Discussion

As mentioned earlier, the whole collected dataset is divided into two subsets: a training set (90%) and a testing set (10%). Moreover, to diminish the effect of randomness brought about by data sampling and to assess the generalization capability of the integrated method reliably, the data sampling process has been repeated 20 times. The partitioned datasets used for model training and testing are demonstrated in Table 1. In addition, the LRMs trained with the stochastic gradient descent algorithm using the least square and log likelihood loss functions are employed as benchmark models. The stochastic gradient descent models coupled with the former and the latter loss functions are denoted as LS-LR and LL-LR, respectively; the two LRMs trained with the Adam optimizer are the aforementioned Adam-LS-LR and Adam-LL-LR. All of the LRMs have been trained with 1000 iterations.

In addition, the classification accuracy rate (CAR), precision, recall, negative predictive value (NPV), and F1 score are computed to quantify the models’ predictive accuracy. These performance measurement indices are computed as follows [67]:

CAR = (TP + TN)/(TP + TN + FP + FN) × 100%

Precision = TP/(TP + FP)

Recall = TP/(TP + FN)

NPV = TN/(TN + FN)

F1 score = 2·TP/(2·TP + FP + FN)

where FN, FP, TP, and TN denote the numbers of false-negative, false-positive, true-positive, and true-negative samples, respectively.
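These indices can be computed from the confusion counts as in the following sketch (a direct transcription of the standard definitions above; the function name is hypothetical).

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """CAR, precision, recall, NPV, and F1 score from the confusion counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {
        "CAR (%)": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "NPV": tn / (tn + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }
```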

The experimental results obtained from the repetitive data sampling with 20 runs are reported in Table 2. As can be seen from this table, the Adam-LL-LR has achieved the best predictive accuracy in both the training phase (CAR = 85.25%, precision = 0.84, recall = 0.81, NPV = 0.86, and F1 score = 0.82) and the testing phase (CAR = 85.32%, precision = 0.86, recall = 0.79, NPV = 0.85, and F1 score = 0.82). Since the prediction performances obtained from the training and testing phases of the Adam-LL-LR are relatively similar, it can be seen that this model has not suffered from overfitting. In addition, the LL-LR model is the second best approach (with CAR = 81.90% and F1 score = 0.78), followed by the Adam-LS-LR (with CAR = 72.03% and F1 score = 0.71) and the LS-LR (with CAR = 70.82% and F1 score = 0.70). Herein, the F1 score is emphasized because it represents the harmonic mean of precision and recall.

The training and testing performances of the employed models are graphically presented in Figures 8 and 9. The boxplot shown in Figure 10 demonstrates the testing performances of the LRMs. In addition, to confirm the statistical difference between each pair of the localized spall detection models, the Wilcoxon signed-rank test with a significance level (p value) of 0.05 is employed in this section of the study. The test outcomes are reported in Table 3. Observably, the experimental results show that all of the p values are lower than the significance level. Thus, the null hypothesis, which states that the performances of the two models under comparison are statistically indifferent, can be confidently rejected. This hypothesis test asserts the superiority of the Adam-LL-LR model over the other benchmark approaches.
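For reference, the pairwise comparison can be carried out with SciPy’s Wilcoxon signed-rank test as sketched below, applied to the 20 per-run performance values of two models; the variable and function names are hypothetical.

```python
from scipy.stats import wilcoxon

def compare_models(scores_model_a, scores_model_b, alpha=0.05):
    """Paired Wilcoxon signed-rank test on per-run CAR values of two models."""
    stat, p_value = wilcoxon(scores_model_a, scores_model_b)
    reject = p_value < alpha   # reject the null hypothesis of indifferent performance
    return stat, p_value, reject

# Usage sketch with hypothetical per-run results:
# stat, p, reject = compare_models(car_adam_ll_lr, car_ll_lr)
```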

Based on the experimental results, the Adam-LL-LR model is best suited for the collected dataset at hand. The performance of this model is further studied in this section. Illustrations of correctly recognized spall objects yielded by the Adam-LL-LR are presented in Figure 11. As can be observed, the model delivers accurate detection results in the presence of a window (Figure 11(a)) and a minor defect on the mortar surface (Figure 11(b)). Notably, the localized spall objects can still be located well in cases where there are crack objects in the captured scenes (Figures 11(c)–11(e)). Furthermore, the Adam-LL-LR has also performed well in cases where there are multiple spall objects in the image samples (Figures 11(f) and 11(g)). In addition, the proposed Adam-LL-LR model can be used to quantify the percentage of damaged area found in the image samples; the computation results are demonstrated in Figure 12.

Nevertheless, as shown in Figure 13, the newly developed model has produced incorrect detection results in cases of complex backgrounds. As observed in Figure 13(a), an area in the background has texture properties similar to those of the spall object, which can lead to a false-positive detection. Complex backgrounds (Figure 13(b)) and irregular lighting conditions (Figure 13(c)) also tend to reduce the model accuracy. These phenomena can lead to false-negative cases.

6. Concluding Remarks

Localized spall is a common defect observed on the surfaces of reinforced concrete elements. Accurate detection of this damage is crucial during the periodic structural health survey. This study has developed and verified a computer vision-based approach for automating the task of localized spall recognition. The newly developed model is a hybridization of image processing and machine learning approaches. The image processing methods of the GF coupled with k-means clustering and morphological analyses are used to automatically identify the ROIs that potentially contain the defect. The BGC, GLCM, and color channels’ properties are employed as texture descriptors. Based on the computed image texture, the LRM optimized by the state-of-the-art Adam is used to construct a decision boundary that separates the data samples into the two regions of nonlocalized spall and localized spall. Experimental results show that the LRM trained by the Adam optimizer delivers the most desirable prediction accuracy. Therefore, the proposed integrated model can be a useful tool to assist building maintenance agencies in the task of evaluating structural health condition.

Data Availability

The image dataset used to support the findings of this study has been deposited in the repository of GitHub: https://github.com/NhatDucHoang/LocalizedSpallDetection_AdamLRM.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Acknowledgments

This research was financially supported by Duy Tan University.