Bugholes are surface imperfections that appear as small pits and craters on concrete surfaces after the casting process. Traditional measurement relies on in situ manual inspection, which is time-consuming and difficult. This paper proposes a deep-learning-based method to detect bugholes in concrete surface images. A deep convolutional neural network (DCNN) for detecting bugholes on concrete surfaces was developed by adding inception modules to the traditional convolutional network structure, to cope with the relatively small size of the input images (28 × 28 pixels) and the limited number of labeled examples in the training set (fewer than 10 K). The effects of noise such as illumination, shadows, and combinations of several different surface imperfections in real-world environments were considered. In the image tests, the proposed DCNN showed excellent bughole detection performance, with a recognition accuracy of 96.43%. A comparative study with the Laplacian of Gaussian (LoG) algorithm and the Otsu method showed that the proposed DCNN is robust to interference from cracks, color-differences, and nonuniform illumination on the concrete surface.

1. Introduction

Bugholes are surface imperfections that appear as small pits and craters on concrete surfaces after the casting process [1]. These imperfections appear as regular or irregular pits with diameters ranging from a few millimeters to 15 mm and are usually scattered randomly over the surface of the concrete. Bugholes were recognized as a major problem in the construction industry very early on [2, 3]. On the one hand, building owners and architects are becoming stricter about the quality of concrete surfaces, which are required to be flat and free of bugholes so as to leave an aesthetically pleasing impression. On the other hand, even though bugholes are primarily an aesthetic issue for exposed concrete structures and do not affect the structural strength of concrete, they do reduce the adhesion of fiber-reinforced plastic (FRP) material applied to the concrete surface [4]. Additionally, research has indicated that salt accumulated in bugholes causes premature degradation of reinforced concrete (RC) structures [5]. Moreover, surface bugholes are generally considered a nuisance for subsequent processes, since they need to be filled before a paint coating is applied to the concrete surface, and this additional surface preparation is labor-intensive and costly.

Owing to their influence on the quality of concrete surfaces, methods for detecting bugholes were established very early on. In the 1970s, construction professionals attempted to assess concrete surface quality by manually counting bugholes, measuring their diameters, and then calculating the percentage of holed area on the surface, which was considered time-consuming and impractical [2, 3]. An improved method for classifying concrete surface quality was therefore proposed by Thomson [6], who suggested using bughole photos with different degrees of coverage as reference samples to compare with the actual concrete surface. The current bughole rating method [7, 8], developed by the American Concrete Institute (ACI), is based on the idea introduced by Thomson. The ACI method compares the concrete surface to be assessed with a set of standard surface photographs representing seven scales, and the expert performing the comparison determines which scale the inspected surface belongs to. While the method is simple in principle, some argue that comparison with photographs of reference samples can be problematic because of the variability between different printed scales of the reference samples and the subjectivity of the human eye [9, 10]. In addition, one surface may combine several types of bugholes, or several different surface imperfections, so that use of the reference becomes rather difficult and subjective. Moreover, the volume of concrete construction in practice is large, and manual inspection is time-consuming, labor-intensive, and costly.

In order to obtain a better evaluation of the concrete surface, more objective and intelligent methods need to be developed. Digital image processing technology is considered a powerful automated tool that can provide objective results quickly [11, 12], and it has been successfully applied in bridge coating quality assessment in recent years. Lee et al. developed an automated processor that can recognize the presence of bridge coating rust defects [13]. In order to solve the problem of nonuniform illumination, Chen et al. proposed the adaptive ellipse approach (AEA), the box-and-ellipse-based adaptive-network-based fuzzy inference system (BE-ANFIS), and the support-vector-machine-based rust assessment approach (SVMRA) [14–16]. In order to adapt to various background colors and overcome the effects of background noise or nonuniform illumination, Shen et al. proposed a rust defect recognition method based on color and texture feature, which combines the Fourier transform and color image processing [17]. Son et al. employed the J48 decision tree algorithm to rapidly and accurately determine rusted surface area [18]. In order to improve the detection accuracy of the rusted areas on steel bridges, Liao et al. proposed a digital image recognition algorithm that consisted of the K-means method and the double-center-double-radius (DCDR) algorithm [19]. Shen et al. proposed an artificial-neural-network-based rust intensity recognition approach (ANNRI) [20]. In addition to detecting rust, researchers also applied a variety of image processing techniques on visual images to detect cracks, including edge detection methods [21–24], morphological operations [25–27], digital image correlation [28, 29], and image binarization [30, 31]. A comparative study of fast Haar transform (FHT), fast Fourier transform, Sobel edge detector, and Canny edge detector showed that FHT has the best performance [22]. Lim et al. used the Laplacian of Gaussian (LoG) edge detector to detect surface cracks in concrete bridge decks and obtain global crack maps through camera calibration and robotic localization [23]. Talab et al. used multiple filters such as Sobel and Area filters to change the small area to the background and used the Otsu method to detect major cracks [24]. Some researchers have verified that the morphological operations are effective for crack detection [25, 26, 32]. Rimkus et al. used digital image correlation (DIC) technique to detect and locate cracks in concrete surfaces [28]. Kim et al. and Li et al. suggested using image binarization methods to extract crack information from digital images [30, 31].

Existing research has focused on the use of image processing technology to detect surface cracks and rust, with relatively few studies on surface bugholes. Zhu and Brilakis proposed an image processing method to detect bugholes (referred to as air pockets in their paper) on concrete surfaces [33], but this method did not consider the influence of nonuniform illumination. Ozkul and Ismail developed a bughole measuring device for rating the quality of a concrete surface [1]. Silva et al. developed an expert system that uses image analysis methods to classify the surface quality of self-consolidating concrete for precast members by calculating the percentage area, diameter, and distribution of the bugholes [34]. Hirano et al. used a thresholding method to detect bugholes in images, but small bugholes are difficult to detect with this method [35]. Liu et al. established a method to detect surface bugholes via image analysis and published evaluation parameters; the Otsu image threshold segmentation technique was adopted to extract the characteristics of bugholes on the concrete surface [36]. Isamu et al. developed an image analysis method using color images to quantify the bugholes distributed on concrete surfaces [37].

The application of digital image processing technology (IPT) has advanced quality inspection methods for structural surfaces. However, the direct use of image processing technology for surface defect inspection has several disadvantages. First, the algorithms are tailored to certain images in the studied datasets, which affects their performance on new datasets [38]. Second, due to the effects of noise such as illumination, shadows, and combinations of several different surface imperfections, the detection results of image processing algorithms may be inaccurate [39–41]. Finally, image processing algorithms are often designed to aid the inspector in defect detection and still rely on human judgement for final results [25]. One possible solution is to use deep-learning algorithms to analyze the inspection images [42]. In recent years, deep-learning algorithms have shown remarkable performance in image object recognition [43–46], and deep convolutional neural networks (DCNN) have attracted wide attention as an effective recognition method [47]. Several researchers have applied this method to detect concrete surface cracks. Zhang et al. conducted a comparative study on the image classification of pavement cracks using three methods: deep convolution network, support vector machine, and integrated learning. The results showed that the deep convolution network detected cracks better than the other two methods [48]. Cha et al. used a convolution neural network based on deep learning to detect cracks in concrete images [49]. Wang et al. proposed a convolutional neural network (CNN) for recognizing cracks on asphalt surfaces at subdivided image cells; the proposed CNN achieved accuracies of 96.32% and 94.29% on training data and testing data, respectively [50].

In this paper, a deep convolutional neural network (DCNN) has been used to detect bugholes on concrete surfaces, and the effects of noise such as illumination, shadows, and combinations of several different surface imperfections in real-world environments were considered. The performance of the proposed method was compared with that of the traditional image processing methods.

2. Main Concrete Surface Defect Classification

There are many different types of quality defects of concrete surfaces. The most common surface defects are cracks, bugholes, and color-differences. Figures 1–3 show the image characteristics of these three types of defects.

Cracks, bugholes, and color-differences each have their own features. Cracks are generally irregular and line-shaped, as shown in Figure 1. Because it reflects light differently, a bughole is normally darker than the rest of the concrete surface: the greater the depth, the darker the color. As shown in Figure 2(a), typical bugholes are nearly circular. However, irregularly shaped bugholes also exist, as shown in Figure 2(b). Additionally, because some bugholes are shallow, their contrast with the normal concrete surface is not obvious, as seen in Figure 2(c). A color-difference is a color that deviates from that of a normal or desired concrete surface. It has neither a particular shape nor a clear border, as shown in Figure 3.

Previous studies ignored the case in which multiple defects exist on a concrete surface at the same time, although this often occurs during the concrete casting stage. In this study, the concrete surface bughole images were classified according to the combination of surface defects, as shown in Table 1.

3. Proposed Method for Concrete Surface Bughole Detection

Figure 4 shows the general flow of the proposed method, with training steps (solid lines) and testing steps (dashed lines). The DCNN was trained for a total of 30 training epochs, and the model was evaluated on the validation set after every epoch. Once the DCNN classifier was well trained, the testing images were scanned by the validated classifier to generate a report of bugholes.

3.1. Database Establishment

To train a CNN classifier, raw images of concrete surfaces with multiple types of defects (including cracks, bugholes, and color-differences) were taken at several completed construction sites with a mobile phone camera. The shooting distance to the objects ranged from 1.0 to 5.0 m, and the bugholes had to be visible to the naked eye in the images. The total number of raw images was 116, each with a resolution of 3,120 × 4,160 pixels. Of the 116 raw images, 80 were used for training and validation, and 36 were cropped into 800 small images (256 × 256 pixel resolution) for testing. Because bugholes are small, a single bughole occupies between 18 × 18 and 60 × 60 pixels. The 80 raw images were cropped into smaller images (28 × 28 pixel resolution), which were manually annotated as bughole (i.e., positive sample) or not bughole (i.e., negative sample) to generate a database, as shown in Figure 5.
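The cropping step can be illustrated with a minimal sketch. The paper does not state how the 28 × 28 crop locations were chosen, so the regular non-overlapping grid used here is an assumption, and the names `tile_origins` and `crop` are hypothetical helpers introduced only for illustration.

```python
def tile_origins(height, width, patch, stride=None):
    """Return the top-left (row, col) of every full patch inside the image."""
    # Assumption: non-overlapping tiles unless a stride is given.
    step = patch if stride is None else stride
    return [(r, c)
            for r in range(0, height - patch + 1, step)
            for c in range(0, width - patch + 1, step)]


def crop(image, row, col, patch):
    """Cut a patch x patch window out of an image stored as a list of rows."""
    return [line[col:col + patch] for line in image[row:row + patch]]


# Toy 56 x 84 single-channel image; each pixel value encodes its own position.
image = [[r * 84 + c for c in range(84)] for r in range(56)]
origins = tile_origins(56, 84, patch=28)
patches = [crop(image, r, c, 28) for r, c in origins]
print(len(patches), len(patches[0]), len(patches[0][0]))
```

Each resulting patch would then be labeled by hand as bughole or not bughole, as described above.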

The total number of prepared images in the database was 4 K. With a ratio of training set : validation set = 4 : 1 [49], the training set contained 3.2 K images and the validation set contained 0.8 K images.
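A 4 : 1 split of this kind might be sketched as follows. The random shuffling and the fixed seed are assumptions, since the paper does not describe how images were assigned to each set, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(samples, train_parts=4, val_parts=1, seed=0):
    """Shuffle the samples and split them by the given ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_train = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:n_train], shuffled[n_train:]


# Integer ids stand in for the 4,000 labeled 28 x 28 images.
train, val = split_dataset(range(4000))
print(len(train), len(val))  # 3200 800
```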

3.2. Overall Architecture

This section describes the overall architecture of the DCNN used in this study, including the parameter selection for each layer. Figure 6 presents the DCNN architecture, which was the original configuration for concrete bughole detection. The first layer was the input layer of 28 × 28 × 3 pixel resolution, where the dimensions indicate height, width, and channel (i.e., red, green, and blue), respectively. Because of the relatively small size of the input images (28 × 28 pixels) and the limited number of labeled examples in the training set (fewer than 10 K), this network used the inception module architecture proposed by Szegedy et al. [51]. The inception architecture approximates an optimal local sparse structure in a convolutional vision network, which allows efficient dense computation to be used instead of inefficient numerical calculation directly on nonuniform sparse data structures. Applying inception modules to the traditional convolutional network structure yields fewer network parameters, which makes the model more robust to overfitting, especially when a small dataset is used. In addition, a batch normalization layer was inserted before the activation function in all layers, which made the network easier to train and improved the generalization ability of the final model [52]. Table 2 shows the convolution configuration and inception architecture.
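To give a feel for why inception-style branches reduce the parameter count, the sketch below compares a direct 5 × 5 convolution with a 1 × 1 channel reduction followed by a 5 × 5 convolution, which is one of the branch patterns used in inception modules. The channel counts are hypothetical and are not taken from Table 2.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel of a kernel x kernel convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels


# Hypothetical channel counts, chosen only for illustration (not from Table 2).
in_ch, reduced_ch, out_ch = 64, 16, 32

# A 5 x 5 convolution applied directly to all input channels.
direct = conv_params(5, in_ch, out_ch)

# A 1 x 1 reduction to fewer channels, followed by the 5 x 5 convolution.
bottleneck = conv_params(1, in_ch, reduced_ch) + conv_params(5, reduced_ch, out_ch)

print(direct, bottleneck)  # 51232 13872
```

Under these assumed sizes the reduced branch needs roughly a quarter of the parameters, which is the kind of saving that helps with small training sets.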

The convolution layer has proved to be highly effective in extracting different features from images. Unlike the fully connected layer of a traditional neural network, each neuron of a convolution layer is connected to only a local region of the input volume. The spatial extent of this connectivity is a hyperparameter called the receptive field of the neuron (equivalently, this is the filter size). The extent of the connectivity along the depth axis is always equal to the depth of the input volume. Moreover, the filter weights are shared across all spatial positions of the input image. In the first few layers, with local receptive fields, convolution layers can extract elementary visual features such as oriented edges, end-points, and corners. These features are then combined by the subsequent layers in order to detect higher-order features. For example, suppose that the input volume has a size of 28 × 28 × 3 (e.g., an input image with three channels of red, green, and blue). If the receptive field (or the filter size) is 5 × 5, then each neuron in the convolution layer has weights to a 5 × 5 × 3 region of the input volume, for a total of 5 × 5 × 3 = 75 weights (and +1 bias parameter). With a stride of 1 and no padding, the output size is (28 − 5)/1 + 1 = 24, so after the convolution layer the feature map has a size of 24 × 24.
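The arithmetic in this example can be checked with a short sketch. The `padding` argument is an addition for generality and is not discussed in the text; the function names are hypothetical.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1


def weights_per_neuron(filter_size, in_channels):
    """Number of weights (excluding the bias) seen by one neuron."""
    return filter_size * filter_size * in_channels


out = conv_output_size(28, 5)      # 28 x 28 input, 5 x 5 filter, stride 1
weights = weights_per_neuron(5, 3)  # 5 x 5 filter over 3 color channels
print(out, weights)  # 24 75
```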

In general, the pooling layers will be periodically inserted into the convolution layers of a CNN architecture, reducing the number of parameters and saving computation resources required for data storage, which also avoids overfitting to a certain extent. The pooling units can perform different functions, such as max pooling, average pooling, or even L2-norm pooling, of which the max pooling was the most commonly used [53]. It divides the input image into several rectangular areas and outputs the maximum value for each subarea. Figure 7 shows an example of max pooling, with a stride of 2, where the pooling layer output size is calculated by the equation in the figure.
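A minimal sketch of max pooling on a small feature map follows. It assumes the commonly used output-size formula (input − window)/stride + 1 without padding, which may or may not match the exact equation printed in Figure 7.

```python
def max_pool(feature_map, size=2, stride=2):
    """Max pooling over a 2D feature map stored as a list of rows."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]


feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]
pooled = max_pool(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```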

The ReLU layer applies a nonsaturating activation function that performs a threshold operation on each element of the input. It increases the nonlinearity of the decision function and of the entire network without affecting the receptive fields of the convolution layer. The sigmoid function and the saturating hyperbolic tangent are also often used to increase nonlinearity. Figure 8 depicts several examples of nonlinear functions. Compared with these functions, the ReLU function is more popular because it speeds up the training of the neural network without significantly affecting the generalization accuracy of the model [45].
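The three nonlinearities mentioned here can be written in a few lines. This is only a scalar sketch for comparison; real networks apply them elementwise to whole tensors.

```python
import math


def relu(x):
    """Threshold at zero: negative inputs become 0, positive inputs pass through."""
    return max(0.0, x)


def sigmoid(x):
    """Saturating function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Saturating function with outputs in (-1, 1)."""
    return math.tanh(x)


for x in (-5.0, 0.0, 5.0):
    print(x, relu(x), sigmoid(x), tanh(x))
```

For large positive inputs the sigmoid and tanh outputs flatten out near 1, whereas ReLU keeps growing with the input, which is what "nonsaturating" refers to.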

The loss layer is used to determine how the training process penalizes the difference between the predicted and actual results of the network, which is usually the last level of the network. The softmax function is often used in the last layer of the CNN architecture, as the output layer, to classify input data. The softmax function is given by Equation (1), which is expressed as the probabilistic expression, $P(y^{(i)} = j \mid x^{(i)}; W)$, for the $i$th training example out of $m$ training examples, the $j$th class out of $n$ classes, and weights, $W$, where $W_j^{T} x^{(i)}$ are inputs of the softmax layer. The sum of the right-hand side over the $n$ classes for the $i$th input always returns 1, because the function always normalizes the distribution. In other words, the following equation returns probabilities of each input's individual classes:

$$P\left(y^{(i)} = j \mid x^{(i)}; W\right) = \frac{\exp\left(W_j^{T} x^{(i)}\right)}{\sum_{l=1}^{n} \exp\left(W_l^{T} x^{(i)}\right)}, \quad j = 1, \ldots, n, \quad i = 1, \ldots, m. \tag{1}$$
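A softmax of this kind can be sketched as follows for a two-class bughole / not-bughole output. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and is not mentioned in the paper; the example scores are made up.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max leaves the result unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical softmax-layer inputs for the classes (bughole, not bughole).
probs = softmax([2.5, 0.3])
print(probs, sum(probs))
```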

The network is trained using a stochastic gradient descent (SGD) algorithm with a minibatch size of 100 out of 4,000 images. SGD, using backpropagation, is considered the most efficient and simplest way to minimize deviations [54, 55]. A disadvantage of the SGD method is that its update direction depends entirely on the current batch, so its updates are very unstable. A number of improvements have been proposed, including the momentum method, in which SGD with momentum remembers the update $\Delta w$ at each iteration and determines the next update as a linear combination of the gradient and the previous update [56]:

$$\Delta w := \alpha \Delta w - \eta \nabla Q_i(w), \qquad w := w + \Delta w,$$

that leads to

$$w := w - \eta \nabla Q_i(w) + \alpha \Delta w,$$

where the parameter, $w$, which minimizes $Q(w)$, is to be estimated, $\eta$ is a step size (learning rate), and $\alpha$ is the momentum coefficient.
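The momentum update described above can be sketched on a one-dimensional toy objective. The quadratic objective and the learning rate of 0.1 are assumptions made only for illustration; the momentum value of 0.9 matches the setting reported in this paper.

```python
def momentum_step(w, delta_w, grad, lr, momentum):
    """One update: delta_w <- momentum * delta_w - lr * grad, then w <- w + delta_w."""
    delta_w = momentum * delta_w - lr * grad
    return w + delta_w, delta_w


def grad_q(w):
    """Gradient of the toy objective Q(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w, delta_w = 0.0, 0.0
for _ in range(300):
    w, delta_w = momentum_step(w, delta_w, grad_q(w), lr=0.1, momentum=0.9)
print(w)
```

Because each step reuses a fraction of the previous update, a single noisy gradient changes the direction of travel less than it would in plain SGD.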

As small and decreasing learning rates are recommended [57], the exponentially decaying learning rates depicted in Figure 9 were used in this study. The x-axis represents epochs, so the learning rate is updated at each epoch. As shown in Figure 9, the error tends to converge after 30 iterations. The weight decay and momentum parameters were set to 0.0001 and 0.9, respectively.
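An exponentially decaying schedule updated once per epoch might look like the sketch below. The initial learning rate (0.01) and the decay factor (0.9) are hypothetical, since the values behind Figure 9 are not given in the text.

```python
def exponential_decay(initial_lr, decay_rate, epoch):
    """Learning rate after a given number of epochs of exponential decay."""
    return initial_lr * decay_rate ** epoch


# Hypothetical initial rate and decay factor; 30 epochs as in the training run.
schedule = [exponential_decay(0.01, 0.9, epoch) for epoch in range(30)]
print(schedule[0], schedule[-1])
```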

4. Testing and Discussion

To evaluate the bughole detection performance of the trained DCNN, 36 images not used in training were cropped into 800 small images (256 × 256 pixel resolution) for testing. These images were taken from several construction sites. The test results for some images are shown in Figure 10. The yellow boxes mark the detected bugholes, and the number inside indicates the labeled number of bugholes in one image.
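One way a 28 × 28 classifier could be scanned across a 256 × 256 test image is a sliding window, sketched below. The paper does not specify the window stride or how boxes are merged, so the stride of 28 is an assumption, and `dark_patch_classifier` is a hypothetical placeholder that stands in for the trained DCNN.

```python
def scan_image(image, classify, window=28, stride=28):
    """Slide a window over the image and collect boxes the classifier accepts."""
    detections = []
    height, width = len(image), len(image[0])
    for r in range(0, height - window + 1, stride):
        for c in range(0, width - window + 1, stride):
            patch = [row[c:c + window] for row in image[r:r + window]]
            if classify(patch):
                detections.append((r, c, window, window))  # (row, col, h, w)
    return detections


def dark_patch_classifier(patch, threshold=100):
    """Placeholder for the trained DCNN: flags patches with a low mean intensity."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0])) < threshold


# Synthetic 256 x 256 gray image with one dark 28 x 28 region.
image = [[200] * 256 for _ in range(256)]
for r in range(56, 84):
    for c in range(112, 140):
        image[r][c] = 30

boxes = scan_image(image, dark_patch_classifier)
print(boxes)  # [(56, 112, 28, 28)]
```

With a stride equal to the window size, 256 is not a multiple of 28, so a 4-pixel strip at the right and bottom edges is never scanned in this sketch.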

From the test results above, the proposed DCNN showed excellent performance in detecting bugholes on the concrete surfaces. When there simultaneously exist multiple types of defects, including cracks and color-differences, the trained DCNN also accurately detects the bughole defect without interference from other defects, as shown in Figures 10(b) and 10(c). Furthermore, under the circumstance of nonuniform illumination, the trained DCNN can also exclude the noise of strong light or shade and recognize the bugholes with a high accuracy, as shown in Figure 10(d).

The image detection experiment shows that errors mainly occur at the edges of the image and in color-difference areas, as shown in Figure 11. The identification error in Figure 11(a) is mainly caused by improper image cropping, and the identification error in Figure 11(b) appears to arise because the region of stronger color-difference has a shape similar to that of a bughole.

Image testing of the trained DCNN gave a recognition accuracy of 96.43%, as shown in Table 3. Note that pixels that are actually nonbughole but are erroneously detected as bughole are regarded as false positives; conversely, pixels that are actually bughole but are incorrectly classified as nonbughole are regarded as false negatives.
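The relationship between these counts and the reported accuracy can be sketched as below, assuming the usual definition of accuracy as the share of correct decisions. The counts are hypothetical and are not the values in Table 3; precision and recall are included only as common companion metrics.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical counts for illustration only (not taken from Table 3).
accuracy, precision, recall = classification_metrics(tp=90, fp=5, tn=100, fn=5)
print(accuracy, precision, recall)
```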

5. Comparative Study

To compare the performance of the proposed DCNN-based bughole detection method with that of traditional image processing methods, a comparative study was conducted. According to previous studies, the Otsu method identifies bugholes on concrete surfaces more accurately than the global threshold method, the nonmaximum suppression edge detection method, and the Canny edge detection method. Talab et al. [24] and Liu et al. [36] used the Otsu method to detect cracks and bugholes, respectively, and achieved good detection results. Lim et al. [23] used the Laplacian of Gaussian (LoG) algorithm to detect cracks, and the results showed that the LoG algorithm successfully detected the cracks in the images. Therefore, the LoG algorithm and the Otsu method were chosen for the comparative study.
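For readers unfamiliar with the two baselines, the sketch below shows a plain implementation of Otsu thresholding and a sampled LoG kernel. It is a textbook-style illustration under assumed parameters (256 gray levels, a 5 × 5 kernel, sigma of 1.0), not the exact configuration used in the comparison.

```python
import math


def otsu_threshold(pixels, levels=256):
    """Gray level t maximizing between-class variance (classes: <= t and > t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def log_kernel(size, sigma):
    """Sampled Laplacian-of-Gaussian kernel on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            s = (x * x + y * y) / (2.0 * sigma * sigma)
            row.append(-(1.0 - s) * math.exp(-s) / (math.pi * sigma ** 4))
        kernel.append(row)
    return kernel


# Toy gray levels: 20 dark "bughole" pixels and 80 bright background pixels.
pixels = [30] * 20 + [200] * 80
t = otsu_threshold(pixels)
dark = [p for p in pixels if p <= t]
kernel = log_kernel(5, sigma=1.0)
print(t, len(dark), kernel[2][2] < 0)
```

Both techniques operate on intensity alone: Otsu picks one global gray-level cutoff, and LoG responds to any blob-like or edge-like intensity change, which is consistent with their sensitivity to shadows and color-differences.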

The bughole detection results of the different methods are shown in Figure 12. For all four types of images, the method presented in this paper gave good recognition results and accurately identified the regions and the number of bugholes. Moreover, the trained DCNN avoided the interference of cracks, color-differences, and uneven illumination on the concrete surface. When only bugholes exist on the concrete surface, the LoG method and the Otsu method can identify bugholes well, as shown in Figure 12(a). However, when multiple types of defects exist on the concrete surface, as shown in Figures 12(b) and 12(c), the detection performance of both the LoG method and the Otsu method tends to be poor: neither method obtains good bughole detection results because of the noise introduced by color-difference. As shown in Figure 12(d), when the illumination of the concrete surface was uneven, the Otsu method could not detect the bugholes in the shadows; although the LoG method performed well in detecting bugholes, strong light drastically lowered its detection performance.

6. Conclusion

This paper proposed a deep-learning-based method to detect bugholes in concrete surface images. The concrete images required for training, validation, and testing were taken at several construction sites with a mobile phone camera. The total number of raw images was 116. Of these, 80 images were cropped into 4,000 smaller images of 28 × 28 pixel resolution to build the database for training and validation, and the remaining 36 images were cropped into 800 small images (256 × 256 pixel resolution) for testing. Because of the relatively small size of the input images (28 × 28 pixels) and the limited number of labeled examples in the training set (fewer than 10 K), the network used the inception module architecture. In the image tests, the proposed DCNN showed excellent bughole detection performance, with a recognition accuracy of 96.43%, expanding the application of deep-learning-based methods in the detection of concrete surface defects.

The effects of noise such as illumination, shadows, and combinations of several different surface imperfections in real-world environments were considered. By the comparative study with the Laplacian of Gaussian (LoG) algorithm and the Otsu method, it is clear that the proposed DCNN has good robustness and can avoid the interference of cracks, color-differences, and nonuniform illumination on the concrete surface; using both the Otsu method and the LoG method, it was difficult to detect bugholes effectively due to the interference of color-difference and nonuniform illumination.

However, a common limitation of deep-learning-based methods is that the network training requires a large amount of labeled training data, so it is necessary to add more images to extend the database. In the next stage, the major task is to establish a quality evaluation system of concrete surface based on computer vision.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities (2019CDXYTM0032, 2019CDQYTM028, and 106112017CDJXY200009) and National Natural Science Foundation of China (51608074).