Most bridge structures in the world are built of reinforced concrete. As service life lengthens and urban traffic and other loads increase, most bridges in service show some degree of damage. Traditional bridge damage detection methods, namely manual inspection and inspection by bridge inspection vehicle, have many shortcomings. Moreover, because of the extremely large number of bridges in road networks across the world, detecting cracks in bridges is critical to transportation safety. To this end, this paper uses the convolutional neural network (CNN), the most widely used model in deep learning, to identify and classify crack images and proposes a transfer learning technique to address the large amount of training data that a CNN requires. Data augmentation and sliding window techniques are introduced, and the collected crack data are divided into a training set and a test set. The experiments show that the method can classify the crack images well, extract and locate the cracks of bridge crack units, and finally extract the crack coordinates by boxing. Compared with traditional image recognition methods, the method is easier to apply in practical engineering, and the accuracy of the results is higher.

1. Introduction

China’s road networks contain a very large number of bridges, and since the 1990s, the construction of highway bridges has developed vigorously. According to the Ministry of Transport, by 2020 there were 832,500 bridges in service nationwide [1], an increase of 27,200 over the previous year, including more than 4,600 very large bridges. China now has more than 1 million highway and railroad bridges of all types, the largest number in the world. Bridges are an important part of today’s transportation system, and regular checks are needed to ensure transportation safety. Bridge cracks are a major form of damage in road networks that may cause accidents, and the increasing stress on roads and bridges is alarming [2, 3].

While China’s bridge construction industry has made brilliant achievements, there is still a long-standing problem of “emphasis on construction, not maintenance.” In the natural environment, bridges are inevitably damaged by various factors such as earthquakes, high-speed winds, and freeze-thaw cycles caused by temperature differences. In addition, the aging of concrete materials causes internal steel bars to rust, carbonize, and deform, and human factors such as increased traffic volume and vehicle overload add further stress. These factors lead to a decline in the bridge’s health [4] and shorten its service life. Without timely inspection and maintenance, damage to a bridge is likely to cause major traffic accidents. Such accidents not only threaten people’s lives but also lead to incalculable losses of national assets. For example, on August 14, 2018, a 200-meter-long section of the Morandi Bridge in northwestern Italy collapsed during a storm (as shown in Figure 1(a)), killing 43 people, seriously injuring many others, and forcing the relocation of more than 600 people in the area. The bridge had been in service for 51 years. In recent years, bridge accidents have also occurred in China. For example, on December 26, 2010, the bridge across the river in Yancheng City, Jiangsu Province, collapsed after being hit by a passing vessel; on July 26 of the same year, the bridge over the Yi River in Tangying, Luanchuan, Henan Province, collapsed due to flooding caused by heavy rains (as shown in Figure 1(b)), leaving more than 90 people dead or missing. Therefore, we should pay attention to the safety hazards of bridges and monitor their health and safety continuously.

In terms of the management and maintenance of in-service bridges, nondestructive testing (NDT) technology for bridges in China still lags behind that of developed countries and cannot meet the needs of China’s huge road and bridge network. How to close this gap through technological innovation that draws on current research hot spots is the challenge facing bridge inspection in China at this stage of development.

To guarantee the durability and stability of bridge structures and minimize the incidence of bridge accidents, experts and scholars in related fields are committed to developing bridge health monitoring systems and studying bridge structure damage detection technologies. To achieve all-weather safety monitoring of bridge operation, judge the safety of the overall structure at any time, and assess the location and degree of damage promptly when it occurs, some important bridges in the world have been fitted with bridge health monitoring systems designed according to relevant theoretical standards, such as the Tsing Ma Bridge, Hangzhou Bay Bridge, and Hong Kong-Zhuhai-Macau Bridge in China. Many other more advanced bridge health monitoring systems are being developed [5].

There are various methods for testing bridges and a wide range of objects to be tested. For example, bridge deflection, stress-strain, displacement, cracks, and expansion joints are all elements to be tested [6]. In this paper, cracks are chosen as the object of study because the site and type of cracks appearing in bridges can best reflect the characteristics of bridge defects, and also crack damage is the most common early breakage in bridge damage diagnosis. According to some literature data, more than 90% of the damage of concrete bridges is caused by cracks. Some bridge cracks are affected by load, impact, harsh environment, and other factors, gradually turning from small cracks to larger cracks, thus extending to produce new cracks. Sometimes there are even deep cracks and penetration cracks of larger widths. These types of cracks are particularly harmful to bridges and are an important issue that cannot be ignored in bridge accidents [7].

To this end, this paper uses the convolutional neural network (CNN), the most widely used model in deep learning, to recognize and classify crack images and proposes a transfer learning technique to address the large amount of training data required for training convolutional neural networks. Several deep learning algorithms are used in machine vision, but CNN-based networks have been more effective in image classification and perform well on hard visual recognition tasks [8]. We introduce data augmentation and a sliding window technique, and we divide the collected crack data into a training set and a test set. The experiments show that the method can classify the crack images well, extract and locate the cracks of bridge crack units, and finally extract the crack coordinates by boxing. Compared with traditional image recognition methods, the method is easier to apply in practical engineering, and the accuracy of the results is higher.

2. Related Work

In recent years, research on the detection and identification of bridge crack images using machine vision technology has gradually received attention from experts in the field of bridge damage detection. Based on machine vision theory, it aims to let computers, drones, and other auxiliary devices replace manual work, so that bridges can be inspected automatically at long distance, high resolution, and low cost [9].

The core of machine vision in its early years was digital image processing technology, which first appeared in the 1950s [10] and gradually developed into a discipline. The rapid development of computer technology and hardware devices in turn pushed image processing to a higher level, now known as machine vision. Compared with the traditional bridge inspection methods described in Section 3, crack image detection based on machine vision offers both high crack detection accuracy and a much lower detection cost. With the development of drone technology, inspecting bridges with high-definition cameras carried on drones is largely independent of environmental factors such as terrain and bridge type, which saves manpower and avoids manual errors, improving detection accuracy.

In [11], image processing was used to detect the deformation of bridge cracks. The method was found to detect crack deformation accurately under different loads, and its feasibility was verified through experiments. Reference [12] studied the quantification of concrete cracks using image processing techniques, projecting the crack opening area with a purpose-designed image software method, and verified the technique experimentally. Reference [13] proposed an image recognition technique for pavement cracks based on a neural network algorithm [14], in which operations such as median filtering and segmented image enhancement were used to identify cracks in the pavement, locate the crack subblocks in severely damaged areas, and then measure the crack width by a counting method. However, this method requires high image pixel quality. In [15], machine vision-based detection of apparent bridge damage was studied [16]: image data were first acquired using a CCD camera, the images with defects were quantified and segmented, the images containing damage were identified by classification using a histogram BP neural network, and finally crack defects were quantified and estimated. A sample database of defective images was established based on the acquired data. Reference [17] designed an acquisition system for bridge crack images, annotated the acquired images using image processing techniques, performed image preprocessing and feature analysis, used the projection feature method to determine whether the images contained cracks, and realized feature extraction of cracks. In [18], the projection method was used to classify the crack images, a digital image processing algorithm was used to recognize the crack images automatically, the digital morphology method was used to measure the size of the cracks semiautomatically, and finally software that can detect the condition of the underside of a bridge was developed on the Visual C++ platform.

References [19, 20] proposed an image preprocessing method to deal with the large amount of noise introduced into the acquired images by lighting and stains. First, image smoothing is used to handle problems such as varying illumination intensity and shadows; then a linear filter based on the Hessian matrix is used to enhance the local features of concrete cracks; finally, threshold segmentation is used to separate and extract the cracks. Because it addresses problems encountered in real engineering cases, this approach is worth learning from. However, the problems encountered differ from one environment to another, and many problems and challenges remain.

3. Traditional Bridge Damage Detection Methods

The traditional method of bridge damage detection is manual inspection, in which inspectors observe bridges for damage such as cracks directly with the naked eye or with auxiliary equipment such as binoculars, telephoto cameras, and scaffolding [7]; record the location, thickness, and size of the cracks; and finally collate the records to assess the health of the bridge. Figures 2(a) to 2(d) show examples of manual inspection.

A bridge inspection vehicle is a special-purpose vehicle for preventive inspection and defect repair of large- and medium-sized bridges. According to the working device fitted, bridge inspection vehicles are mainly divided into two types: truss type and basket type, as shown in Figure 3.

Bridge inspection vehicles can assist in inspecting the underside of the bridge and other parts that are difficult to observe directly. However, problems remain, such as high cost, safety hazards for the workers, and disruption to traffic. With the traditional methods, defects may not be detected in real time, which poses additional risks to transportation safety.

4. Transfer Learning-Based CNN Construction

The convolutional neural network used in this paper is a variant of the VGG structure, in which 3 × 3 convolution filters are combined with max pooling with a window size of 2 × 2 and a stride of 2. ReLU is used as the activation function, which increases the nonlinearity of the network. Compared with the original VGGNet, the configuration is changed so that a fully connected layer with 256 neurons precedes the softmax layer, which makes the network better suited to transfer learning when training data are scarce.
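As a rough check of how these layer settings shrink the feature maps, the following sketch traces the spatial size of a 224 × 224 input through five convolution-plus-pooling stages. The five-stage layout and the padding of 1 for the 3 × 3 convolutions are assumptions borrowed from the standard VGG-16 design; the text above specifies only the filter size, the pooling window, and the stride.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output width of a convolution (padding=1 is an assumption, as in VGG)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output width of max pooling with a 2 x 2 window and a stride of 2."""
    return (size - window) // stride + 1


def trace_sizes(input_size=224, stages=5):
    """Return the feature map width after each conv + pool stage."""
    sizes = []
    size = input_size
    for _ in range(stages):
        size = conv_out(size)   # padded 3 x 3 convolution keeps the width
        size = pool_out(size)   # 2 x 2 max pooling halves it
        sizes.append(size)
    return sizes


sizes = trace_sizes()
print(sizes)
```

Under these assumptions the width halves at every stage, so the final feature map is 7 × 7 before the fully connected layers.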

4.1. Batch Normalization Layer

In this paper, we add a batch normalization (BN) layer between the convolutional layer and the pooling layer. Like the convolutional, activation function, and pooling layers, it is itself a layer of the network. Batch normalization is a training technique that can improve the performance of the network model and suppress “gradient dispersion” (vanishing gradients) during training, thus making deep network models easier and more stable to train.

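Specifically, a normalization layer is inserted before the input of each layer of the network, so that the data are normalized before being passed to the next layer. This avoids changes in the input data distribution of the next layer caused by parameter updates in the preceding layers during training. The key point of the algorithm is the transformation reconstruction, which introduces the learnable parameters γ and β, as shown in equation (1):

$$y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)} \tag{1}$$

Each neuron $x^{(k)}$ has its own pair of parameters $\gamma^{(k)}$, $\beta^{(k)}$:

$$\gamma^{(k)} = \sqrt{\mathrm{Var}\left[x^{(k)}\right]}, \qquad \beta^{(k)} = \mathrm{E}\left[x^{(k)}\right] \tag{2}$$

Equation (2) shows that, with these values, the learned reconstruction parameters γ and β can recover the original features learned by a layer. The final equations for the forward pass of the batch normalization layer are as follows, where m is the minibatch size and ε is a small constant added for numerical stability:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2, \qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta \tag{3}$$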
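The forward pass can be illustrated with a minimal sketch for a single feature over one minibatch: subtract the minibatch mean, divide by the standard deviation, then scale by γ and shift by β. The function name, the default values, and the constant `eps` are assumptions made for illustration and are not taken from the paper.

```python
import math


def batch_norm_forward(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a minibatch, then scale and shift it."""
    m = len(batch)                                   # minibatch size
    mean = sum(batch) / m                            # minibatch mean
    var = sum((x - mean) ** 2 for x in batch) / m    # biased minibatch variance
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in batch]
    return [gamma * xh + beta for xh in x_hat]


batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm_forward(batch)

# Choosing gamma = sqrt(var + eps) and beta = mean undoes the normalization,
# which is the sense in which gamma and beta can recover the original features.
mean = sum(batch) / len(batch)
var = sum((x - mean) ** 2 for x in batch) / len(batch)
restored = batch_norm_forward(batch, gamma=math.sqrt(var + 1e-5), beta=mean)
```

With the default γ = 1 and β = 0 the output has approximately zero mean and unit variance; with the second choice of parameters the layer returns its input unchanged up to floating-point error.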

4.2. Data Enhancement Techniques

In convolutional neural networks, more data generally yields a better trained model, but data acquisition is often limited in practice. To address this, we use data augmentation, which creates synthetic samples from the real dataset and adds them to the training set. Since the dataset used in this paper contains only 2000 images, which is small, data augmentation is introduced here to suppress overfitting. In image recognition, data augmentation applies operations such as random rotation, horizontal flipping, translation, reflection, random cropping, and contrast adjustment to the images. These operations are generally performed after data normalization.
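A few of these operations can be sketched on a tiny image stored as a nested list of pixel values. This is an illustrative sketch only: the function names, the 50% probability for each transform, and the fixed one-pixel shift are assumptions, and the paper does not state which library or parameters were used.

```python
import random


def horizontal_flip(image):
    """Mirror each row of the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def translate_right(image, shift, fill=0):
    """Shift the image right by `shift` pixels, padding with `fill`."""
    width = len(image[0])
    shift = min(shift, width)
    return [[fill] * shift + row[:width - shift] for row in image]


def augment(image, rng):
    """Apply a random subset of the transforms above to one image."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    if rng.random() < 0.5:
        image = rotate_90(image)
    if rng.random() < 0.5:
        image = translate_right(image, shift=1)
    return image


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
augmented = [augment(image, rng) for _ in range(4)]
```

Each call to `augment` yields a variant of the same labeled image, so the label can be reused for every synthetic sample.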

4.3. Transfer Learning Network Structure

In transfer learning, because the types of images in the source and target domains differ, the most common approach to knowledge transfer is to remove the original fully connected (fc) layers and replace them with adaptive fc layers for the target domain. The dimensionality of a new adaptive layer is chosen according to the complexity of the image features, but in practice it still depends on experience, with cross-validation used to select a suitable value.

Compared to the VGG-16 network, whose training dataset contains over 450,000 images, our crack image dataset currently has only 2,000 well-defined labeled images. With so little data, in order to avoid serious overfitting, we use a relatively small number of neurons in the fc layers. For the first adaptation layer, 256, 1024, and 4096 neurons were tried in preliminary experiments, and the results were not significantly different. Therefore, considering the computational cost, the final design uses two adaptive fc layers: the first has 256 neurons, and the second is the softmax layer. The detailed network structure is shown in Figure 4.
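The computational cost argument can be made concrete by counting the trainable parameters of the adaptive head for each candidate width. The sketch below assumes a 7 × 7 × 512 feature map entering the first fc layer (as in the standard VGG-16 with a 224 × 224 input) and two output classes; neither figure is stated explicitly in the paper, so the counts are indicative only.

```python
def fc_head_params(in_features, hidden, classes):
    """Weights plus biases of an fc layer followed by a softmax layer."""
    first = in_features * hidden + hidden      # adaptive fc layer
    second = hidden * classes + classes        # softmax (output) layer
    return first + second


# Assumptions: a 7 x 7 x 512 feature map (as in VGG-16 with 224 x 224 input)
# and 2 output classes (crack / no crack).
IN_FEATURES = 7 * 7 * 512
CLASSES = 2

params = {h: fc_head_params(IN_FEATURES, h, CLASSES) for h in (256, 1024, 4096)}
for hidden, count in params.items():
    print(f"{hidden:>4} neurons: {count:,} trainable parameters")
```

Under these assumptions the 256-neuron head has roughly 6.4 million parameters against roughly 103 million for 4096 neurons, which is consistent with preferring the smallest width when accuracy is similar.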

5. Experimental Verification

Before the network training, we performed two sets of image recognition classification experiments to verify the method’s feasibility.

5.1. Damage Detection

For a supervised classification problem, the labeled data are usually divided into two sets: one for training and the other for testing. A total of 2000 crack images were collected using a UAV and a handheld DSLR camera and randomly divided into a training set and a test set. Some of the collected image data are shown in Figure 5. The ratio of the training set to the test set is set to 4 : 1 based on experience. In other words, out of the 2000 original images, 1600 are used for training and 400 for testing. The 1600 training images were first cropped into smaller images of 224 × 224 pixels, as shown in Figure 6. Before training, each image is labeled as defective or nondefective. A relatively small crop size facilitates the training of the neural network and makes it possible to capture finer features, such as scratches and shadows. However, smaller images make it more complicated to label classes and also require more computational power.
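The 4 : 1 split and the cropping step can be sketched as follows. The integer IDs stand in for image files, and the 1024 × 768 resolution, the fixed seed, and the choice of non-overlapping crops are assumptions made for illustration; the paper does not state the original image size or how partial crops at the borders were handled.

```python
import random


def split_dataset(items, train_parts=4, test_parts=1, seed=0):
    """Shuffle the items and split them by the ratio train_parts : test_parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


def crop_boxes(width, height, crop=224):
    """Return (left, top, right, bottom) boxes of non-overlapping crops.

    Pixels that do not fill a whole crop at the right and bottom edges
    are dropped in this sketch.
    """
    return [(x, y, x + crop, y + crop)
            for y in range(0, height - crop + 1, crop)
            for x in range(0, width - crop + 1, crop)]


image_ids = list(range(2000))                 # stand-ins for the 2000 image files
train_ids, test_ids = split_dataset(image_ids)
boxes = crop_boxes(width=1024, height=768)    # hypothetical image resolution
print(len(train_ids), len(test_ids), len(boxes))
```

Splitting before cropping, as described above, keeps all crops of one original image on the same side of the split.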

5.2. Sliding Window Technology

Since the input training data are cropped into smaller images of 224 × 224 pixels for easier training, some cracks end up on the four edges of the cropped images, as shown in Figure 7. As the image data pass through the CNN, the image is progressively downsampled, so cracks on the edges have less chance of being recognized than cracks in the middle of the image. In addition, when labeling the training set, it is hard to be sure whether such edge features are real cracks, which can lead to false labels. Even if the neural network classifies these images, its recognition accuracy decreases because the crack features are difficult to identify. To solve this problem, we use a sliding window technique in the training step so that all locations in the image space are covered.

Figure 8 shows a schematic diagram of the scanning scheme using the sliding window technique. The time the network takes to process the image data depends on the step size of the sliding window scan, and an inappropriate step size also affects the recognition accuracy. Through testing, this paper sets the step size to 256, and to reduce the error rate of crack recognition, a complete scan consists of three scanning passes. The size of the scanning window is fixed at 256 × 256. The first pass starts from the top left corner of the image at coordinate (0, 0), the second from coordinate (96, 96), and the third from coordinate (176, 176), so that the whole image is gradually covered.
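The three-pass scan can be sketched by listing the top-left corner of every window. The window size, step size, and start coordinates are taken from the text; the 1024 × 768 image size and the rule that windows extending past the image border are skipped are assumptions.

```python
def scan_windows(width, height, start, window=256, step=256):
    """List the top-left corners of one scanning pass that begins at `start`."""
    x0, y0 = start
    return [(x, y)
            for y in range(y0, height - window + 1, step)
            for x in range(x0, width - window + 1, step)]


def full_scan(width, height, starts=((0, 0), (96, 96), (176, 176))):
    """Combine the three shifted scanning passes into one list of windows."""
    windows = []
    for start in starts:
        windows.extend(scan_windows(width, height, start))
    return windows


# Hypothetical 1024 x 768 image; the paper does not state the image size.
windows = full_scan(1024, 768)
print(len(windows))
```

Because the second and third passes are shifted, a crack lying on a window edge in the first pass falls nearer the middle of a window in a later pass.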

5.3. Comparison of Results

The experiments in this section first train an ordinary convolutional neural network (the CIFAR-10 model) on the small dataset of 1600 samples for 100 epochs, by which point the loss value has stabilized, and then test the network model on the 400 test samples, using the validation accuracy and the network loss value as criteria. Then, the pretrained transfer learning model is retrained with the same 1600 samples, and its loss value no longer decreases after 100 iterations of training. The results are shown in Figure 9.

Based on the accuracy and loss curves in Figure 9, a certain degree of overfitting can be observed. For both binary classification tasks, we observe about 10% overfitting (measured as the difference between the training accuracy and the test accuracy). Since the training accuracy reaches more than 90%, the CNN architecture used in this paper already has enough complexity to cope with the classification problem, and we conclude that the main reason for the overfitting is the small amount of training data.

The variant VGG model in this paper is then trained on the small dataset of 1600 images. Its prediction curves are compared with those of the model pretrained by transfer learning, as shown in Figures 10(a) and 10(b).

Figure 10(a) shows that, when classifying the crack dataset, the initial prediction accuracy of the model trained directly on the data without transfer learning is 24.7%, while the initial accuracy with transfer learning is 62.3%. After about 50 epochs of training, the highest accuracy of 99.1% is reached with transfer learning. After about 70 epochs, the model trained from scratch also reaches its highest accuracy of 97.4%. Figure 10(b) shows that the loss value decreases faster with transfer learning pretraining, indicating that the network trains faster. The loss curve also shows that no significant overfitting occurs during training. The training curves are smoother and converge faster for both the 2-category task and the 5-category task. The comparison of Figures 10(a) and 10(b) also shows that the degree of overfitting increases for all classification tasks as the number of tasks increases.

6. Conclusions

This paper describes in detail the use of transfer learning methods in CNNs. First, image recognition experiments are used to verify that neural networks can feasibly recognize crack images. Then, the transfer learning network structures used in this paper are constructed, and the detailed network configuration is given. Finally, the network is trained on the data using data augmentation techniques. The results show that the transfer learning approach can effectively reduce overfitting and also helps keep the network from falling into local optima during gradient descent. More importantly, it greatly improves the training speed and saves computational power and time. In the future, we intend to train the model on a large-scale dataset to improve its accuracy. Moreover, we intend to use other deep learning algorithms and traditional learning algorithms and conduct a detailed analysis of the performance of various models in this domain.

Data Availability

The datasets used are available from the corresponding author on reasonable request.

Conflicts of Interest

The author declares that there are no conflicts of interest.