Abstract

Aiming at the shortage of samples caused by the difficulty of data acquisition in tunnel lining defect identification, a generative adversarial network is introduced to expand the data, and the network is improved to address the mode collapse of the traditional generative adversarial network and the weak features of its generated images. On the basis of the WGAN-GP network, a deep convolutional network is adopted as the backbone, exploiting the effectiveness of deep convolutional networks in feature extraction (Lv et al., 2022) to improve the quality of the generated images. In addition, a residual module is introduced into the discriminator network and an upsampling module into the generator network; through the idea of cross-layer connections, these modules further alleviate gradient vanishing during the update iterations of the two networks while better retaining low-level features, effectively addressing mode collapse and the low quality of generated images. Compared with the original network, the image quality of the improved generative adversarial network is higher, and the discriminator and generator losses converge faster. At the same time, the recognition accuracy of the YOLOv5 network is improved by 4.4% and overfitting is alleviated, which demonstrates the effectiveness of the method under a limited training data set.

1. Introduction

In today’s era, rail transit has become an important means of relieving urban traffic congestion, and tunnels can effectively avoid the time cost and safety risks caused by traffic jams. However, tunnels themselves also carry many safety risks, such as hidden defects introduced during construction, aging caused by erosion over time, and sudden geological disasters triggered by structural weaknesses. Effectively preventing and repairing these hidden dangers is an important problem concerning public property and personal safety. Using ground penetrating radar (GPR) images combined with deep learning to identify tunnel lining defects can effectively save manpower and material resources, which is of great significance to society [1]. However, tunnel lining defect data are not easy to obtain, and no public data set is available. The lack of data is likely to lead to insufficient precision or poor robustness of the trained identification network. Therefore, how to effectively expand the number of samples is a problem that must be faced.

Traditional data set expansion increases the number of samples through data enhancement: rotating, cropping, and adding noise to the original data [2]. Goodfellow et al. [3] proposed the generative adversarial network (GAN) model in 2014. Based on the idea of game theory, they constructed a network of mutually opposing generators and discriminators to generate images, with relatively good results. At present, GAN has been successfully applied to image inpainting and restoration, animation generation, super-resolution image reconstruction, and other fields [4]. To address problems such as training instability and gradient vanishing in the original GAN, Arjovsky et al. [5] proposed replacing the KL and JS divergences in the original GAN objective function with the Wasserstein distance, constructing the WGAN network. However, although WGAN solves the problems of the original GAN in principle, its image generation results are not significantly better than those of the traditional GAN. Therefore, Gulrajani et al. [6] proposed the Wasserstein generative adversarial network with gradient penalty (WGAN-GP); the gradient penalty term effectively stabilizes the training of the WGAN network and significantly improves the quality of the generated images.

At the application level, Wang Jianlin et al. used the effectiveness of WGAN-GP in image generation to address the low detection accuracy of cooperative targets caused by complex component structures and changing measurement environments in the 3D precision measurement of large components, expanding the number of cooperative target image samples and realizing multitype cooperative target detection [7]. Li et al. used a WGAN-GP network to generate rice disease image samples, expanding the small rice disease image set and effectively enhancing model training and learning [8]. Xu et al. [9] proposed an oversampling model based on a convergent WGAN, called CWGAN, to improve the training stability of GAN oversampling for detecting network threats. The training process of CWGAN consists of multiple iterations; in each iteration, the training time of the discriminator is dynamic, depending on the convergence of the discriminator loss function over the last two iterations. Once the discriminator has been trained to convergence, the generator is trained to generate new minority-class samples.

The above methods still suffer from poor sample quality or slow convergence of the network loss. To address this, this paper proposes a WGAN-GP generative adversarial network model based on deep convolutional networks, using the superiority of deep convolutional networks in feature extraction to improve network performance. At the same time, a residual module and an upsampling module are introduced into the discriminator network and the generator network, respectively; their cross-layer connections avoid the loss of low-level features, alleviate gradient vanishing, and improve the stability of the network. Finally, the generated defect data are combined with the original data to form an enhanced data set, improving the accuracy and robustness of the recognition network.

2. Introduction to GAN

2.1. Generative Adversarial Networks

The generative adversarial network (GAN) is a generative model proposed by Goodfellow et al. in 2014. Its main idea is to make two neural networks continuously play a two-player minimax game, during which the model gradually learns the real sample distribution. Generally speaking, training is considered complete when the confrontation between the two networks reaches a Nash equilibrium [10].

Figure 1 shows the basic model of GAN. The input of the generator network (denoted as $G$) is a random variable $z$ from the latent space $Z$, and the output is a generated sample. Its training objective is to improve the similarity between the generated samples and real samples so that they cannot be distinguished by the discriminator network (denoted as $D$); that is, the distribution of the generated samples (denoted as $P_g$) should be as similar as possible to that of the real samples (denoted as $P_r$). The input of $D$ is a real sample $x$ or a generated sample $G(z)$, and the output is the discrimination result. Its training objective is to distinguish the real samples from the generated ones. The discrimination results are used to calculate the objective function, and the network weights are updated by backpropagation.

Therefore, the training purpose of GAN can be described as minimizing the distance between $P_g$ and $P_r$ while maximizing the sample discrimination accuracy of the discriminator $D$. Thus, the objective function can be written as
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))]. \quad (1)$$
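To make the alternating minimax update concrete, the following is a minimal PyTorch sketch of one training step under Equation (1). `G`, `D`, and the optimizers stand in for the reader's own networks, and `D` is assumed to end in a sigmoid so that it outputs probabilities; this is an illustration, not this paper's training code.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real, z_dim=100):
    """One alternating update of the minimax game in Equation (1).
    G, D, opt_g, and opt_d are placeholders for the reader's networks and
    optimizers; D is assumed to output probabilities in (0, 1)."""
    z = torch.randn(real.size(0), z_dim, 1, 1, device=real.device)

    # Discriminator step: ascend log D(x) + log(1 - D(G(z))).
    fake = G(z).detach()                     # detach so G is not updated here
    d_real, d_fake = D(real), D(fake)
    loss_d = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step (non-saturating form): ascend log D(G(z)).
    d_gen = D(G(z))
    loss_g = F.binary_cross_entropy(d_gen, torch.ones_like(d_gen))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```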

2.2. WGAN-GP

The problems and challenges that the original GAN faces can be summarized in one sentence: the better the discriminator, the more severely the generator gradient vanishes. During training, if the discriminator is trained too well, the generator can no longer learn. For a fixed generator, the optimal discriminator is
$$D^*(x) = \frac{P_r(x)}{P_r(x) + P_g(x)}.$$
Substituting the optimal discriminator into the generator loss function and transforming, we obtain
$$V(G, D^*) = \mathbb{E}_{x \sim P_r}\!\left[\log \frac{P_r(x)}{P_r(x) + P_g(x)}\right] + \mathbb{E}_{x \sim P_g}\!\left[\log \frac{P_g(x)}{P_r(x) + P_g(x)}\right]. \quad (2)$$

At this point, KL divergence is introduced:
$$KL(P_1 \,\|\, P_2) = \mathbb{E}_{x \sim P_1}\!\left[\log \frac{P_1(x)}{P_2(x)}\right],$$

and JS divergence is as follows:
$$JS(P_1 \,\|\, P_2) = \frac{1}{2} KL\!\left(P_1 \,\Big\|\, \frac{P_1 + P_2}{2}\right) + \frac{1}{2} KL\!\left(P_2 \,\Big\|\, \frac{P_1 + P_2}{2}\right),$$

to measure the similarity between two data distributions, so Equation (2) can be written as
$$V(G, D^*) = 2\, JS(P_r \,\|\, P_g) - 2 \log 2.$$

If two distributions have no overlap at all, or their overlap is negligible, their JS divergence is the constant $\log 2$. Whether $P_r$ and $P_g$ are far apart or close together, the JS divergence stays at $\log 2$, which leads to the gradient vanishing problem: the generator cannot backpropagate, and no further learning occurs.

In view of the above problems, the Wasserstein distance proposed in the literature [5, 11] is used as a new similarity measure. The Wasserstein distance, also known as the Earth-Mover (EM) distance, is defined as
$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\|x - y\|\big],$$

where $\Pi(P_r, P_g)$ is the set of all possible joint distributions whose marginals are $P_r$ and $P_g$. For each possible joint distribution $\gamma$, one can sample $(x, y) \sim \gamma$ to obtain a real sample $x$ and a generated sample $y$ and compute the distance $\|x - y\|$ between the pair; $\mathbb{E}_{(x, y) \sim \gamma}[\|x - y\|]$ is then the expected sample distance under $\gamma$. The Wasserstein distance is the infimum of this expected value over all possible joint distributions. Its advantage over KL and JS divergence is that it still reflects the distance between two distributions even when they do not overlap. The $\inf$ in the definition cannot be solved directly, so the authors use an existing theorem (the Kantorovich-Rubinstein duality) to transform it into the following form:
$$W(P_r, P_g) = \frac{1}{K} \sup_{\|f\|_L \le K} \Big( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \Big),$$

where $\|f\|_L \le K$ denotes the Lipschitz condition: an additional constraint imposed on a continuous function $f$, requiring that there exists a constant $K \ge 0$ such that any two elements $x_1$ and $x_2$ in the domain satisfy
$$|f(x_1) - f(x_2)| \le K |x_1 - x_2|,$$
where $K$ is called the Lipschitz constant of $f$.
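To illustrate why this metric helps, the following small Python example (using SciPy's `wasserstein_distance`, an illustrative choice rather than anything used in this paper) compares two pairs of non-overlapping empirical distributions. Their JS divergence is $\log 2$ in both cases, but the Wasserstein distance still tracks how far apart they are:

```python
from scipy.stats import wasserstein_distance

# Two pairs of non-overlapping empirical distributions. JS divergence is the
# constant log 2 for both pairs, but the Wasserstein distance distinguishes
# "close but disjoint" from "far apart and disjoint".
near = wasserstein_distance([0.0, 1.0], [2.0, 3.0])    # -> 2.0
far = wasserstein_distance([0.0, 1.0], [20.0, 21.0])   # -> 20.0
print(near, far)
```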

However, although WGAN is theoretically sound, its practical results are not much better, mainly because of the difficulty of enforcing the Lipschitz continuity condition; WGAN-GP is an improvement on how this condition is enforced. Its algorithm includes two key steps:
(1) Use a random number $\varepsilon$ to interpolate between the generated data $x_g$ and the real data $x_r$: $\hat{x} = \varepsilon x_r + (1 - \varepsilon) x_g$.
(2) Introduce the gradient penalty term
$$\lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\Big[\big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2\Big],$$
which is added to the discriminator loss.

Advantages of WGAN-GP compared with WGAN are as follows:
(1) Weight clipping in WGAN (for example, clipping weights to $[-0.01, +0.01]$) leads to an uneven weight distribution; by using a gradient penalty instead, WGAN-GP obtains a well-behaved weight distribution and gives full play to the learning power of the neural network.
(2) In principle, the gradient penalty should be imposed over the entire sample space spanned by the generated and real images, which is intractable to compute directly. Interpolating between generated and real data with random numbers and penalizing the gradient at the interpolated points achieves a similar effect and is much simpler (see the sketch below).
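As a concrete illustration, the following is a minimal PyTorch sketch of the gradient penalty described above. `discriminator`, `real`, `fake`, and the coefficient `lam` are placeholders for the reader's own network, batches, and hyperparameter, not this paper's exact implementation:

```python
import torch

def gradient_penalty(discriminator, real, fake, lam=10.0):
    """Gradient penalty term from step (2) above. `real` and `fake` are
    batches of shape (N, C, H, W); `lam` is the penalty coefficient lambda."""
    fake = fake.detach()  # keep the generator graph out of the penalty
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)  # random interpolation weights
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_out = discriminator(x_hat)
    grads = torch.autograd.grad(outputs=d_out, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True, retain_graph=True)[0]
    grads = grads.reshape(grads.size(0), -1)
    # penalize the gradient norm's deviation from 1 at the interpolated points
    return lam * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```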

3. Improved WGAN-GP Network Model

3.1. Generator Network Structure

The generator structure model is shown in Figure 2. Compared with the original network, the generator network makes the following improvements (a code sketch follows this list):
(1) Upsampling is carried out by transposed convolution.
(2) Batch normalization is used in every layer except the input layer of the network to stabilize learning.
(3) The fully connected layers are removed and a fully convolutional network is used to increase the stability of the model.
(4) The ReLU activation function is used throughout the network, except for the last layer, which uses Tanh.
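The following is a minimal sketch of a fully convolutional generator reflecting the four points above; the latent dimension (100), channel widths, kernel sizes, and output resolution are illustrative assumptions, not the exact configuration of this paper's network:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Minimal DCGAN-style generator sketch: transposed convolutions,
    batch normalization, ReLU hidden activations, and a Tanh output."""
    def __init__(self, z_dim=100, ch=64, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            # project the latent vector to a 4x4 feature map
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0), nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1), nn.BatchNorm2d(ch * 4), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2), nn.ReLU(True),  # 16x16
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.BatchNorm2d(ch), nn.ReLU(True),          # 32x32
            nn.ConvTranspose2d(ch, out_ch, 4, 2, 1), nn.Tanh(),                                  # 64x64 output
        )

    def forward(self, z):
        # z: (N, z_dim, 1, 1) latent noise
        return self.net(z)
```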

In each layer of the module, two different upsampling methods are used: nearest-neighbor upsampling and deconvolution. The two results are added and averaged to obtain the final output (a sketch is given below). This module effectively prevents the information loss and error introduced by upsampling, and the different sampling methods extract different shallow local image features, which is conducive to extracting texture features and yields a better generation result. At the same time, the nearest-neighbor branch acts as a shortcut realized by a simple identity-style mapping, so no additional parameters are introduced, reducing the computational burden.
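A minimal sketch of this dual-branch upsampling module follows; it assumes the channel count is unchanged across the module, and the kernel size of the deconvolution branch is an illustrative assumption:

```python
import torch.nn as nn

class DualUpsample(nn.Module):
    """Sketch of the dual-branch upsampling module described above:
    a parameter-free nearest-neighbor shortcut averaged with a learned
    transposed-convolution branch."""
    def __init__(self, ch):
        super().__init__()
        self.nearest = nn.Upsample(scale_factor=2, mode="nearest")  # identity-style shortcut, no parameters
        self.deconv = nn.ConvTranspose2d(ch, ch, 4, 2, 1)           # learned upsampling branch

    def forward(self, x):
        # the two branches are added and averaged to form the final output
        return 0.5 * (self.nearest(x) + self.deconv(x))
```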

3.2. Discriminator Network Structure

As network depth increases, training deep learning networks becomes more and more difficult, mainly because in networks trained by stochastic gradient descent [12], the multilayer backpropagation of the error signal easily causes “gradient dispersion” or “gradient explosion.” This phenomenon long hindered the design, training, and application of deeper convolutional neural networks.

In 2015, the 152-layer ResNet (residual network) proposed by He et al. [13] won the image classification task of the ILSVRC competition (top-5 error of 3.57%). It solved the training difficulties caused by network depth, and its performance was far superior to traditional network models. Its main idea is to use shortcut connections realized by identity mapping, which ease the training of very deep networks without introducing additional parameters; its basic residual module structure is shown in Figure 3.

Assume that the expected mapping is $H(x)$. The residual module converts the expected mapping into $F(x) + x$ by using the shortcut, so that during backpropagation
$$\frac{\partial \big(F(x) + x\big)}{\partial x} = \frac{\partial F(x)}{\partial x} + 1.$$
The constant 1 effectively ensures that the gradient does not vanish when computing gradients. At the same time, during forward propagation there is more than one path for low-level information to reach the next layer: even if the gradient of some layer becomes 0, information is still transmitted across the shortcut connection, which ensures that the residual network performs better than an ordinary network. Therefore, the discriminator network in this design is structured as follows (a residual-block sketch is given after Figure 4):
(1) Strided, padded convolutions are added to replace the pooling layers.
(2) Except for the last output layer, the fully connected layers are removed and a convolutional network is used to increase the stability of the model.
(3) The LeakyReLU function is selected as the activation function in the network, while the last layer uses no activation function.
(4) Residual modules are introduced between the layers to optimize the network.

Its structural model is shown in Figure 4.
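A minimal sketch of a residual block for such a discriminator follows; the kernel sizes and the LeakyReLU slope are illustrative assumptions rather than the paper's exact settings:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block sketch for the discriminator described above."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1),
        )
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        # shortcut: F(x) + x, so the gradient always keeps the "+1" identity path
        return self.act(self.body(x) + x)
```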

4. Experiments

4.1. Data Set Establishment

The data used for deep learning came from a tunnel bureau in Xi’an. The interior of the tunnel lining was scanned and imaged by ground penetrating radar, and five kinds of defect images, such as voids and incompactness, were selected as input data for model training. The data were divided into a training set and a test set in a fixed proportion. After traditional data enhancement was applied to both, the training set was further split into a training set and a validation set, which makes it convenient to monitor the degree of training during the training process. The collated tunnel lining defect data set includes 3800 ground penetrating radar images; part of the collected defect images are shown in Figure 5.

4.2. Experiment Configuration

To avoid repeatedly writing tedious environment code when constructing neural networks, a prebuilt deep learning framework can be used. The framework used in this experiment is PyTorch, which has three main characteristics: it is concise, fast, and easy to use.

PyTorch is designed with minimal encapsulation, reusing existing implementations rather than rewriting them. Its simplicity allows developers to understand the logic of each step of the code. At the same time, PyTorch code runs faster than that of many frameworks, and implementing and invoking code is very convenient, so this framework was selected to build all the networks in this experiment.

The experiments were run on a Windows 10 system. The batch size of the improved WGAN-GP network model is set to 64, and that of the YOLOv5 network is set to 2. The improved WGAN-GP network uses the Adam optimizer with an initial learning rate of 0.0002 and 3000 iterations; the YOLOv5 network is trained for 200 iterations. Table 1 lists the detailed parameters of the experimental environment.
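A minimal sketch of this optimizer configuration follows. `Generator` and `Discriminator` stand for the networks sketched in Section 3 (the discriminator class is assumed to be defined analogously to the residual block above), and the Adam beta values follow the common WGAN-GP choice; they are assumptions here, not parameters reported by this paper:

```python
from torch import optim

# Adam with an initial learning rate of 0.0002; batch size 64, 3000 iterations.
generator, discriminator = Generator(), Discriminator()   # assumed defined
opt_g = optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.9))
opt_d = optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.9))
batch_size, num_iterations = 64, 3000
```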

4.3. Result Analysis

(1) After the improved WGAN-GP network is defined, the collected tunnel lining defect data set is used to train it, and the result is compared with the original network. As shown in Figure 6, the defect image features generated by the WGAN-GP network rebuilt on convolutional layers are significantly better than those generated by the WGAN-GP network built on fully connected layers, and the images from the improved convolutional WGAN-GP are visibly clearer. In addition, the original WGAN-GP network using fully connected layers has 25.2 M parameters, while the improved network has only 1.77 M, a reduction of 92.98%, which greatly reduces the computational cost.

(2) At the same time, compared with a generative network whose discriminator lacks the residual module, the improved generative adversarial network converges faster in both the discriminator loss and the generator loss and can generate stable, high-quality defect images earlier. The loss functions of the two networks are shown in Figure 7.

(3) In the official YOLOv5 code [14], the target detection network comes in four versions of different depth and width, namely, the YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x models. Among them, YOLOv5s has the smallest depth and the smallest feature map width, while YOLOv5x is the largest. Taking YOLOv5s as an example, its network structure is shown in Figure 8 and can be divided into input, backbone, neck, and prediction according to the different stages of data processing. Data enhancement, adaptive picture scaling, and anchor box calculation for the input image data are all completed in the input part. The backbone uses the focus structure and CSP structure to extract the feature information of the input data and passes it to the neck for further processing. The neck adopts the FPN+PAN structure [15, 16], with the CSP2 structure used to strengthen the network’s feature fusion ability. Finally, predictions are made in the prediction part, the loss function is calculated, and NMS [17] (nonmaximum suppression) is used to filter the multiple target boxes.

Finally, 1909 generated images were selected, integrated, and labeled; they were then mixed with the original data set and fed into the YOLOv5 network for training and recognition. Under the same test set, the tunnel defect detection results are shown in Tables 2 and 3.
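For reference, a pretrained YOLOv5s model can be loaded through torch.hub as sketched below; the image path is a placeholder, and fine-tuning on the expanded defect data set is done with the official repository's training script rather than this snippet:

```python
import torch

# Load the official YOLOv5s model via torch.hub and run inference on one
# radar image; "gpr_image.jpg" is a placeholder path, not a file from the
# paper's data set.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("gpr_image.jpg")
results.print()   # summary of detected defect boxes
```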

5. Conclusions

Generative adversarial networks have long been a hot research field in artificial intelligence. Their advantage lies in providing a new data enhancement method that can greatly improve the robustness and security of neural networks. However, they also face prominent problems, including training efficiency, the risk of mode collapse, and the quality of the generated samples. Therefore, further study and exploration of generative adversarial networks is of great significance.

Under the premise of limited tunnel defect image data, this paper introduces a generative adversarial network to expand the data and improves on the problems of the original generative adversarial network, which greatly improves the quality of the generated samples. At the same time, compared with the original data set, the average recognition rate of the network trained on the expanded data set is increased by 4.4%, and the recognition rate of every class is greatly improved. The experimental results show that the improved WGAN-GP network designed in this paper effectively improves the quality of the generated samples, strengthens the stability of the generative adversarial network, and, with the expanded data samples, effectively improves the recognition accuracy of the network.

This work mainly improved the generative adversarial network. In future research, in addition to further optimizing the generative adversarial network, the recognition network used here can also be improved, which should further increase the defect recognition rate.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no competing interest.

Acknowledgments

This study is funded by the Key Scientific Research Project of Universities in Henan Province (19A510004) and the Key Science and Technology Project of Henan Province (222102210135).