Abstract

To address the interference of four categories of nonweld seam stripes (inclusion, oil-spot, silk-spot, and water-spot) with weld seam recognition during robotic welding, a convolutional neural network (CNN) combined with a multistage training strategy is used to construct a digital model for weld seam recognition, and its classification accuracy is compared with that of seven representative standard CNN models. The results show that the ResNet model with the multistage training strategy classifies weld seams with an accuracy of 83.8%, outperforming the other standard models. In this study, the physical scenario of weld seam recognition is migrated to a neural network digital model, achieving intelligent recognition of weld seams in complex scenarios based on the CNN digital model.

1. Introduction

Welding is a machining method that joins two parts into a required structure; it is one of the most basic processes of equipment manufacturing. The quality of a welded joint depends heavily on the skill and experience of the worker, and the bright light and toxic gases generated during welding are harmful to workers' health. It is therefore imperative to realize intelligent welding in industry. Fortunately, with the progress of science and technology, various intelligent technologies supported by computer science have been widely adopted across industries, liberating people from harmful work environments while improving production quality and efficiency. Welding has also benefited: welding robots have been replacing workers in some simple joining scenarios.

The core technology of welding robots is extracting welding position information from photos, which is then used to drive the actuator to complete the joining task. Processing the weld seam information contained in a photo is therefore central, and in recent years many advanced image processing algorithms have been introduced for accurate extraction of weld seam information. For example, the grayscale transform method, the neighbourhood averaging method, the median filter method, and related algorithms have been studied for photo noise filtering [1, 2]. Polar, threshold, grayscale centre-of-gravity, Hough transform, and related algorithms have been studied for weld seam stripe extraction [3–6]. The adaptive threshold method, random Hough transform, Steger, laser linear stripe threshold segmentation, and related algorithms have been studied for weld seam stripe localisation [7, 8] and for improving the quality of weld seam photos by using the laser stripe structure to accurately extract stripe features [9]; nearest-neighbour clustering algorithms, directional template methods, genetic algorithms, and so on have been used for weld seam stripe extraction [10–12]. In engineering practice, these methods solve the weld seam localisation problem from different angles. In reality, however, the workpiece surface frequently carries nonweld seam stripes shaped like a weld seam, such as inclusions, oil-spots, silk-spots, and water-spots. In scenarios with such highly similar interference, distinguishing these stripes from real weld seams with traditional image processing methods poses a huge challenge, and this often leads the robot to weld the workpiece in the wrong place. Therefore, to improve the working efficiency and reliability of the welding robot, it is important to recognise the linear stripes on the workpiece surface before performing positioning and welding.

Convolutional neural networks (CNNs) are deep learning algorithms particularly suited to recognising objects with high similarity. In recent years, machine vision algorithms represented by CNNs have been widely used in face recognition, vehicle recognition, animal recognition, and other fields. Their core strength is that the convolution structure can extract photo features in detail, allowing highly similar targets to be distinguished; this makes CNNs well suited to separating weld seams from nonweld seams. In this paper, a CNN is used to identify and locate the weld seam under the interference of complex traces and to provide a real, reliable welding target for robots.

2. Principle of Weld Seam Recognition Based on CNN

A CNN uses convolution to extract pixel-level information, which gives it the ability to recognise different types of targets in photos. In this process, the shape features of the targets recorded in the pictures are collected by the input layer of the network. Combined with the feature-fitting mechanism of neural network gradient updates, and stimulated by the input of a large amount of picture information, the network captures the features that distinguish the different types and records them in the form of weight values [13, 14]. Steps 1–19 of Figure 1 show the process of obtaining a CNN digital model for weld seam recognition.

Figure 1: (1) Weld seam dataset collation and reading. (2) Standard-size photo processing; this step produces the standard photo size required by the network. (3) Expression of the photo in terms of its number of channels. (4) Preprocessing of the convolution samples. (5) Extraction of photo features using a convolution kernel; the convolution is calculated as

z^l_{i,j,k} = \sum_s \sum_m \sum_n w^l_{i,s,m,n} \, a^{l-1}_{s,\,j+m,\,k+n} + b^l_i,

where j, k denote the jth row and kth column of the feature vector; w denotes the weight; i indexes the neuron feature vectors of the next layer; s indexes the feature vectors of the previous layer; (m, n) indexes the entries of the convolution kernel; b is the bias; z is the neuron input and a the neuron output (in this study, z denotes the neuron input, which becomes the output a after the activation function); and l denotes the lth layer.

Step 6 is the result obtained by convolution. Step 7 is the pooling process based on the convolution result; pooling is calculated as

ap^l_{j,k} = \mathrm{pool}_{0 \le m,n < d}\left(a^l_{jd+m,\,kd+n}\right),

where ap denotes the output of the pooling layer; a is the neuron output of the layer (in this study, z denotes the neuron input, which becomes the output a after the activation function); d denotes the pooling stride; (m, n) indexes the entries of the pooling kernel; j, k denote the jth row and kth column of the feature vector; and l denotes the lth layer.

Step 8 is the result obtained from the pooling process. Step 9 reshapes the pooled data so that it can be passed to the fully connected layer. Step 10 is the fully connected layer, whose computation is

z^l_j = \sum_s w^l_{j,s} \, ap^{l-1}_s + b^l_j,

where ap denotes the (flattened) output of the pooling layer; j denotes the node of the current layer; s indexes the nodes of the previous layer; z denotes the node input; and l denotes the lth layer.
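The forward computations of steps 5, 7, and 10 can be sketched directly from the equations above. The following is a minimal NumPy sketch, not the authors' implementation: the array shapes, the use of max pooling, and the ReLU activation are illustrative assumptions.

```python
import numpy as np

def conv2d(a_prev, w, b):
    """Step 5: valid convolution. a_prev: (S, H, W) input feature maps,
    w: (I, S, m, n) kernels, b: (I,) biases -> z: (I, H-m+1, W-n+1)."""
    S, H, W = a_prev.shape
    I, _, m, n = w.shape
    z = np.zeros((I, H - m + 1, W - n + 1))
    for i in range(I):
        for j in range(H - m + 1):
            for k in range(W - n + 1):
                # z^l_{i,j,k} = sum_{s,m,n} w^l_{i,s,m,n} a^{l-1}_{s,j+m,k+n} + b^l_i
                z[i, j, k] = np.sum(w[i] * a_prev[:, j:j+m, k:k+n]) + b[i]
    return z

def max_pool(a, d):
    """Step 7: pooling with stride d (max pooling assumed here)."""
    I, H, W = a.shape
    out = np.zeros((I, H // d, W // d))
    for j in range(H // d):
        for k in range(W // d):
            out[:, j, k] = a[:, j*d:(j+1)*d, k*d:(k+1)*d].max(axis=(1, 2))
    return out

def fully_connected(ap, w, b):
    """Step 10: z^l_j = sum_s w^l_{j,s} ap^{l-1}_s + b^l_j on the flattened input (step 9)."""
    return w @ ap.ravel() + b

rng = np.random.default_rng(0)
a0 = rng.standard_normal((3, 8, 8))            # 3-channel 8x8 photo (steps 2-3)
z1 = conv2d(a0, rng.standard_normal((4, 3, 3, 3)), np.zeros(4))
a1 = np.maximum(z1, 0)                         # a = activation(z), ReLU assumed
p1 = max_pool(a1, 2)                           # (4, 3, 3)
out = fully_connected(p1, rng.standard_normal((5, 36)), np.zeros(5))
print(out.shape)                               # (5,) -- one score per class
```

Chaining the three functions reproduces the data flow of Figure 1: convolution shrinks an 8×8 photo to 6×6 feature maps, pooling halves them to 3×3, and the flattened result feeds the fully connected layer with one output per category.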

Step 11 outputs the results of the fully connected layer calculations. Step 12 compares the output with the labels. Step 13 updates the weights and biases by backpropagation between the output and hidden layers; step 14 does the same between the hidden and input layers. Steps 15 and 16 are intermediate function transfers. Steps 17–19 update the convolution kernels and biases, where the fully connected layer error is calculated as

\delta^L_j = \partial C / \partial z^L_j = (a^L_j - y_j)\, \sigma'(z^L_j) \ \ \text{(output layer)}, \qquad \delta^l_j = \Big(\sum_i w^{l+1}_{i,j}\, \delta^{l+1}_i\Big) \sigma'(z^l_j) \ \ \text{(hidden layers)},

where δ^l_j is the defined intermediate term (the error); z denotes the node input; l denotes the lth layer; j denotes the jth row of the feature vector; a is the neuron output of the layer (in this study, z denotes the neuron input, which becomes the output a after the activation function σ); and C denotes the cost evaluated at the node. The first expression gives the error of the fully connected output layer and the second its propagation towards the convolutional layers.

The fully connected layer passes the error backwards to the pooling layer:

\delta^l_{j,k} = \sum_i w^{l+1}_{i,(j,k)}\, \delta^{l+1}_i,

where z denotes the node in the network; l denotes the lth layer; j, k denote the jth row and kth column of the feature vector; ap denotes the output of the pooling layer, with (j, k) indexing its flattened positions; i indexes the neuron feature vectors of the next layer; C denotes the cost at the node; and δ^l is the defined intermediate term.

The error is passed from the pooling layer to the convolution layer:

\delta^l_j = \mathrm{upsample}\big(\delta^{l+1}_j\big) \odot \sigma'(z^l_j),

where upsample distributes the pooled error back over the positions covered by the pooling kernel; z denotes the node in the network; l denotes the lth layer; j denotes the jth row of the feature vector; ap denotes the output of the pooling layer; i indexes the neuron feature vectors of the next layer; and C denotes the cost at the node.

Error transfer between convolution layers:

\delta^l = \big(\delta^{l+1} * \mathrm{rot180}(w^{l+1})\big) \odot \sigma'(z^l),

where * denotes convolution and rot180 rotates the kernel by 180°; z denotes the node in the network; l denotes the lth layer; C denotes the cost at the node; and a is the neuron output of the layer (in this study, z denotes the neuron input, which becomes the output a after the activation function).

Model parameter gradients:

\partial C / \partial w^l_{i,s,m,n} = \sum_{j,k} \delta^l_{i,j,k}\, a^{l-1}_{s,\,j+m,\,k+n}, \qquad \partial C / \partial b^l_i = \sum_{j,k} \delta^l_{i,j,k},

where l denotes the lth layer; i indexes the neuron feature vectors of the next layer; s indexes the nodes of the previous layer; C denotes the cost at the node; δ^l is the defined intermediate term; and a is the neuron output of the layer (in this study, z denotes the neuron input, which becomes the output a after the activation function).
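The gradient relations above can be verified numerically. The sketch below, an illustrative assumption rather than the paper's network, uses a single fully connected layer with sigmoid activation and quadratic cost, and checks the analytic gradient ∂C/∂w = δ · a against a finite difference.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x):
    z = w @ x + b                   # z: neuron input
    return z, sigmoid(z)            # a: neuron output after activation

rng = np.random.default_rng(1)
w, b = rng.standard_normal((3, 4)), rng.standard_normal(3)
x, y = rng.standard_normal(4), rng.standard_normal(3)

z, a = forward(w, b, x)
delta = (a - y) * a * (1 - a)       # delta_j = dC/dz_j for C = 0.5*||a - y||^2
grad_w = np.outer(delta, x)         # dC/dw_{j,s} = delta_j * a^{l-1}_s
grad_b = delta                      # dC/db_j = delta_j

# finite-difference check of one weight gradient
eps = 1e-6
wp = w.copy(); wp[0, 0] += eps
num = (0.5*np.sum((forward(wp, b, x)[1]-y)**2) - 0.5*np.sum((a-y)**2)) / eps
print(abs(num - grad_w[0, 0]) < 1e-4)   # True
```

Here sigmoid'(z) = a(1 − a), so the intermediate term δ takes exactly the form of the output-layer error above; the finite-difference quotient agrees with the analytic gradient to within truncation error.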

The migration of the weld seam information from the physical scenario to the digital model is achieved by steps 1–19 of Figure 1, enabling the identification of weld seam entities in the physical scenario through the digital model.

3. Process of Acquiring Digital Models for Weld Seam Identification

In the welding process, linear stripes on the surface of the workpiece to be welded, such as inclusions, oil-spots, silk-spots, and water-spots, are highly similar to the weld seam and are the main factor interfering with the welding robot's recognition of the weld seam. The task of the CNN is to establish a digital model that recognises the weld seam and nonweld seam information in photos, achieving intelligent automatic recognition without human participation. The difficulty lies in establishing a highly accurate digital model for weld seam recognition. Normally, the accuracy with which a CNN fits physical scenario information depends on the design of the network's optimizer and loss function.

3.1. Optimizers and Loss Functions for Numerical Models of Weld Seam Identification

For a given CNN, researchers select an appropriate optimizer and loss function based on experience and the characteristics of the samples. However, different optimizers and loss functions have their own advantages. The more mature neural network optimizers fall into two categories: gradient descent algorithms and adaptive learning rate algorithms. In each iteration, stochastic gradient descent (SGD) optimizes the loss function only on randomly selected training data. Compared with other optimization strategies, SGD is more efficient and can quickly move the weight values towards the global optimal solution. The algorithm's update rule is

\theta = \theta - \eta\, \nabla J(\theta),

where θ is the parameter to be updated, η is the learning rate (a hyperparameter), and ∇J is the gradient of the loss with respect to the parameter to be updated.

The more mature adaptive learning rate optimization algorithms mainly include AdaGrad and Adam, among which the unique design of Adam's momentum term gives it a better ability to find the global optimal solution.

AdaGrad optimizer structure:

\theta_{t+1,i} = \theta_{t,i} - \frac{\eta}{\sqrt{G_{t,ii} + \epsilon}}\, g_{t,i},

where θ_i is the variable, η is the learning rate, g_{t,i} is the gradient of parameter θ_i at time t, G_t is a diagonal matrix whose (i, i) element is the sum of squares of the gradients of parameter θ_i up to time t, and ε is a small constant that prevents division by zero.
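A minimal sketch of the AdaGrad update, assuming a toy quadratic objective J(θ) = ||θ||² in place of a real network loss:

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.5, eps=1e-8):
    """AdaGrad: per-parameter learning rate eta / sqrt(G_ii + eps),
    where G accumulates the squared gradient of each parameter."""
    G = G + grad**2
    theta = theta - lr * grad / np.sqrt(G + eps)
    return theta, G

theta = np.array([5.0, -4.0])
G = np.zeros_like(theta)
for _ in range(500):
    grad = 2.0 * theta              # gradient of J = ||theta||^2
    theta, G = adagrad_step(theta, grad, G)
print(np.abs(theta).max() < 0.5)    # True: theta has decayed towards 0
```

Because G only ever grows, the effective learning rate for each parameter shrinks monotonically, which is exactly the drawback noted in the next paragraph.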

Compared with the batch gradient descent (BGD) optimization strategy, AdaGrad's learning rate adapts during the gradient computation and does not need to be tuned manually, which allows it to fit sample features with smaller gradients more carefully in the later stages of training. Its disadvantage is that the denominator keeps accumulating, so the learning rate shrinks and eventually becomes very small.

The Adam optimizer structure is

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t,

where θ_t is the variable and η is the hyperparameter (learning rate, Lr). The other variables are defined as

m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \quad \hat{m}_t = \frac{m_t}{1-\beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1-\beta_2^t},

where β₁ and β₂ are hyperparameters and g_t is the gradient of the parameters at time t. Since m_t and v_t are initialized as zero vectors, they are biased towards zero; this bias is offset by the corrected estimates m̂_t and v̂_t.
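The Adam equations translate directly into code. As before, the quadratic objective is an illustrative stand-in for a network loss; the default β₁, β₂, and ε values are the commonly used ones.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moments."""
    m = b1 * m + (1 - b1) * grad            # momentum (first moment)
    v = b2 * v + (1 - b2) * grad**2         # second moment
    m_hat = m / (1 - b1**t)                 # bias correction: m, v start at 0
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0, -4.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):                     # t starts at 1 for the correction
    grad = 2.0 * theta                      # gradient of J = ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t)
print(np.abs(theta).max() < 0.5)            # True: theta is near the minimum
```

Note the division by 1 − β^t: without it, the first updates would be heavily damped because m and v are initialized at zero, which is the bias the equations above correct for.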

The loss functions of neural networks mainly fall into two categories, mean square error loss and cross-entropy loss; the loss functions used here are LTMAE (LT, loss transferring) and LTsoftmax.

The LTMAE function equation is

LT_{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - y_i^p \right|,

where y_i is the output of the network for input sample i after passing through the CNN, y_i^p is the label value of the input sample, and n is the number of input samples counted at one time.
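A short sketch of the LTMAE computation; the three sample outputs and labels are made-up illustrative values.

```python
import numpy as np

def lt_mae(y, y_p):
    """LT_MAE = (1/n) * sum_i |y_i - y_i^p|  (the L1 loss)."""
    return np.mean(np.abs(y - y_p))

y   = np.array([0.9, 0.1, 0.4])   # network outputs
y_p = np.array([1.0, 0.0, 0.5])   # sample label values
print(lt_mae(y, y_p))             # ≈ 0.1
```

Replacing the absolute value with a square would give the L2 variant mentioned in the next paragraph.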

LTMAE is commonly used in loss calculation for regression problems, where the mean of the absolute difference between the network output and the sample labels (L1 loss), or of the squared difference (L2 loss), measures how well the network parameters fit the sample features.

The LTsoftmax function equation is

LT_{softmax} = -\frac{1}{n} \sum_{i=1}^{n} \log \frac{e^{\,w_{y_i} x_i + b_{y_i}}}{\sum_j e^{\,w_j x_i + b_j}},

where n is the number of input samples counted at one time, w is the weight and b the bias, y_i is the label class of input sample i, and x_i is the node input value.
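A sketch of the LTsoftmax computation on precomputed class scores (the values w·x + b). The two five-class score rows are illustrative made-up values; the max-subtraction is a standard numerical-stability trick, not part of the equation itself.

```python
import numpy as np

def lt_softmax(scores, labels):
    """Mean cross-entropy of softmax probabilities.
    scores: (n, classes) values w x + b; labels: (n,) true class indices."""
    shifted = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(len(labels)), labels])

scores = np.array([[4.0, 1.0, 0.5, 0.2, 0.1],   # strongly predicts class 0
                   [0.1, 0.2, 3.0, 0.4, 0.3]])  # strongly predicts class 2
labels = np.array([0, 2])
print(lt_softmax(scores, labels) < 0.3)          # True: both predictions right
```

Feeding wrong labels, e.g. `lt_softmax(scores, np.array([1, 1]))`, yields a much larger loss, which is what drives the weights towards the correct class boundaries.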

LTsoftmax is commonly used in loss calculation for classification problems: the network outputs are transformed by a specific function into probabilities between 0 and 1, and the difference between the probability values and the sample labels measures the fit between the network predictions and the labels.

3.2. Weld Seam Identification Digital Model Driving Strategy for Optimizer and Loss Function

The CNN weld seam recognition digital model is embodied in the network's weights. Once activated by the training dataset, the gradient descent method converges the weights on the features of the weld seam and of the interfering nonweld types. The accuracy of this convergence depends on the activation scheme used during the weight update, specifically the choice of optimizer and loss function. For the recognition of weld seams under complex trace interference in this study, the photo backgrounds are highly similar, so the features of the recognition targets are also highly similar. A single optimizer and loss function therefore fit these features poorly, making it a huge challenge to recognise weld seams under strong nonweld seam interference.

In a complex trace interference environment, a high-precision weld seam recognition CNN model needs to integrate the advantages of each optimizer and loss function. In this study, the SGD and Adam algorithms are both used in the training phase of the weld seam recognition model: SGD is used to quickly approach the global optimal solution, while Adam is introduced to overcome SGD's tendency to fall into local suboptimal solutions or saddle points when optimizing a highly nonconvex error function. Likewise, both LTMAE and LTsoftmax are used: the LTMAE loss increases the class spacing between weld seam and nonweld seam, while the LTsoftmax loss ensures classification accuracy between the different classes. This design of multiple optimizers and multiple losses overcomes the difficulty of recognising highly similar physical scenarios in weld seam recognition. The distribution of optimizers and loss functions during the training process is shown in Figure 2.
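The pairing of optimizers and losses across training stages can be sketched as a schedule. Since Figure 2 is not reproduced here, the stage boundaries and the specific pairings below (SGD + LTMAE early, Adam + LTsoftmax late) are hypothetical assumptions in the spirit of the strategy, not the paper's exact schedule.

```python
def stage_settings(epoch, total_epochs=100):
    """Pick an (optimizer, loss) pair for the current epoch.
    Assumed schedule: early epochs use SGD + LT_MAE to approach the optimum
    quickly and widen the weld/nonweld class spacing; later epochs use
    Adam + LT_softmax to refine classification accuracy."""
    if epoch < total_epochs // 2:
        return "SGD", "LT_MAE"
    return "Adam", "LT_softmax"

schedule = [stage_settings(e) for e in range(100)]
print(schedule[0], schedule[-1])
# ('SGD', 'LT_MAE') ('Adam', 'LT_softmax')
```

In a real training loop, each epoch would look up its settings from such a schedule and construct the corresponding optimizer and loss objects before iterating over the training batches.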

4. Results

4.1. Dataset Introduction

In the welding task, the common weld seam identification interference information on the workpiece surface comprises four categories: inclusion, oil-spot, silk-spot, and water-spot (https://www.kaggle.com/datasets/zhangyunsheng/defects-class-and-location (Figure 3)). The dataset used in this study therefore consists of 5 categories (these four plus the weld seam); each category contains 700 pictures, giving 3500 pictures in total. A small number of the photos were collected from public datasets on the Internet, and most were collected in factories. Together, these photos constitute the dataset of this article.

The composition of the training, validation, and test sets for the five categories in the training phase of the neural network weld seam recognition digital model is shown in Table 1. Operating environment: GPU, GeForce RTX 2080 Ti; CPU, 8-thread Intel(R) Xeon(R) Gold 6130 @ 2.10 GHz; RAM, 32 GB.

4.2. Classification Results

The classification performance of seven representative standard CNN models [15–21] (all with a single optimizer and a single loss function), namely AlexNet, GoogLeNet, ResNet, VGG, EfficientNet, MobileNet, and ShuffleNet, was collected during the determination of the digital model for weld seam recognition; the classification accuracy of each model is shown in Table 2. ResNet achieves the highest weld seam recognition accuracy among the algorithms in the table. Building on the ResNet model and using the training strategy of Figure 2, the classification accuracy improved by 1.6% compared with the plain ResNet model.

4.3. Model Evaluation

Table 3 is the confusion matrix of the "ResNet + multistage training strategy" model. Inclusion, oil-spot, silk-spot, water-spot, and weld seam each have 100 photos in the test data; the numbers of correctly classified photos are 85, 83, 78, 87, and 86, respectively, for a total accuracy of 83.8%. Table 4 is a two-class confusion matrix that takes weld seam as positive and inclusion, oil-spot, silk-spot, and water-spot as negative. Table 5 reports the F-score, TPR, FPR, and accuracy computed from Table 4, and compares our model (ResNet + multistage training strategy, ResNet + M) with three recent improved ResNet models (ResNet improved 1 [22], ResNet improved 2 [23], and ResNet improved 3 [24]).
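The Table 5 metrics can be recomputed from the two-class counts of Table 4. TP = 86 and FN = 14 follow from the weld seam row of Table 3; since Table 4 is not reproduced here, FP = 67 and TN = 333 are inferred from the reported FPR of 0.1675 over the 400 negative photos.

```python
# Two-class counts: weld seam positive, the four nonweld categories negative.
TP, FN = 86, 14        # from the weld seam row of Table 3
FP, TN = 67, 333       # inferred from FPR = 0.1675 over 400 negatives

TPR = TP / (TP + FN)                        # recall / true positive rate
FPR = FP / (FP + TN)                        # false positive rate
precision = TP / (TP + FP)
F = 2 * precision * TPR / (precision + TPR)  # F-score (harmonic mean)
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(round(TPR, 4), round(FPR, 4), round(F, 4), round(accuracy, 4))
# 0.86 0.1675 0.6798 0.838
```

The recomputed values match Table 5: TPR 0.8600, FPR 0.1675, F-score 0.6798, and accuracy 83.80%.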

As shown in Table 5, the F-score of the ResNet + M model is 0.6798, smaller than the 0.8572 of ResNet improved 1 but larger than the 0.6044 and 0.5482 of ResNet improved 2 and ResNet improved 3, respectively. The TPR of the ResNet + M model is 0.8600, larger than those of the other three contrast algorithms (the larger the TPR, the better the model); its FPR is 0.1675, smaller than those of the other three (the smaller the FPR, the better the model); and its accuracy is 83.80%, larger than those of the other three contrast algorithms.

5. Conclusion

As an important innovative application of computer technology in the welding field in recent years, welding robots are of great significance for promoting the development of the field, improving welding efficiency, and reducing welding costs. However, welding remains a complex job in an unstable scenario, with many factors that interfere with welding quality. At present, welding robots are applied relatively extensively only in individual, simple, highly repetitive processes. Accurately recognising a weld seam under complex interference information is an important prerequisite for involving robots in complex welding scenarios. In this study, the CNN algorithm is introduced to recognise weld seams under the interference of multiple categories of nonweld seam information. The results show that the ResNet model under the multistage training strategy recognises weld seams with 83.8% accuracy, and its overall performance is better than that of the other three contrast algorithms in this study.

This study establishes a CNN model for weld seam recognition in complex scenarios based on deep mining of the sample features in the training set, realizing the migration of the weld seam recognition task from the physical scenario to a digital model under complex welding conditions. Acquiring such a digital model is an important prerequisite for automatic weld seam recognition by robots without human participation. Moreover, this study provides an important technical approach for cases where robotic welding cannot yet be applied in more complex scenarios, and has practical significance for the further popularization of robotic welding applications.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request. The datasets for this research are from “Jiangsu Automation Research Institute,” and part of the dataset is sourced from the Internet, which is a public dataset (https://www.kaggle.com).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Data curation and formal analysis were done by Shi Chao. Funding acquisition, investigation, and methodology were carried out by Sun Hongwei. Liu Chao was responsible for the investigation, methodology, and project administration. Tang Zhaojia was responsible for provision resources and software.

Acknowledgments

This research was financially supported by the National Key R&D Program of China (2020YFB1712600), Development and Application Demonstration of Management and Control Platform for Shipbuilding process.