Review Article

A Survey of Machine Learning in Friction Stir Welding, including Unresolved Issues and Future Research Directions

Table 4

Application of various models to the friction stir welding process.

Algorithm | Advantages | Disadvantages | Dataset, features, and outputs | Refs.

Algorithm: LeNet, a CNN consisting of 5 convolution layers with filter sizes of 28, 14, 10, and 5, each layer containing 6 filters. Its task was to predict the backside bead width from the top view of the weld pool, because spatial limitations prevented the backside bead width from being measured directly.
Advantages:
(1) The large filter sizes let the CNN absorb a lot of generic information about the given features
(2) Training does not require many computational resources because the network has very few layers
Disadvantages:
(1) Only 6 filters per layer may not be enough to capture all the details in the training data
(2) Very few filters of size 5 or smaller were used, which leaves the CNN too little capacity to learn the finer details of the images
Dataset, features, and outputs:
Input: an image showing the top view of the weld pool
Output: a floating-point number representing the backside bead width
Training data: 22687 images
Testing data: 2905 images
Validation data: 2905 images
Refs.: [91–93]
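In the classic LeNet-5 configuration, the values 28, 14, 10, and 5 match the side lengths of the feature maps. These are produced from a 32 × 32 input by alternating 5 × 5 convolutions (stride 1, no padding) and 2 × 2 subsampling. The sketch below assumes that standard configuration, which the cited study may have modified, and traces the spatial size through the layers.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Side length of a square feature map after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_feature_map_sizes(input_size: int = 32) -> list[int]:
    """Trace the side length through LeNet-5's C1, S2, C3, and S4 layers (assumed standard setup)."""
    sizes = []
    size = conv_out(input_size, kernel=5)      # C1: 5x5 convolution
    sizes.append(size)
    size = conv_out(size, kernel=2, stride=2)  # S2: 2x2 subsampling
    sizes.append(size)
    size = conv_out(size, kernel=5)            # C3: 5x5 convolution
    sizes.append(size)
    size = conv_out(size, kernel=2, stride=2)  # S4: 2x2 subsampling
    sizes.append(size)
    return sizes


print(lenet5_feature_map_sizes())  # [28, 14, 10, 5]
```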

Algorithm: A CNN with 5 convolutional layers, each using filters of size 5. The layers had 8, 16, 32, 64, and 128 filters, respectively, and 3 fully connected (FC) layers were stacked on top. The goal of the CNN was to classify the images into 3 penetration stages, a task that is difficult for a human.
Advantages: The CNN is given more features to work with (both the top and the back image), which makes an already easy classification task easier
Disadvantages: Neither batch normalization nor dropout layers were used. Combined with a very large number of parameters, this increases the risk of overfitting
Dataset, features, and outputs:
Input: 2 images, the top view and the backside view of the weld pool
Output: a one-hot encoded vector of size 3, with each element representing a penetration stage of the weld pool
Training data: 20573 sets of 2 images
Testing data: 2570 sets of 2 images
Validation data: 2570 sets of 2 images
Refs.: [93, 94]
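The overfitting concern can be made concrete by counting trainable parameters. Below is a minimal sketch. It assumes that the top and back images are stacked as a 2-channel grayscale input, and it counts only the five 5 × 5 convolutional layers, since the FC layer sizes are not reported. Even this partial count, about 2.7 × 10^5 parameters, is large relative to 20573 training pairs.

```python
def conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights plus one bias per filter for a 2-D convolution layer."""
    return (kernel * kernel * in_channels + 1) * out_channels


def conv_stack_params(kernel: int, in_channels: int, filters: list[int]) -> list[int]:
    """Parameter count of each layer in a stack of same-size convolutions."""
    counts = []
    for out_channels in filters:
        counts.append(conv_params(kernel, in_channels, out_channels))
        in_channels = out_channels  # the next layer sees this layer's feature maps
    return counts


# Assumption: top and back images stacked as a 2-channel grayscale input.
per_layer = conv_stack_params(kernel=5, in_channels=2, filters=[8, 16, 32, 64, 128])
print(per_layer, sum(per_layer))  # [408, 3216, 12832, 51264, 204928] 272648
```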

Algorithm: To increase the signal-to-noise ratio (SNR) and flaw probing depth of transient thermography inspection of aluminum (AL) sheets and friction stir welded (FS) sheets, a multilayer perceptron (MLP) feedforward NN combined with feature extraction methods is used.
After data preparation, statistical moments, PCT, or TSLM (thermographic signal linear modeling) is used to extract features, and the data are then prepared for supervised learning. The classification is modeled as a Bernoulli-distributed (binary) problem.
Each network consists of ten input neurons, one hidden layer with a nonlinear activation function, and one output neuron with a logistic activation function.
Advantages:
(1) The ML approach raised the flaw probing depth to 2.2 mm with a true positive rate of 97.2%
(2) The network is fed data processed with TSLM, which yields an enhanced signal-to-noise ratio and considerably more accurate classification at greater depths in the weld
Disadvantages:
(1) The training data for the FS sheet are of lower quality than those for the AL sheet; hence, the model is less accurate on the FS sheet than on the AL sheet
(2) The weight optimization algorithm used does not guarantee that the global minimum will be found
Dataset, features, and outputs:
Input: 10 features
Output: 0 or 1, where 0 represents no flaw and 1 represents a flaw
Training data: not provided
Testing data: not provided
Validation data: 30 points
Refs.: [95]
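The architecture described above can be sketched as a forward pass. The network maps 10 input features through one nonlinear hidden layer to a single logistic neuron. That neuron's output is the Bernoulli parameter, P(flaw). The hidden activation (tanh), the hidden width (5), and the random weights are assumptions for illustration; they are not trained values from [95].

```python
import math
import random


def mlp_flaw_probability(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: 10 inputs -> one tanh hidden layer -> one logistic output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))  # logistic output = P(flaw), a Bernoulli parameter


def classify(p: float, threshold: float = 0.5) -> int:
    """Map the Bernoulli probability to the table's labels: 1 = flaw, 0 = no flaw."""
    return 1 if p >= threshold else 0


# Hypothetical, untrained weights for a 10-input, 5-hidden-neuron network.
rng = random.Random(0)
n_in, n_hidden = 10, 5
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
features = [rng.random() for _ in range(n_in)]  # e.g., 10 extracted thermographic features

p = mlp_flaw_probability(features, w_hidden, b_hidden, w_out, b_out=0.0)
print(round(p, 3), classify(p))
```

In training, such a network would typically minimize the binary cross-entropy, which is the negative log-likelihood of the Bernoulli model.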

Algorithm: A tabular reinforcement learning (RL) algorithm. Its goal was to identify the welding parameters (welding speed and rotational speed) that give the best possible outcome, i.e., the strongest possible weld. Weld quality was assessed with 2 white LEDs whose light was reflected off the welds and captured by a CMOS sensor; the sensor data were used to assess the quality and strength of the weld.
Advantages:
(1) The RL agent learns entirely on its own, without any aid from human observers
(2) The algorithm can optimize both the quality of the welds and the process productivity by penalizing slow welding speeds
Disadvantages: The optimization problem was not solved in the most efficient manner; maximizing the expected value of an infinite sum of random variables (the rewards) is very time-consuming
Dataset, features, and outputs: No dataset is required, because RL algorithms learn by exploring the environment to determine which actions change the state so as to earn a higher reward while minimizing penalties. In effect, the algorithm creates its own dataset
Refs.: [68, 96]

Algorithm: RSM and a four-step fuzzy logic model. Fuzzy logic is used to forecast weld quality in terms of UTS and percentage elongation
Advantages:
(1) ANFIS is a hybrid system that combines the benefits of artificial neural networks and fuzzy inference methods
(2) ANFIS has several benefits, including the ability to capture the nonlinear structure of a process, adaptability, and rapid learning
Disadvantages:
(1) ANFIS suffers from the curse of dimensionality: with grid partitioning, the number of fuzzy rules grows exponentially with the number of inputs, so the model becomes impractical if a larger number of inputs is included
Dataset, features, and outputs:
Input: 4 features; tool pin geometry, TRS, WS, and tool tilt angle
Output: weld quality
Training data: 31 data points
Testing data: 5 data points
Validation data: 5 data points
Refs.: [58, 97]
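Fuzzy inference and the rule explosion can be illustrated together. The sketch below is a zero-order Sugeno system with Gaussian membership functions and a full grid of rules. ANFIS normally uses first-order consequents and learns its parameters, whereas here every membership function and consequent is a fixed, hypothetical value. The `grid_rule_count` helper shows the m^n growth that limits ANFIS as inputs are added.

```python
import math
from itertools import product


def gauss_mf(x: float, center: float, sigma: float) -> float:
    """Gaussian membership degree of x in a fuzzy set."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_predict(inputs, mfs, consequents):
    """Zero-order Sugeno inference with one rule per combination of membership functions.

    mfs[i] is a list of (center, sigma) pairs for input i; consequents maps each
    rule (a tuple of MF indices) to a constant output.
    """
    num = den = 0.0
    for rule in product(*(range(len(m)) for m in mfs)):
        w = 1.0  # firing strength: product of the input memberships in this rule's sets
        for x, m, k in zip(inputs, mfs, rule):
            w *= gauss_mf(x, *m[k])
        num += w * consequents[rule]
        den += w
    return num / den  # weighted average of the rule outputs


def grid_rule_count(mfs_per_input: int, n_inputs: int) -> int:
    """Rules needed by grid partitioning: grows exponentially with the number of inputs."""
    return mfs_per_input ** n_inputs


# Hypothetical 2-input example (normalized rotational and welding speed, 2 MFs each).
mfs = [[(0.0, 0.3), (1.0, 0.3)], [(0.0, 0.3), (1.0, 0.3)]]
consequents = {(0, 0): 150.0, (0, 1): 180.0, (1, 0): 210.0, (1, 1): 170.0}  # made-up UTS, MPa
print(round(sugeno_predict((0.6, 0.4), mfs, consequents), 1))
print([grid_rule_count(3, n) for n in (2, 4, 6)])  # [9, 81, 729]
```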

Algorithm: Wavelet analysis and support vector machines (SVMs) are used to classify friction stir welds from images of the weld surface. Gaussian and polynomial kernels are used in the SVM classification algorithm
Advantages:
(1) The Gaussian kernel used is more accurate than the more commonly used polynomial kernels
(2) Separate SVMs were used for the different outputs
Disadvantages:
(1) Only the most obvious faults are detected during classification
Dataset, features, and outputs:
Input: 3 features: TRS (rpm), WS (mm/s), and PD (mm)
Output: energy, variance, and entropy
Training data: 112 data points for each of the good and defective weld classes
Testing data: 70 data points from each of these classes
Validation data: 42 data points
Refs.: [58, 98]
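The two kernels compared in this entry can be written down directly. Below is a minimal sketch of the Gaussian (RBF) and polynomial kernels, together with the kernel SVM decision rule f(x) = Σ α_i y_i K(x_i, x) + b. The support vectors, multipliers, and bias would normally come from training, which is not shown. The values in the example are hypothetical three-dimensional feature vectors.

```python
import math


def gaussian_kernel(x, y, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, degree: int = 3, c: float = 1.0) -> float:
    """Polynomial kernel: (x . y + c)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def svm_decision(x, support_vectors, labels, alphas, bias, kernel):
    """Sign of f(x) = sum_i alpha_i y_i K(x_i, x) + b; +1 = good weld, -1 = defective."""
    f = sum(a * y * kernel(sv, x) for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if f >= 0 else -1


# Hypothetical trained model: one support vector per class.
svs = [(0.9, 0.1, 0.2), (0.2, 0.8, 0.9)]
labels, alphas = [1, -1], [1.0, 1.0]
print(svm_decision((0.85, 0.15, 0.25), svs, labels, alphas, 0.0, gaussian_kernel))  # 1
```

Swapping `gaussian_kernel` for `polynomial_kernel` changes only the similarity measure. This is why the two kernels can be compared on the same features.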

Algorithm: The factors that most strongly influence the performance of FSSW joints were identified. A four-factor, five-level central composite rotatable design matrix was used to cover a wide range of each factor. Surface and contour plots, which indicate the probable independence (or interaction) of the factors, were created to determine how the process parameters influence TSFL and to find the ideal process conditions
Advantages:
(1) RSM is a powerful tool for fine-tuning FSW process parameters
(2) The difference between the predicted and experimental strength values is only 5%, showing that RSM is accurate in both the modeling and the optimization stages
Disadvantages:
(1) Contour plots must be examined manually; hence, the method is not fully automated
Dataset, features, and outputs:
Input: 4 features: TRS (rpm), PD (mm), plunge rate (mm/min), DT (s)
Output: tensile shear fracture load (TSFL)
Training data: 16 data points
Testing data: 8 data points
Validation data: 6 data points
Refs.: [58, 98, 99]
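RSM studies of this kind usually fit a second-order polynomial in the coded factors x_1, ..., x_4 (here TRS, PD, plunge rate, and DT). The exact fitted model of the cited studies is not reproduced here, so the following is a generic sketch. For a rotatable central composite design with k = 4 factors and a full 2^k factorial core, the axial distance α = (2^k)^{1/4} = 2. This yields the five coded levels -2, -1, 0, +1, +2, and the run count depends on the number of center points n_c.

```latex
\begin{aligned}
\hat{y} &= \beta_0 + \sum_{i=1}^{4}\beta_i x_i + \sum_{i=1}^{4}\beta_{ii} x_i^2 + \sum_{1 \le i < j \le 4}\beta_{ij} x_i x_j
  && \text{(15 coefficients: } 1 + 4 + 4 + 6\text{)} \\
\alpha &= \left(2^{k}\right)^{1/4} = \left(2^{4}\right)^{1/4} = 2
  && \text{(rotatability)} \\
N &= 2^{k} + 2k + n_c = 16 + 8 + n_c
  && \text{(factorial + axial + center runs)}
\end{aligned}
```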

Algorithm: A Gaussian process regression (GPR) algorithm was used to predict the peak temperature that the weld pool would reach
Advantages: GPR is very data-efficient: a good fit can be reached without much training data. With so little data, training is fast, which makes GPR well suited to prototyping
Disadvantages: The running time of GPR scales cubically with the number of data points $n$, i.e., as $O(n^3)$. This makes it hard to use with large datasets
Dataset, features, and outputs:
Input: 3 numerical features; weld speed, tool rotational speed, and tool angle
Output: peak temperature
Training data: 17 data points
Testing data: not used
Validation data: not used
Refs.: [60]
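The cubic cost comes from solving an n × n linear system with the kernel matrix. Below is a minimal sketch of the GPR posterior mean k_*^T (K + σ_n² I)^{-1} y, using a squared-exponential kernel and a zero prior mean, both assumptions. Plain Gaussian elimination serves as the O(n^3) solver; practical libraries use a Cholesky factorization instead. The training points are hypothetical.

```python
import math


def rbf(x, y, length_scale: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential covariance between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * sq / length_scale ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting: O(n^3) operations."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gpr_mean(x_train, y_train, x_new, noise: float = 1e-6, **kern):
    """Posterior mean k_*^T (K + noise*I)^{-1} y of a zero-mean Gaussian process."""
    k = [[rbf(xi, xj, **kern) + (noise if i == j else 0.0) for j, xj in enumerate(x_train)]
         for i, xi in enumerate(x_train)]
    weights = solve(k, y_train)  # the cubic-cost step that limits GPR on large datasets
    return sum(rbf(x_new, xi, **kern) * w for xi, w in zip(x_train, weights))


# Hypothetical 1-feature data (e.g., a scaled process parameter vs. scaled peak temperature).
x_train = [(0.0,), (1.0,), (2.0,), (3.0,)]
y_train = [0.0, 1.0, 4.0, 9.0]
print(round(gpr_mean(x_train, y_train, (1.5,)), 3))
```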

Algorithm: An SVM was used to predict the peak temperature that the weld pool would reach. The prediction would be used to control the parameters that affect the temperature (the input features), so as to reach the optimal temperature and keep the parent metals within the range in which they remain plastic, ensuring the strongest possible weld
Advantages: Once trained, an SVM's prediction cost depends on the number of support vectors rather than on the full size of the training dataset. Moreover, SVMs are very effective in high-dimensional feature spaces
Disadvantages: Although SVMs are relatively resistant to overfitting, the lack of data significantly increases the chance of overfitting. The reported accuracy is therefore unreliable, since the model may not be as accurate on entirely new data points
Dataset, features, and outputs:
Input: 3 numerical features; weld speed, tool rotational speed, and tool angle
Output: peak temperature
Training data: 17 data points
Testing data: not used
Validation data: not used
Refs.: [60]
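When an SVM is used for regression, as here for peak temperature, the standard formulation (ε-SVR) uses the ε-insensitive loss. Errors inside a tube of half-width ε cost nothing, and the constant C trades flatness of the model against tube violations. Below is a minimal sketch of the loss and of the primal objective for a linear model. The kernel and hyperparameters used in [60] are not stated, so ε, C, the weights, and the temperatures are hypothetical.

```python
def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """SVR loss: errors inside the epsilon tube cost nothing, larger errors grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, y_true, y_pred, c: float, epsilon: float) -> float:
    """Primal linear SVR objective: 0.5 * ||w||^2 + C * sum of tube violations."""
    flatness = 0.5 * sum(wi * wi for wi in w)
    violations = sum(epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred))
    return flatness + c * violations


# Hypothetical peak temperatures (degrees C) with a 10-degree tube.
print(epsilon_insensitive_loss(450.0, 455.0, 10.0))  # 0.0 (inside the tube)
print(epsilon_insensitive_loss(450.0, 470.0, 10.0))  # 10.0
```

With only 17 points, a small ε combined with a large C lets the model chase individual samples. This is the overfitting risk noted in the table.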

Algorithm: A set of 3 fully connected neural networks (FCNNs), each with 2 hidden layers of 5, 10, or 20 neurons per layer, respectively. They were trained to predict the frictional power supplied by the tool from the plunging load, rotational speed, tool diameter, and interaction time
Advantages: FCNNs (ANNs) are very versatile and can learn almost any nonlinear relationship between the input features and the outputs, and they offer a large set of hyperparameters that can be tuned to obtain the best fit
Disadvantages: A major drawback of the architectures used is that adding neurons increases the prediction accuracy on the training data at the expense of generalization. This could have been mitigated by adding dropout or batch normalization layers between the hidden layers
Dataset, features, and outputs:
Input: 4 numerical features; plunging load, tool rotation speed, tool diameter, and interaction time
Output: frictional power
Training data: 1800 data points; the train-test-validation split is not mentioned
Refs.: [100]
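The capacity of the three networks can be compared by counting weights and biases. The sketch below assumes a single output neuron and a bias in every layer. Model capacity grows roughly quadratically with the hidden width, while the dataset stays at 1800 points.

```python
def fcnn_params(n_inputs: int, hidden: list[int], n_outputs: int = 1) -> int:
    """Weights plus biases of a fully connected network with the given hidden layer widths."""
    sizes = [n_inputs, *hidden, n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


for width in (5, 10, 20):
    print(width, fcnn_params(4, [width, width]))
# 5 61
# 10 171
# 20 541
```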

Algorithm: The local binary pattern (LBP) is a texture descriptor used for classification in computer vision. It can distinguish textures and patterns in an image and can be used to detect defects in FSW
Advantages:
(1) The algorithm is very lightweight and requires little computation
(2) It can be used in real-time applications
Disadvantages:
(1) It may not detect all defects in the weld, so physical examination is still required
(2) It is unable to detect defects in larger samples
Dataset, features, and outputs:
Input: 1 feature; a grayscale image
Output: the image with the defect region highlighted
Refs.: [101]
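The basic 3 × 3 LBP operator explains why the method is so lightweight. Each pixel is compared only with its eight neighbours, and the comparison bits form an 8-bit code. A histogram of these codes then serves as the texture descriptor, and regions whose histograms differ from sound material can be flagged. The clockwise neighbour order below is an assumption (any fixed order works), and the defect-flagging step is not shown.

```python
# Neighbour offsets, clockwise from the top-left pixel; neighbour p sets bit p.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r: int, c: int) -> int:
    """Basic 3x3 LBP: bit p is 1 when neighbour p is at least as bright as the centre."""
    center = img[r][c]
    code = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << p
    return code


def lbp_histogram(img) -> list[int]:
    """256-bin histogram of LBP codes over all interior pixels (the texture descriptor)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


patch = [[10, 0, 0], [0, 5, 0], [0, 0, 0]]
print(lbp_code(patch, 1, 1))  # 1: only the top-left neighbour is >= the centre
```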

Algorithm: The method uses GLCM (gray-level co-occurrence matrix), GLSZM (gray-level size zone matrix), and DWT (discrete wavelet transform) for feature extraction; the features are then passed through an ANN for automatic weld defect classification
Advantages:
(1) The feature extraction process is effective for higher-order textures (3 or more pixels)
(2) The implemented smart system helps NDT experts classify weld defects easily
Dataset, features, and outputs:
Input: image of the weld
Output: name of the classified defect
Training data: 500 images
Testing data: 250 images
Validation data: not mentioned
Refs.: [102]
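As an illustration of the first feature extractor, below is a minimal GLCM sketch. It counts how often gray level i is followed by gray level j at a fixed pixel offset, normalizes the counts, and derives two common texture features that could feed the ANN. The offset (right neighbour), the quantized example image, and the choice of features are assumptions; the GLSZM and DWT features are not shown.

```python
def glcm(img, levels: int, dr: int = 0, dc: int = 1):
    """Normalized gray-level co-occurrence matrix for one pixel offset (default: right neighbour)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p) -> float:
    """GLCM contrast: sum of P(i, j) * (i - j)^2, large for strong local intensity changes."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p) -> float:
    """GLCM homogeneity: sum of P(i, j) / (1 + |i - j|), large for smooth texture."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))


img = [[0, 0, 1], [0, 1, 1], [2, 2, 2]]  # hypothetical image quantized to 3 gray levels
p = glcm(img, levels=3)
print(round(contrast(p), 4), round(homogeneity(p), 4))  # 0.3333 0.8333
```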

Algorithm: The same dataset is used to train an SVM, a naive Bayes classifier, a decision tree, a random forest, and an ANN to detect the fracture location in dissimilar friction welded joints
Advantages:
(1) The study shows that the ANN is the most successful at classifying the fracture location when the same dataset is provided to all algorithms
Disadvantages:
(1) There is too little data to improve the accuracy of the algorithms
Dataset, features, and outputs:
Input: 4 features; upset pressure (MPa), tool rotational speed (rpm), burn-off length (mm), and friction pressure (MPa)
Output: 0 for fracture at the upper location and 1 for fracture at the weld
Training data: 22 data points
Testing data: not mentioned
Validation data: not used
Refs.: [59]

Algorithm: A wavelet packet transform is used to decompose the temperature signal into components in different frequency bands. A least squares SVM (LS-SVM), which uses a least-squares loss so that training reduces to solving a linear system, is then used as the classification model, with its parameters optimized by a genetic algorithm
Advantages:
(1) It replaces the inequality constraints of a standard SVM with equality constraints, which speeds up learning
(2) It is very efficient for small-scale, less complex problems
Disadvantages:
(1) It is inefficient for large-scale problems
Dataset, features, and outputs:
Input: 3 features; temperature, rotational speed, and traverse speed
Output: one of three classes of strength coefficient: greater than 75%, between 65% and 75%, or less than 65%
Training data: 16 data points
Testing data: not mentioned
Validation data: not used
Refs.: [61]
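The wavelet packet step can be sketched with the orthonormal Haar wavelet. Unlike a plain wavelet transform, a packet decomposition splits both the low-pass and the high-pass outputs at every level, so the signal ends up in 2^levels sub-bands. The energy of each band can then serve as a feature for the classifier. The choice of the Haar wavelet and of band energy as the feature are assumptions, since the study's wavelet is not stated. The LS-SVM and genetic algorithm steps are not shown.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One orthonormal Haar analysis step: (low-pass approximation, high-pass detail)."""
    approx = [(a + b) / SQRT2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / SQRT2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail


def wavelet_packet(signal, levels: int):
    """Full packet tree: split BOTH the approximation and the detail at every level.

    The signal length must be divisible by 2**levels.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        nxt = []
        for node in nodes:
            nxt.extend(haar_step(node))
        nodes = nxt
    return nodes  # 2**levels sub-band coefficient lists, in tree (natural) order


def band_energies(signal, levels: int):
    """Energy of each packet sub-band, usable as classifier features."""
    return [sum(c * c for c in node) for node in wavelet_packet(signal, levels)]


temps = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical temperature samples
print([round(e, 3) for e in band_energies(temps, levels=2)])
```

Because the Haar transform is orthonormal, the band energies always sum to the energy of the original signal.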

Algorithm: A logistic model tree (LMT) that took statistical features of vibration data from an accelerometer and classified the weld into three classes: good, broken, and air bubble
Advantages:
(1) LMTs combine the strengths of logistic regression and decision trees to give an accurate model
(2) Since there are many features, many of which could be useless, the decision tree part of the algorithm performs feature selection, while the logistic regression part performs the classification
Disadvantages: Computing the weights and trees is very complex, and a small change in the data can drastically change the tree structure or the output, which wastes a lot of computing resources
Dataset, features, and outputs:
Input: statistics of sequential accelerometer readings: the mean, median, mode, standard deviation, skewness, variance, maximum, minimum, and count
Output: a probability distribution over the weld being good, broken, or containing an air bubble
Refs.: [103, 104]
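The LMT input features listed above can be computed from a window of accelerometer readings with Python's standard `statistics` module. Below is a minimal sketch. Using population (rather than sample) variance and standard deviation, and the moment-based skewness formula, are assumptions; the cited work may define them differently.

```python
import statistics


def vibration_features(readings):
    """Statistical summary of accelerometer readings, as listed for the LMT input."""
    mean = statistics.fmean(readings)
    std = statistics.pstdev(readings)
    # Population skewness: third central moment divided by std**3 (0 for symmetric data).
    skew = (
        sum((x - mean) ** 3 for x in readings) / len(readings) / std ** 3 if std > 0 else 0.0
    )
    return {
        "mean": mean,
        "median": statistics.median(readings),
        "mode": statistics.mode(readings),
        "std": std,
        "skewness": skew,
        "variance": statistics.pvariance(readings),
        "max": max(readings),
        "min": min(readings),
        "count": len(readings),
    }


print(vibration_features([1.0, 2.0, 2.0, 3.0]))  # hypothetical readings
```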

Algorithm: AlexNet, a CNN containing 5 convolutional layers with filter sizes of 5, 3, 3, 3, and 3 and with 96, 256, 384, 384, and 256 filters, respectively. It took images of the weld pool and classified them into two classes: ok and not ok
Advantages:
(1) Because the CNN takes image data, it gets a very clear view of the defects; there is a strong correlation between the image of a weld and whether the weld is broken
(2) The CNN (93.64% accuracy) outperforms the RNN (82.72% accuracy), which used other features to predict the same outcome
Disadvantages:
(1) The network is very complex and takes a long time to train; 60 epochs required 3 hours
(2) The lack of batch normalization and dropout reduces the generalization ability of the network
Dataset, features, and outputs:
Input: image of the weld pool
Output: probability distribution over the weld being ok or not ok
Training data: 144 data points
Testing data: 110 data points
Refs.: [105, 106]
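The two regularizers whose absence is criticized here are simple operations. Below is a minimal sketch of the standard training-mode forms. Batch normalization standardizes each feature over the mini-batch and then applies a learnable scale γ and shift β. Inverted dropout zeroes activations at random and rescales the survivors so that their expected value is unchanged. At inference time, batch normalization would use running statistics and dropout would be disabled; neither is shown. The example values are hypothetical.

```python
import math
import random


def batch_norm(batch, gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5):
    """Training-mode batch normalization of one feature across a mini-batch."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def inverted_dropout(activations, p_drop: float, rng: random.Random):
    """Zero each activation with probability p_drop and rescale survivors by 1/(1 - p_drop)."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


print([round(v, 3) for v in batch_norm([1.0, 2.0, 3.0, 4.0])])
print(inverted_dropout([1.0] * 8, p_drop=0.5, rng=random.Random(0)))
```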

Algorithm: A decision tree was trained to predict the probability of the tool breaking from the raw variables of the welding process. The tree was then dissected to identify which parameters most affected the tool's chance of breaking, and how
Advantages: Although a single decision tree is often a weak model for classification or regression, it is very good at identifying and exploiting the features that most affect the outcome. This tendency is usually one of its major drawbacks, but here it is helpful
Disadvantages: The decision tree has no understanding of the physics behind friction stir welding; it relies on information gain and correlations, which may not imply causality. Hence, small changes in the training dataset may drastically change the structure of the tree and thus the most important parameter
Dataset, features, and outputs:
Input: welding and rotational speeds, tilt angle, axial pressure, shoulder and pin radius, plate thickness, and the thermal diffusivity of the work plate material
Output: the probability of the pin breaking
Refs.: [67]
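The split criterion mentioned above can be sketched directly. A tree chooses, at each node, the feature and threshold with the largest information gain (entropy reduction). This is what makes the top splits a ranking of influential parameters, and it is also why a few changed samples can flip which split wins. The feature values and breakage labels below are hypothetical, and whether [67] used entropy or another impurity measure is not stated.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(values, labels, threshold: float) -> float:
    """Entropy reduction from splitting on `value <= threshold`, as a decision tree does."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - weighted


# Hypothetical data: axial pressure (scaled) vs. tool breakage (1 = broke).
pressure = [1.0, 2.0, 3.0, 4.0]
broke = [0, 0, 1, 1]
print(information_gain(pressure, broke, 2.0))  # 1.0: a perfect split
print(round(information_gain(pressure, broke, 3.0), 3))
```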