Material Design & Processing Communications

Review Article

A Survey of Machine Learning in Friction Stir Welding, including Unresolved Issues and Future Research Directions

Table 5

ML models used in UTS prediction.


Algorithm	Advantages	Disadvantages	Dataset, features, and outputs	Refs.

A random forest algorithm was used to estimate the ultimate tensile strength (UTS) of the weld. It builds decision trees in which each node performs a test on the input data to carry out the required task (in this case, regression). The model had a test root mean squared error (RMSE) of 6.72, with UTS labels in MPa	The random forest algorithm can perform well on small datasets without preprocessing or data augmentation. It performed best of the three algorithms (random tree, artificial neural network, and M5P algorithm) trained on the same dataset	This small-data performance does not scale up: the accuracy of the algorithm does not improve much with more data. Additional data make the decision trees more complex and slower to run, but the gain in accuracy is smaller than a neural network would achieve	Input: 2 floating point numbers (rotational speed and feed rate). Output: 1 floating point number, the UTS of the weld (in MPa). Training data: 20 data points. Testing data: 6 data points. Validation data: no validation dataset used	[107–109]

A fully connected neural network (FCNN) with a single hidden fully connected layer. The input features are floating point numbers arranged in a vector, so convolution filters were not required. The model had a test root mean squared error (RMSE) of 20.59, with UTS labels in MPa	The number of neurons, and hence the number of weights, remains constant throughout training. Adding more data does not change it	The small dataset is not enough to train all the weights in one pass (epoch). Many epochs are required, and with so little data the model is very likely to overfit, which significantly reduces its accuracy on new data	Input: 2 floating point numbers (rotational speed and feed rate). Output: 1 floating point number, the UTS of the weld (in MPa). Training data: 20 data points. Testing data: 6 data points. Validation data: no validation dataset used	[66, 108, 109]

The M5P algorithm is based on the M5 tree-building algorithm, which closely resembles decision tree construction. It differs from an ordinary decision tree in that it fits linear regression models at the leaf nodes, which in theory combines the strengths of both approaches. The model had a test root mean squared error (RMSE) of 32.03, with UTS labels in MPa	(1) It combines the strengths of decision trees and linear regression models by fitting linear models (weights) at the leaves of the decision tree (2) The M5 model tree adds two extra steps, pruning and smoothing, to make the model more robust	The data provided are not enough for the M5P model to learn the underlying relation. The pruning and smoothing steps are less effective than they would be with more data. The leaf regression models also lack the data needed to fit their weights	Input: 2 floating point numbers (rotational speed and feed rate). Output: 1 floating point number, the UTS of the weld (in MPa). Training data: 20 data points. Testing data: 6 data points. Validation data: no validation dataset used	[108–110]

A linear regression algorithm was used, which is in essence a single neuron (perceptron) of the kind found in the fully connected (FC) layers of neural networks. It has a vector of weights and a bias: it computes the weighted sum of the inputs and adds the bias to return a single floating point value. It can perform regression and binary classification; here it was used for regression	(1) It is a very light model that requires very little time and computing resources to train. Moreover, since it has very few weights, it is very difficult to overfit (2) It suits the task well, since the experiment has only 45 data points, of which only 32 are used for training	(1) It has very few weights, which may not capture the relation fully (2) The model is linear and hence loses accuracy when predicting nonlinear relations between the input(s) and the output	Input: 3 floating point numbers (rotational speed, welding speed, and axial force). Output: UTS (in MPa). Training data: 32 data points. Testing data: 13 data points. Validation data: no validation dataset used	[65, 111]

An SVM-based regression algorithm that takes the rotational speed and the travel speed (welding speed) of the pin, together with the category of the material used. Materials are classified by tensile strength as low tensile strength (LTS) or high tensile strength (HTS). These features were used to predict the mean tensile strength of the weld	SVMs excel at predicting simple relations between a small number of features. Moreover, SVMs require fewer computational resources than a neural network to learn the relationship between the input and the output data	(1) The testing dataset was not large enough to assess the accuracy of the model reliably (2) The training data were insufficient, which caused the model to overfit	Input: 3 features (rotational speed, travel speed of the pin, and material class by tensile strength). Output: 1 floating point number, the mean tensile strength of the weld. Training data: 12 data points (6 from each class). Testing data: 9 data points (6 for LTS and 3 for HTS). Validation data: no validation dataset used	[54, 112]

A feedforward neural network combined with a backpropagation (BP) algorithm and multiobjective particle swarm optimization. Backpropagation was used to train a network with a two-neuron input layer (one neuron per input factor) and a one-neuron output layer	(1) Sensitivity analysis was performed to find the parameters that need to be measured accurately. The findings of such an analysis also provide useful information on the robustness of the model parameters, allowing for better decision-making	(1) The optimal activation function and the number of neurons were found by trial and error, which is time-consuming	Input: 2 features (TRS in rpm and WS in mm/s). Output: UTS and hardness. Training data: 20 data points. Testing data: 5 data points. Validation data: no validation dataset used	[58, 113]
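
Several rows of Table 5 report a test error as a root mean squared error (RMSE), and the linear regression row describes the model as a single neuron with a weight vector and a bias. The following minimal Python sketch illustrates both ideas. The function names (rmse, linear_neuron), the weights, and all numerical values are hypothetical and chosen only for illustration; none of them is taken from the studies cited in the table.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def linear_neuron(weights, bias, features):
    """Single-neuron (linear regression) prediction: weighted sum plus bias."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical measured and predicted UTS values (MPa) for four test welds.
measured_uts = [100.0, 110.0, 120.0, 130.0]
predicted_uts = [103.0, 106.0, 120.0, 130.0]
test_rmse = rmse(measured_uts, predicted_uts)

# Hypothetical weights for three inputs: rotational speed (rpm),
# welding speed, and axial force. These are not fitted values.
weights = [0.125, -0.5, 2.0]
bias = 50.0
weld = [1000.0, 40.0, 5.0]
predicted = linear_neuron(weights, bias, weld)

print(f"test RMSE: {test_rmse} MPa")
print(f"predicted UTS: {predicted} MPa")
```

Because the RMSE is expressed in the same unit as the label (MPa), the values in the table can be compared directly across models trained and tested on the same data split.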