#### Abstract

This paper deals with the prediction of surface roughness in the machining of polycarbonate (PC) by applying Bayesian optimization to machine learning models. The input variables of ultraprecision turning (feed rate, depth of cut, spindle speed, and vibration of the *X*-, *Y*-, and *Z*-axes) are the main factors affecting surface quality. In this research, six machine learning (ML) models, namely, artificial neural network (ANN), Cat Boost Regression (CAT), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), Decision Tree Regression (DTR), and Extreme Gradient Boosting Regression (XGB), were applied to predict the surface roughness (Ra). The predictive performance of the baseline models was quantitatively assessed through three metrics: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (*R*^{2}). The overall results indicate that the XGB and CAT models predict Ra with the greatest accuracy. To improve the XGB and CAT baseline models, Bayesian optimization (BO) is then used to determine their best hyperparameters, and the results indicate that XGB is the best model according to the evaluation metrics. The performance of the models improves significantly with BO. For example, the RMSE and MAE of XGB decrease from 0.0076 to 0.0047 and from 0.0063 to 0.0027, respectively, on the training dataset. On the testing dataset, the RMSE and MAE of XGB decrease from 0.4033 to 0.2512 and from 0.2845 to 0.2225, respectively. Moreover, the vibrations of the *X*-, *Y*-, and *Z*-axes and the feed rate are the most significant features in predicting the results, which is in good agreement with the literature. We find that, in a specified value domain, the vibration of the axes has a greater influence on the surface quality than do the cutting conditions.

#### 1. Introduction

The elements used in optical applications are mostly made of glass, crystalline materials, polymers, or plastic materials. Properties such as degree of transparency, refractive index, and spectrum are the deciding factors for the choice of optical materials [1]. In recent years, polymer optics have become increasingly widely used. They offer advantages over traditional glass materials in different applications, due to their light weight, low cost, high impact resistance, and flexibility in complex geometric design, despite their lower optical quality in comparison with glass materials [2]. Various methods have been developed to mass-manufacture polymers for optical applications. One is compression molding of the polymer followed by multistep grinding to achieve the desired surface finish, with the workpiece being beveled to prevent surface damage from sharp edges. Another is high-pressure injection into a mold of the desired geometry: after the mold is filled, the liquid polymer is allowed to cool and solidify, and the workpiece is then polished to the desired surface finish [3, 4]. In addition, diamond turning improves flexibility and cost-effectiveness in the production of small batches. This is often a significant gain compared to glass optics, facilitating a great deal of freedom in optical design. A variety of polymers are used for optical applications, such as polystyrene, PMMA, polyurethane, and PC. In terms of mechanical properties, PC offers a range of benefits over its rivals, especially for lighting applications [5].

Recently, there has been a dramatic increase in the breadth of applications for ultraprecision machining using single-crystal diamond tools [6], particularly in the manufacturing of optical or magnetic parts, such as magnetic discs, polygonal mirrors, copier drums, and different lens shapes. As a result, the technique has successfully been extended to machining various soft materials, creating mirror-like surfaces with submicrometer geometrical precision [7]. Previous research indicates that surface quality from diamond turning is influenced by many factors such as machine vibration, chatter, tool wear, tool geometry, built-up edge, chip striking, and lubrication. Indeed, the surface roughness of diamond-turned parts plays a crucial role [6, 7]. There are several difficulties when studying the surface roughness of diamond-turned parts. One of the main obstacles is that machining is a nonlinear process, governed by several parameters such as tool geometry, the interaction between parts, cutting parameters, machine vibration, and material properties.

Determination of surface quality is an important step in the manufacturing of any machine. Designers of modern production systems and machines always aim to better control surface quality, using better computational means with new algorithms. This has led researchers to seek a better understanding of quality prediction [8]. Various mathematical models have been proposed in the literature for predicting surface roughness. For instance, in a series of works, Krolczyk et al. [9, 10] constructed second-order polynomial prediction functions for the surface roughness and tool life in the dry machining of duplex stainless steel. In such a mathematical model, the influence of different parameters, namely, cutting speed, feed, and depth of cut, was revealed based on Student’s *t*-test (comparison of two mean values of populations with Gaussian distributions and homogeneous variances). Krolczyk et al. [10] showed that the feed rate was the main factor influencing the surface roughness. In another study, Kuntoğlu et al. [11] employed response surface methodology to seek the optimum cutting conditions and to analyze vibration and surface roughness under different cutting speeds, feed rates, and cutting edge angles in the turning of AISI 5140 steel. It is interesting to note that the model developed in [11] can predict both surface roughness and vibration during the turning of AISI 5140 within an accepted error range of 10%. They also found that the feed rate was the parameter that most increased the surface roughness, which is in accordance with Krolczyk et al. [10] for turning steel. For the machining of alloys, Gupta et al. [12] constructed an optimization procedure to estimate the machining responses of a nickel-based superalloy. The optimization was conducted using a combination of response surface methodology, particle swarm optimization, and teaching learning-based optimization techniques. Different parameters were considered, including cutting speed, feed rate, and cutting tool angle, whereas the machining responses were cutting force, potential of tool wear, surface roughness, and length of tool-chip contact.

Recent research has suggested that machine learning may accurately predict surface roughness in the turning process. Eser et al. [13] have estimated the surface roughness of AA6061 alloy in milling using artificial neural networks and response surface methodology. It should be noticed that in such a study, the impact of the cutting parameters on the prediction has been characterized by using variance analysis [13]. In another work, Elangovan et al. [8] built a Multiple Linear Regression (MLR) model to predict surface roughness on the basis of input parameters: feed rate, depth of cut, spindle speed, flank wear, and vibration signal. The artificial neural network (ANN) method has been applied to predict the roughness for different cutting parameters (cutting speed, depth of cut, and feed rate) [14–17]. Özgören et al. [18] have employed the ANN technique to predict the power and torque values obtained from a beta-type Stirling engine. The best ANN’s architectures have been determined such as 5-13-9-1 and 5-13-7-1, respectively, by using Levenberg–Marquardt learning algorithm [18]. In recent years, besides ANN modeling, some new ML methods have been introduced: Support Vector Regression (SVR), Gradient Boosting Regression (GBR), Linear Regression (LR), and Random Forest Regression (RFR) [19–21]. Pimenov et al. [22] have tested different machine learning models such as random forest, standard multilayer perceptrons, regression trees, and radial-based functions, for the prediction of surface roughness deviations from face milling machining processes. It is worth noticing that, in [22], the final machine learning prediction model has been developed in an automatic real-time manner for the machining processes. In addition, comparative research into ML models has been carried out to find the best in output prediction. 
The results for the SVR, polynomial regression, and ANN models indicated that ANN performs best in predicting lifetime but also worst in predicting cutting force and Ra [21]. This shows that not only does the regression of various machine learning models yield different results, but also a specific model has a certain effect on one type of output.

Therefore, this research sets out to investigate the application of SVR, Cat Boost Regressor (CAT), XG Boosting Regressor (XGB), Decision Tree Regressor (DTR), GBR, and ANN to surface roughness prediction. The paper is organized as follows. Section 2.1 introduces the database used in this study, while Section 2.2 outlines the research methodology. Section 3 gives a brief overview of all the machine learning models used in this work. Finally, Sections 4 and 5 provide the results and the discussion thereof.

#### 2. Materials and Methods

##### 2.1. Data Collection and Analysis

In this study, the results of surface roughness measurements for 35 experiment runs are taken from the available literature (Bolat [23], published in open access mode). In Bolat [23], the PC sheet was cut into workpieces with a diameter of 30 mm and a thickness of 10 mm. All workpieces were numbered as shown in Figure 1(a). Then, each workpiece was mounted inside a fixture made of a material that is not affected by temperature changes (shown in Figure 1(b)). During the machining process, a mixture of kerosene and air was used as cutting lubrication.

**Figure 1:** panels (a), (b), (c), and (d) (images not reproduced).

The monocrystalline diamond tool S95843 was mounted on a tool holder (Figure 1(c)). According to the tool numbering for monocrystalline tools, C0.5mLG, the cutting tool is a controlled waviness tool with a nose radius of 0.5 mm, a top rake angle of 0°, and a front clearance angle of 10°. To carry out the experiments, the Precitech Freeform 700 U four-axis diamond turning machine tool was used for 35 experiment runs with feed rate, spindle speed, and depth of cut being 1–12 *μ*m/rev, 1000–2250 rpm, and 3–50 *μ*m, respectively, as shown in Table 1.

The surface roughness prediction dataset consists of two subdatasets. These subdatasets were combined to make the prediction data. Table 1 shows all 35 measurement results, with seven attributes. These attributes contain information about six independent variables (feed rate, cut depth, spindle speed, and the respective vibrations of the *X*-, *Y*-, and *Z*-axes) and one dependent variable: surface roughness.

To measure the roughness of the finished workpiece, white light interferometry was used, with a Zygo NewView 5000 device having a vertical resolution of 0.1–0.2 nm. The surface roughness of the finished workpiece was measured at three different positions. The measured values of surface roughness shown in Table 1 are the root mean square of the three results obtained from the interferometer measurements.

##### 2.2. Methodology

The data used in this study were first examined using exploratory data analysis (EDA). The resulting dataset was transformed with the log(1 + *x*) function to ensure a consistent scale and distribution for all variables. The data were then divided into training and testing datasets, representing 78% and 22% of the data, or 27 and 8 trials, respectively. Six regression methods were trained: multilayer perceptron neural network (MLP-NN), SVR, CAT, XGB, DTR, and GBR. The testing dataset was used to validate the models. To assess the performance of the proposed models, three metrics (RMSE, MAE, and *R*^{2}) were employed.
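The preprocessing described above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the rows are synthetic stand-ins for the 35 runs of Table 1, and the helper names `log1p_transform` and `split_train_test`, the shuffling, and the seed are assumptions.

```python
import math
import random

def log1p_transform(rows):
    """Apply log(1 + x) to every value of every row."""
    return [[math.log1p(value) for value in row] for row in rows]

def split_train_test(rows, n_test, seed=0):
    """Shuffle a copy of the rows and hold out n_test of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

# Synthetic stand-in for 35 runs x 7 attributes (NOT the data of Table 1).
rng = random.Random(42)
runs = [[rng.uniform(0.0, 50.0) for _ in range(7)] for _ in range(35)]

transformed = log1p_transform(runs)
train, test = split_train_test(transformed, n_test=8)
print(len(train), len(test))  # 27 8
```

Holding out 8 of the 35 runs reproduces the 27/8 split quoted in the text; which runs end up in each subset depends on the assumed seed.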

Figure 2 presents the flowchart of the approach used in this paper: dataset extraction, feature selection, and the different combinations of the dataset used to predict surface roughness. The figure also shows how the model parameters are fine-tuned by using the Bayesian optimization algorithm to seek the best parameters for each model and then to determine the best model to predict Ra. Finally, the results and observations are discussed in Section 4.

#### 3. Machine Learning Algorithms

##### 3.1. Artificial Neural Network (ANN)

ANNs are complex computational models inspired by biological neural networks, which are capable of regression, classification, and pattern recognition. There are different ANN-type algorithms, such as backpropagation neural networks [24], probabilistic neural networks [25], convolutional neural networks [26], time-recurrent neural networks [27], and long short-term memory networks [28]. Because of its straightforward structure, the multilayer perceptron (MLP) model has been chosen in this study. An MLP consists of three main types of fully connected layers: the input layer, hidden layers, and output layer [29]. Thanks to its properties, MLP has been used to predict tool flank wear and surface roughness [30–32].

The main advantages of the ANN model are the capability to work with any type of input data (complete or incomplete) [33], the storage of information across the entire network instead of in a database, and the capability of parallel computing, which helps to reduce the computational time. On the other hand, some drawbacks of this approach need to be mentioned, such as hardware dependence: a lot of computer resources are required when the input data are large. In addition, the behavior and the training duration of the network are hard to control, which requires a lot of trials.

##### 3.2. Support Vector Machine (SVM)

Support Vector Machine (SVM) theory was developed by Cortes and Vapnik [34], and a version of SVM for regression was introduced in 1997 [35]. SVM has several branches, among which Support Vector Regression (SVR) is an important application. Many ML algorithms follow the principle of empirical error minimization, while SVR follows the principle of structural risk minimization across a restricted range of learning patterns, so it can obtain better generalization [36]. SVR is applicable because it uses linear optimization techniques to find optimal solutions to nonlinear prediction problems in a higher-dimensional feature space. Therefore, it has been widely used for forecasting in the fields of finance, agriculture, hydrology, the environment, etc., and especially in mechanical machining [21, 44, 45]. Consequently, SVR is an appropriate model to predict Ra in the ultraprecision machining of PC for applications in optics.

In terms of advantages, SVM works well with high-dimensional input spaces and is relatively memory efficient. In terms of drawbacks, unlike ANN, SVM is not suitable for large datasets, and it does not perform well with every type of data (for example, noisy datasets).

##### 3.3. Cat Boost Regression (CAT)

Most popular implementations of gradient boosting use decision trees as base predictors. It is convenient to use decision trees for numerical features, but, in practice, many datasets contain categorical features that are vital for prediction. CAT is a gradient boosting technique developed by Yandex. It is an improved implementation within the gradient boosting tree framework, relying on a symmetric decision tree algorithm with few parameters, support for categorical variables, and high accuracy [46]. CAT improves the accuracy of the algorithm and its generalizability [47]. It has been successfully applied in many fields such as weather forecasting, media popularity prediction, evapotranspiration, and biomass [48, 49]. It is for this reason that the model is applied here to predict the performance of ultraprecision machining.

The advantages of gradient boosting approaches are their predictive accuracy compared to other machine learning models. These types of approaches have a lot of flexibility, which means that they can be optimized using different loss functions or several hyperparameter tuning options that make the function fit flexible. They can work directly with the input data, which means that no data preprocessing is required. In terms of drawbacks, gradient boosting approaches are usually computationally expensive. The minimization of errors in these approaches can cause overfitting, and the influence of parameters is quite heavy on the behavior of the approach.

##### 3.4. Decision Tree Regression (DTR)

Decision trees (for classification and regression) are classic ML algorithms. As a group, their learning ability is not outstanding, but they are well known for their generalizability and feature filtering. When used for regression tasks, they are called regression trees [50]. As the number of iterations increases, the model continues to learn. The training is stopped when triggered by hyperparameters such as the number of selected features, the maximum depth of the tree, and the minimum sample size of branches.

Compared to other algorithms that require data preprocessing, the decision tree algorithm requires less effort in this process. A decision tree does not require normalization or scaling of data. The decision tree algorithm can also work with incomplete data. In terms of disadvantages, a small change in data requires a lot of changes in the structure of the decision tree, which can eventually cause instability. The computational time of the decision tree is often expensive, especially when training the model.

##### 3.5. Gradient Boosting Regression (GBR)

Gradient Boosted Trees (GBTs) are an ensemble of DTs whose result combines the predictions of the base models. DT-based ensembles like GBT have often been used in regression and classification problems, as they perform well [51]. GBT is an iterative algorithm, in which each tree is fitted to the error left by the previous ones. The final result of the GBT is the weighted sum of the predictions of all trees.
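The iterative idea can be illustrated with a minimal sketch, assuming squared-error loss, a single input feature, and one-split trees (stumps) as base models. The data are hypothetical and the function names are illustrative; library implementations such as GBR differ in many details.

```python
def fit_stump(x, residuals):
    """Return the one-split regression tree that best fits the residuals.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def fit_boosted_trees(x, y, n_trees=50, learning_rate=0.1):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(y) / len(y)
    trees, current = [], [base] * len(y)
    for _ in range(n_trees):
        residuals = [yi - ci for yi, ci in zip(y, current)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        current = [ci + learning_rate * tree(xi) for xi, ci in zip(x, current)]
    # The ensemble prediction is a sum of scaled tree outputs, not their mean.
    return lambda xi: base + learning_rate * sum(tree(xi) for tree in trees)

def mean_squared_error(model, x, y):
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)

# Hypothetical values for illustration only (not measurements from Table 1).
feed = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 12.0]
ra = [2.1, 2.3, 2.2, 2.9, 3.8, 4.1, 5.6, 6.0]

one_tree = fit_boosted_trees(feed, ra, n_trees=1)
many_trees = fit_boosted_trees(feed, ra, n_trees=50)
print(mean_squared_error(one_tree, feed, ra), mean_squared_error(many_trees, feed, ra))
```

On the training data the error shrinks as trees are added, which is also why boosting needs regularization or early stopping to avoid overfitting small datasets.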

##### 3.6. Extreme Gradient Boosting Regression (XGB)

XGB, developed in 2015, is one of the most popular ML algorithms. It often provides better solutions than other ML algorithms across data types, because of its rapidity, efficiency, and scalability [52, 53]. It has been the focus of research in various fields [54–56]. In particular, in mechanical machining [52, 57, 58], XGB is a good choice to predict tool wear and surface roughness.

XGB is used for supervised learning problems, where the training data $x_i$ (with multiple features) are used to predict a target variable $y_i$. The analytical formulas of the regression on which XGB is based are given below [52, 59].

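Let $D = \{(x_i, y_i)\}$ be a dataset composed of *n* samples and *m* features ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$). A number of *K* additive functions are used in tree ensemble models to evaluate the prediction $\hat{y}_i$. This function can be expressed as [52]

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F, \tag{1}$$

where $K$ represents the number of trees, $f_k$ is the tree added at the $k$-th iteration of the training process, and each $f_k$ encodes the decision rules of the tree and the weights of its leaf scores. The regression space *F* can be expressed as follows [52]:

$$F = \{ f(x) = w_{q(x)} \}, \quad q: \mathbb{R}^m \to \{1, \dots, T\}, \quad w \in \mathbb{R}^T, \tag{2}$$

with *q*, *T*, and $w$ representing the tree structure, the number of leaf nodes, and the corresponding weights, respectively. The errors of the model can be minimized using a regularized objective function, as follows [52]:

$$L = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k), \tag{3}$$

with $l$ being a differentiable convex loss function, which for regression can be the mean squared error, while $\Omega$ is the regularization term that penalizes the complexity of the model to avoid overfitting, defined as [52]

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2, \tag{4}$$

where $\gamma$ denotes the complexity cost of each leaf and $\lambda$ is a constant coefficient. Since XGB is an additive algorithm, the prediction of the *i*^{th} instance at the *k*^{th} iteration is $\hat{y}_i^{(k)} = \hat{y}_i^{(k-1)} + f_k(x_i)$, so the objective at that iteration can be expressed as follows [52]:

$$L^{(k)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(k-1)} + f_k(x_i)\right) + \Omega(f_k). \tag{5}$$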

The main objective of XGB is to determine an additive function that minimizes the objective function using the gradient descent optimization algorithm.
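As a small worked illustration of the regularized objective, the sketch below evaluates a squared-error loss plus the complexity penalty of each tree, with the penalty taken in the standard XGBoost form (a per-leaf cost plus an L2 penalty on leaf weights). The leaf weights, predictions, and coefficient values are hypothetical, and the function names are not part of any library.

```python
def complexity_penalty(leaf_weights, gamma, lam):
    """Penalty of one tree: gamma * (number of leaves) + 0.5 * lam * sum(w^2)."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)

def regularized_objective(y_true, y_pred, trees, gamma, lam):
    """Squared-error loss over all samples plus the penalty of every tree."""
    loss = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return loss + sum(complexity_penalty(w, gamma, lam) for w in trees)

# Hypothetical ensemble: each tree is represented only by its leaf weights.
trees = [[0.5, -0.5], [0.1, 0.2, -0.3]]
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]

total = regularized_objective(y_true, y_pred, trees, gamma=1.0, lam=2.0)
print(round(total, 4))  # 1.25 (loss) + 2.5 + 3.14 (penalties) = 6.89
```

Larger values of the two coefficients make additional leaves and large leaf weights more expensive, which is how the objective trades fit against model complexity.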

##### 3.7. Bayesian Algorithm Optimization

In XGB and CAT, the main goal of hyperparameter optimization (i.e., tuning) is to minimize the objective function defined in equation (5). There are two popular hyperparameter optimization methods: random search (RS) and Bayesian optimization (BO). In random search, the hyperparameters are randomly chosen from the predefined search domain, and each trial is independent of the previous results [60, 61]. The main advantage of RS is that it can be applied to high-dimensional problems. Bayesian optimization is a probabilistic approach, using probability theory to optimize the hyperparameters [62]. In this work, we have chosen Bayesian optimization because of its performance, which has been demonstrated in previous studies in the literature [63, 64]. For simplicity’s sake, in the rest of the paper, we use XGB_opt and CAT_opt to denote the XGB and CAT models built by using BO, respectively.

Before training the boosting models, we need to initialize their hyperparameters. However, selecting hyperparameters based on experience, or on exhaustive and random search such as GridSearchCV and RandomizedSearchCV, requires a large number of attempts. To optimize the model’s performance and reduce computational time, the hyperparameter screening process must itself be optimized; this is the main purpose of using BO. The Bayesian optimization framework utilizes historical data to narrow the search domain and continually update the posterior information [65]. In particular, suppose that we have a functional relation between the hyperparameters and the loss function:

$$x^{*} = \arg\min_{x \in X} f(x), \tag{6}$$

where $X$ is the set of all hyperparameters, $x$ is a hyperparameter combination, $x^{*}$ is the optimal parameter combination obtained from the final optimization, and $f$ is the objective function.

In the proposed model, the hyperparameters are maximum tree depth (*D*), number of nodes in each tree (*γ*), number of trees (*K*), learning rate (*η*), regularization parameter (*λ*), and number of samples (*N*), as introduced in equations (1)–(4). The loss function is defined by the RMSE as [65]

$$f(x_j) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i(x_j)\right)^2}, \tag{7}$$

where $x_j$ is the *j*^{th} hyperparameter combination, $y_i$ is the true value, and $\hat{y}_i(x_j)$ is the model output obtained using the *j*^{th} hyperparameter combination.

The next step of BO is to construct the dataset $D$, where $x_i$ is the *i*^{th} set of hyperparameters and $f(x_i)$ is the corresponding error of the model output [65]:

$$D = \{(x_1, f(x_1)), (x_2, f(x_2)), \dots, (x_n, f(x_n))\}. \tag{8}$$

The posterior probability of the loss given the observations is denoted as $p(f \mid D)$. A Gaussian distribution is applied to the surrogate model *M*, whose mean and covariance are denoted as *µ* and *K*, respectively. The specific functional expression of *M* is obtained by fitting the dataset *D* [65]:

$$M = p(f \mid D) = \mathcal{N}(\mu, K). \tag{9}$$

Based on *M*, the next observation $x_{n+1}$ is calculated using the acquisition function $\alpha$ [65]:

$$x_{n+1} = \arg\max_{x \in X} \alpha(x \mid M). \tag{10}$$

In Bayesian decision theory, the acquisition function works by calculating the expected loss over the hyperparameter space. In each iteration, the dataset *D* is updated with the hyperparameters and losses from the previous one. The main characteristic of BO is that it builds a model from historical data to optimize the hyperparameters of each model [66].
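The loop described in this section can be sketched for a single hyperparameter. The sketch makes several assumptions that the text does not specify: a Gaussian-process surrogate with a squared-exponential kernel, expected improvement as the acquisition function, a fixed grid of candidates, and a hypothetical `toy_loss` standing in for the validation RMSE of a trained model.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar hyperparameter values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def surrogate(xs, ys, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of the GP surrogate at x_new."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    weights = solve(K, [y - y_mean for y in ys])
    mean = y_mean + sum(k * w for k, w in zip(k_new, weights))
    var = 1.0 - sum(k * v for k, v in zip(k_new, solve(K, k_new)))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected amount by which a candidate improves on the best loss so far."""
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf

def bayesian_optimization(loss, candidates, x_init, n_iter=10):
    xs = list(x_init)
    ys = [loss(x) for x in xs]  # dataset of (hyperparameter, loss) pairs
    for _ in range(n_iter):
        best = min(ys)
        remaining = [c for c in candidates if c not in xs]
        x_next = max(remaining,
                     key=lambda c: expected_improvement(*surrogate(xs, ys, c), best))
        xs.append(x_next)        # evaluate the proposed point and
        ys.append(loss(x_next))  # add the new observation to the dataset
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

def toy_loss(x):
    """Hypothetical stand-in for the RMSE obtained with hyperparameter x."""
    return (x - 0.3) ** 2 + 0.05

candidates = [i / 100 for i in range(101)]
best_x, best_loss = bayesian_optimization(toy_loss, candidates, x_init=[0.0, 0.5, 1.0])
print(best_x, best_loss)
```

In practice each call to the loss would train a model and score it on held-out data, so keeping the number of evaluations small is the point of fitting a surrogate at all.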

##### 3.8. Performance Assessment Criteria

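Three statistical metrics have been used in this study to assess the performance of the proposed AI model in predicting surface roughness: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (*R*^{2}). These metrics are defined as follows [67–70]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}, \tag{11}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert, \tag{12}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}, \tag{13}$$

where $y_i$ and $\hat{y}_i$ are the measured and predicted values, respectively, $\bar{y}$ is the mean of the measured values, and *N* is the total number of predicted data points. Higher values of *R*^{2} indicate better performance of the model, whereas better performance is reflected by lower values of RMSE and MAE.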

When comparing metric values, we prioritize RMSE, because it is more suitable than MAE when the model error follows a normal distribution. Moreover, RMSE has a distinct advantage over MAE in that it avoids the absolute value, which is undesirable in many mathematical calculations [71]. Therefore, when comparing the prediction accuracy of various regression models, RMSE is a better choice, as it is simple to calculate and differentiable. A higher value of *R*^{2} is also desirable.
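For reference, the three metrics can be computed as in the following sketch, which assumes the usual definitions (with *R*^{2} taken as one minus the ratio of the residual to the total sum of squares). The measured and predicted values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted Ra values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(rmse(measured, predicted), mae(measured, predicted), r2(measured, predicted))
```

Because RMSE squares the errors before averaging, the two larger errors in this example weigh more heavily in RMSE than in MAE.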

#### 4. Results and Discussion

##### 4.1. Prediction Accuracy of Various Baseline Models

This section reports the performance of the MLP-NN, SVR, CAT, XGB, DTR, and GBT baseline regression models in predicting Ra for diamond ultraturning. Table 2 shows the results of the various models on the training and testing datasets, sorted in ascending order of RMSE for the testing dataset. It can be seen that the predicted Ra varies considerably from one model to another. On the testing dataset, XGB exhibits the best performance in terms of all metrics: it yields the highest value of *R*^{2} and the smallest values of RMSE and MAE. On the training dataset, XGB performs similarly well, with the highest *R*^{2}, while the GBT model yields the smallest RMSE and MAE. In contrast, DTR yields the largest RMSE and the lowest *R*^{2} on the testing dataset.

In this work, a comparison between different activation functions for the MLP-NN model was performed with initial parameters, layers (32, 16), optimizer (“Adam”), and activation (“relu”, “identity”, “sigmoid”, “tanh”, and “logistic”). Table 3 shows the results of using different activation functions acting on the assessment criteria using training and testing datasets, respectively. It should be noticed that parametric study was only conducted on activation functions in the present study.

As shown in Table 3, the “identity” activation function exhibits the best performance, in terms of all error metrics—this model yields the highest value of *R*^{2} and the smallest values of RMSE and MAE.
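For readers unfamiliar with the activation functions compared here, a minimal sketch of their scalar definitions follows. It assumes the standard mathematical forms; the labels “sigmoid” and “logistic” usually denote the same function, so only one entry is given for both.

```python
import math

# Scalar activation functions applied to a neuron's weighted input z.
ACTIVATIONS = {
    "identity": lambda z: z,                            # no nonlinearity
    "relu": lambda z: max(0.0, z),                      # rectified linear unit
    "tanh": math.tanh,                                  # output in (-1, 1)
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),   # sigmoid, output in (0, 1)
}

for name, activation in ACTIVATIONS.items():
    print(name, [round(activation(z), 3) for z in (-2.0, 0.0, 2.0)])
```

With the “identity” activation, a multilayer perceptron reduces to a linear model of its inputs, which may suit a dataset of only 35 runs.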

Figure 3 illustrates the performance of the six models on metrics on the training dataset. Each color of the bars corresponds to one metric. The ranking of the models is marked above each bar. As can be seen from Figure 3, there is a slight difference between the rankings in relation to the training and testing datasets. With the training dataset, the accuracy rankings with respect to the metrics of RMSE are GBT, XGB, DTR, CAT, MLP-NN, and SVR, while with metrics of MAE, the order is GBT, XGB, DTR, CAT, MLP-NN, and SVR. The rankings with metrics of *R*^{2} are XGB, GBT, DTR, CAT, MLP-NN, and SVR.

However, Figure 4 shows that there is a small difference in performance ranking when using the testing dataset. Furthermore, the performance ranking varies as a function of the metric. The accuracy ranking by RMSE, in increasing order, is XGB, CAT, GBT, MLP-NN, SVR, and DTR, and that by MAE, in increasing order, is XGB, GBT, CAT, DTR, MLP-NN, and SVR, while the ranking by *R*^{2} is XGB, CAT, MLP-NN, GBT, SVR, and DTR. Considering the performance on both the training and testing datasets, it can be concluded that XGB yielded the best performance whereas DTR exhibited the worst performance out of these models.

The line and scatter plots of the measured Ra and the predicted values found by the six ML baseline models with the training and testing dataset are presented in Figures 5 and 6 for each trial, respectively. The different colors of the scattered points in the figures represent the values predicted by different baseline models. In the training and testing datasets, the values predicted by the XGB model are the closest to the original values of Ra. The value predicted by the CAT model does not closely correspond to the original measured values of Ra.

However, as discussed above, RMSE is the preferred criterion for selecting better models, in relation to the testing dataset. Therefore, to continue to improve the prediction accuracy for Ra, we choose 2 models: XGB and CAT, for the next procedure, in keeping with the approach illustrated in Figure 2.

##### 4.2. Description of the Optimization Problems

The XGB and CAT methods require hyperparameter tuning to prevent overfitting and improve model performance. Table 4 presents the hyperparameters of the XGB and CAT models. As mentioned above, XGB and CAT are regression models with a large number of hyperparameters. The values of these hyperparameters are crucial and so must be carefully selected. However, to date, only heuristic methods have been put forward for selecting them.

As discussed in Section 3.7, overfitting in XGB and CAT is mitigated by optimizing their hyperparameters using the Bayesian optimization approach. To save computational time, in this paper, we focus only on the hyperparameters that have a significant effect on the model performance, as found in previous studies.

The two optimization problems (one for XGB and one for CAT) of this study are described in Table 4. For XGB, the decision variables are learning_rate, max_depth, subsample, colsample_bytree, reg_alpha, max_leaves, gamma, and min_child_weight. For CAT, the decision variables are learning_rate, depth, bagging_temperature, and num_leaves. The lower and upper bounds of these variables are also given in Table 4. In this work, the objective of the optimization problems is to minimize the RMSE between predicted and experimental data points.
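One way to represent such an optimization problem in code is as a mapping from each decision variable to its bounds, from which candidate combinations are drawn. In the sketch below the variable names follow the text, but every bound is a hypothetical placeholder and not a value from Table 4; the helper `sample_point` and the choice of integer-valued variables are likewise assumptions.

```python
import random

# Decision variables for XGB as named in the text.
# All bounds are HYPOTHETICAL placeholders, not the values of Table 4.
XGB_SPACE = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (2, 10),
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
    "reg_alpha": (0.0, 1.0),
    "max_leaves": (2, 64),
    "gamma": (0.0, 5.0),
    "min_child_weight": (1.0, 10.0),
}
INTEGER_VARIABLES = {"max_depth", "max_leaves"}  # assumed to be integer-valued

def sample_point(space, rng):
    """Draw one hyperparameter combination uniformly within the bounds."""
    point = {}
    for name, (low, high) in space.items():
        if name in INTEGER_VARIABLES:
            point[name] = rng.randint(low, high)
        else:
            point[name] = rng.uniform(low, high)
    return point

print(sample_point(XGB_SPACE, random.Random(0)))
```

A random-search tuner would evaluate many such draws independently, whereas Bayesian optimization uses the losses of earlier draws to decide where to sample next.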

##### 4.3. Hyperparameter Tuning with Bayesian Optimization

The settings of the Bayesian optimization search domain are derived from historical data and also from initial tests [72–74]. For other hyperparameters such as “n_estimators” (number of boosted trees), “min_child_samples” (minimum number of data points needed in a leaf), and “subsample_for_bin” (number of samples for constructing bins), the default settings of the Python libraries have been applied [54]. Table 5 presents the evolution of the optimization procedure for XGB, whereas Table 6 indicates the best value found for each hyperparameter of XGB and CAT.

As discussed above, the objective of this study is to find the best model for predicting surface roughness. Therefore, XGB and CAT have been selected for performance comparison, with the Bayesian optimization algorithm being applied to each model. Table 7 compares the RMSE, MAE, and *R*^{2} obtained by XGB and CAT with the best hyperparameters found by BO and with the default hyperparameters of the baseline models. The comparison shows the effect of the hyperparameters on the metrics of the models. XGB_opt achieves the lowest RMSE and MAE, as shown in Figures 7 and 8. Moreover, *R*^{2} increases from 0.9999 to 1 on the training dataset and from 0.7227 to 0.8924 on the testing dataset.
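The size of the improvement can be expressed as a relative reduction. The sketch below applies this simple arithmetic to the XGB values reported in the abstract; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(before, after):
    """Fraction by which an error metric decreased after optimization."""
    return (before - after) / before

# (baseline, optimized) error values of XGB as reported in the abstract.
reported = {
    ("training", "RMSE"): (0.0076, 0.0047),
    ("training", "MAE"): (0.0063, 0.0027),
    ("testing", "RMSE"): (0.4033, 0.2512),
    ("testing", "MAE"): (0.2845, 0.2225),
}

for (dataset, metric), (before, after) in reported.items():
    print(f"{dataset} {metric}: {100 * relative_reduction(before, after):.1f}% lower")
```

On the testing dataset this corresponds to a reduction of roughly 38% in RMSE and 22% in MAE.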

The line and scatter plots of the measured Ra and the values predicted by the optimized models XGB_opt and CAT_opt on the training and testing datasets are presented in Figures 9 and 10 for each trial, respectively. The different colors of the scattered points in the figures represent the values predicted by the different optimized models. In both the training and testing datasets, the predictions of the XGB_opt model are the closest to the original measured values of Ra. The values predicted by the CAT_opt model correspond less closely to the measured values of Ra in both datasets.

However, as discussed above, RMSE is the preferred criterion for selecting the better models in relation to the training dataset. Therefore, to further improve the prediction accuracy for Ra, we choose two models, XGB and CAT, for the next procedure, in keeping with the approach illustrated in Figure 2.

#### 5. Discussion

With modern technology, the equipment used to produce PC for optical applications by single-point diamond turning maintains a stable operating state, so the production process is also stable. Therefore, for given strip specifications, most turning processes yield relatively stable datasets without wide variability. The key requirement of a prediction model is its prediction performance. The main goal of this paper is to determine the optimal predictive model for surface roughness by comparing the performances of different models. In order to preserve the principal characteristics of the considered models, we have not coupled them with other optimization algorithms. We then performed hyperparameter tuning with BO to find the best model.

Fitting the XGB model allows us to determine which features contribute most heavily to the result: the importance of each feature can be computed and plotted. The feature importance graphs for each dataset are plotted in Figure 11. Figure 11 clearly shows that *Z*-axis vibration is the most important feature contributing to the prediction, followed by the *Y*-axis vibration and *X*-axis vibration. This result is in accordance with Kara and Bayraktar [75], whose experiments showed that the surface roughness increased with increasing vibration of the machining tool (i.e., increase of the cutting speed). For instance, in Kara and Bayraktar [75], the cutting speed accounted for 42.14% of the influence on the surface roughness according to the analysis of variance. However, it is worth noting that the converse is not confirmed: a low value of vibration does not imply good surface quality (Bolat [23]). As also shown in Figure 11, the feed rate is a crucial variable that strongly affects the surface roughness of the material. Kara and Bayraktar [75] and Krolczyk et al. [10] found similar results: a low feed rate gives a small surface roughness and vice versa. It should also be noted that the feed rate influences both the surface roughness and the energy consumption; this aspect has not been considered in the present study. Another limitation of the present study is the lack of confirmation tests, which should be carried out in order to assess the performance of the prediction model. Finally, analysis of variance should be performed in further studies to reveal how the input variables influence each other and the surface roughness. Such information is crucial for controlling the turning process.
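A feature ranking of this kind can also be obtained in a model-agnostic way. The sketch below uses permutation importance, which is a different technique from the built-in importance of XGB used for Figure 11: each input column is shuffled in turn and the resulting increase in RMSE is recorded. The predictor `toy_model`, its weights, and the synthetic data are hypothetical and chosen only for illustration; they are not the fitted model or the measurements of this study.

```python
import math
import random

FEATURES = ["feed_rate", "depth_of_cut", "spindle_speed", "vib_x", "vib_y", "vib_z"]

def toy_model(row):
    """Hypothetical stand-in for a fitted regressor; ignores depth_of_cut by design."""
    return (0.2 * row["feed_rate"] + 0.1 * row["spindle_speed"]
            + 0.5 * row["vib_x"] + 1.0 * row["vib_y"] + 3.0 * row["vib_z"])

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(model, rows, y, features, seed=0):
    """Increase in RMSE when one feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = rmse(y, [model(r) for r in rows])
    scores = {}
    for name in features:
        column = [r[name] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        shuffled = [dict(r, **{name: v}) for r, v in zip(rows, column)]
        scores[name] = rmse(y, [model(r) for r in shuffled]) - baseline
    return scores

# Synthetic inputs in [0, 1] and a noisy target (illustration only).
data_rng = random.Random(42)
rows = [{name: data_rng.random() for name in FEATURES} for _ in range(200)]
y = [toy_model(r) + data_rng.gauss(0.0, 0.05) for r in rows]

scores = permutation_importance(toy_model, rows, y, FEATURES)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>14s}  {score:.4f}")
```

A feature the model never uses, such as `depth_of_cut` in this toy predictor, receives an importance of exactly zero, while features with larger weights rank higher.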

#### 6. Conclusions

In this work, six machine learning models (ANN, SVR, CAT, DTR, GBT, and XGB) have been applied to predict surface roughness in the SPDT process. We have compared the performances of the six models using experimental data. Three quality assessment metrics (RMSE, MAE, and *R*^{2}) have been used to evaluate the performance of each model. Subsequently, hyperparameter optimization using BO has been applied. The results are summarized as follows:

(1) The comparison has shown that there is little difference in terms of ranking when using testing RMSE. For example, the values of RMSE for XGB, CAT, GBT, MLP-NN, SVR, and DTR are 0.4033, 0.4704, 0.4812, 0.4829, 0.5032, and 0.5712, respectively. However, overall, XGB and CAT perform better, followed by GBR, SVR, and DTR.

(2) The results of hyperparameter tuning with BO show some inconsistency across the evaluation metrics. XGB_opt performs best in terms of RMSE, MAE, and *R*^{2} on both the training and testing datasets. BO has significantly improved the performance of the two considered models, especially CAT. Indeed, the value of *R*^{2} for CAT has increased from 0.9565 to 0.9893 for the training dataset and from 0.6229 to 0.7355 for the testing dataset. The value of MAE for CAT has decreased from 0.1224 to 0.0579 for the training dataset and from 0.3503 to 0.2870 for the testing dataset. The value of RMSE has decreased from 0.1594 to 0.0792 for the training dataset and from 0.4704 to 0.3940 for the testing dataset.

(3) Considering the prediction accuracy of the six models comprehensively, XGB appeared to be the best model for predicting surface roughness. Indeed, this model has shown strong performance in terms of the different quantitative estimators. For example, XGB exhibited the highest values of *R*^{2} for both the training and testing datasets. In terms of other estimators such as RMSE and MAE, XGB also produced one of the best values. In addition, XGB also performs best when combined with the Bayesian optimization algorithm.

(4) Machine vibration is clearly the most important feature contributing to the prediction of the results, especially vibration on the *Z*-axis, followed by the *Y*-axis vibration and *X*-axis vibration. The least important feature is the depth of cut.

(5) Overall, out of the six initially proposed models, we have succeeded in filtering out the two best models based on different quantitative estimators (RMSE, MAE, and *R*^{2}) for predicting the surface roughness in turning polycarbonate for optical applications. In addition, we have improved the performance of these two best models using the Bayesian optimization algorithm.

Further studies should be conducted to extend the results of the present paper. From an experimental point of view, confirmation tests should be carried out in order to test the performance of the proposed method in real conditions. In addition, working parameters such as feed rate, depth of cut, spindle speed, and cutting speed should be related to the surface roughness by an explicit equation, allowing easy and direct application by engineers and researchers in practice. The proposed machine learning model should therefore be employed to derive such an equation in further research. From a computational point of view, uncertainty quantification should also be performed in order to propagate the variability of the experimental database. Moreover, the number of data points for training and testing the models was not large, which may have produced some unexpected effects in the results. For this reason, more data points should be collected in further studies, or additional experimental tests should be conducted by the research group.

#### Nomenclature

PC: | Polycarbonate |

ML: | Machine learning |

BO: | Bayesian optimization |

ANN: | Artificial neural network |

MLP: | Multilayer perceptron |

CAT: | Cat Boost Regression |

CAT_opt: | CAT built with BO |

SVR: | Support Vector Regression |

GBT: | Gradient boosting trees |

GBR: | Gradient Boosting Regression |

DTR: | Decision Tree Regression |

XGB: | Extreme Gradient Boosting Regression |

XGB_opt: | XGB built with BO |

EDA: | Exploratory data analysis |

RMSE: | Root mean square error |

MAE: | Mean absolute error |

R^{2}: | Coefficient of determination |

: | Vibration of X-axis (mgRMS) |

: | Vibration of Y-axis (mgRMS) |

: | Vibration of Z-axis (mgRMS). |

#### Data Availability

The Excel data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.