#### Abstract

High-speed machining is a technique that continues to attract strong interest in the manufacture of metal parts because of the excellent results it provides, both in surface finish and in economic benefits. In the industry, the tendency is to incorporate data management and analysis techniques to generate information that helps improve the surface roughness results in machining. A good alternative to improve the surface quality results in the manufacture of metal parts is using predictive models of the surface roughness. In this document, we present work done with experimental data obtained from two high-speed machining (HSM) machines with different types of tools and cutting conditions, conducted under an experimental design focused on three factors commonly studied to generate surface roughness models: tool characteristics, cutting conditions, and characteristics of the machined material. Steel and aluminum alloys were used in the experimentation. The results are contrasted with prior experiences that use the same experimental design but with different soft computing techniques, and they are also contrasted with the results of similar previous works. Our results show accuracies ranging from 61.54% to 88.51% on the datasets, which are competitive when compared with the other approaches. We also find that the axial cut-depth is the most influential feature for the slots datasets and that the hardness and diameter of the cutting tool are the most influential features for the geometries datasets.

#### 1. Introduction

High-speed machining is a technique that maintains popularity in the manufacture of metal parts, with high levels of usability in the metalworking industry to manufacture metal parts with high levels of quality in surface finish [1–3]. The surface quality achieved with material removal techniques is very important in the manufacture of pieces from alloys of metals and plastic and in general materials that can be subjected to roughing [4]. The surface quality in a machined part depends to a large extent on the combination of factors such as the properties of the machined material, characteristics of the machining center, and the tool used [5, 6].

In the industrial field of mechanical cutting, the tendency is to incorporate data management and analysis techniques to generate information that helps improve machining surface quality [7, 8]. Soft computing techniques help identify factors that affect the machining process and their most convenient values to obtain the best surface quality [6], while minimizing the associated costs, such as calibration, experimentation, and quality measurement costs, among others. Surface quality is often related to surface roughness, although it can be calculated from several parameters. In practice, the surface roughness can be evaluated using the roughness average (Ra) parameter [2, 4, 9, 10]; this is the most common industrial parameter for this task according to [11] and previous works such as [10]. The Ra parameter can be measured relatively simply using profilometers [4, 9]. The surface roughness has a great influence on other factors of interest in manufacturing such as friction, electrical and thermal resistance, and the appearance of the machined part, among other factors that can affect its functionality [12]. Also, surface roughness can help to establish the relationship between the lubrication and other elements such as friction or wear between parts [13]. Friction causes wear between parts (particularly metal alloys), while ineffective lubrication increases friction; both of these affect surface roughness [14]. Works such as [15–17] present predictive models of surface roughness with particular conditions of lubricant use.

Surface roughness average is the most used parameter to estimate the surface quality according to [18]; it is also important because it gives an indication of the surface finish [19, 20] and provides information on the behavior of a surface in contact with other surfaces [9, 21]. In the metal fabrication industry, there are research works that evaluate the machinability of steel pieces according to parameters that influence the process, such as the work described in [22]. There are also several works that present surface roughness predictive models based on soft computing techniques, using parameters related to the cutting process as predictors, such as the characteristics of the machine or the characteristics of the machined material; examples of this type of work can be found in [4, 6, 9, 16, 23]. A brief description of these works is given in the following paragraphs.

Extensive research has been conducted in recent years into the applicability of artificial intelligence and soft computing techniques for surface roughness prediction in mechanical cutting, following previous works such as [15]. The most common configuration (until 2011) was the multilayer perceptron (MLP) with a single hidden layer, though Bayesian networks [4, 9, 21, 23], genetic algorithms [19, 24, 25], and support vector machines [10, 15, 21] have also been widely used for surface roughness prediction.

For example, in [10] the decision trees, Naive Bayes, Nearest Neighbors Classifiers, Multilayer Perceptrons, and Logistic Regression were used to generate methods for the early detection of multitooth tool breakages in the milling process. Decision trees have been used in conjunction with sound signal analysis in [26] in order to generate a predictive model of surface roughness. Also, fuzzy logic is used to generate predictive models of surface roughness in [27, 28]. Another example is the work described in [16], where a predictive model of surface roughness in deep drilling operations under high-speed conditions of steel molds was created using Bayesian networks. Furthermore, Bayesian networks were used in [4, 9] and a combination of Bayesian networks and Tree-augmented Network algorithms was used in [23] to generate a preprocessing model of surface roughness on high-speed machining (HSM) over metal probes.

The profitability of metal cutting operations depends to a great extent on factors such as precision in mechanical cutting, excellent surface finish, and minimum wear of the tool [14, 29]. All the relationships between surface roughness and factors such as lubrication, friction, and the mechanical force applied to the cutting process are closely linked and this is currently a point of great interest in the companies and the profitability of mechanical cutting [30, 31].

There are multiple works in the literature that use soft computing to estimate surface roughness or to study the factors that can affect surface roughness. However, in our review of the literature for this work, there were few works that used decision trees and, to the best of our knowledge, no previous works that used the specific technique of Gradient Boosted Trees to generate a predictive model of surface roughness.

This document presents a surface roughness prediction model that considers a subset of elements involved in the milling process that is related to the machined piece, the tool, and characteristics of the machine tool. To generate the predictive model of surface roughness, metal alloy pieces commonly used in the industry have been employed. The data used to generate the model are the result of experiments on two different machines and, in each one, various combinations of variables that typically influence the surface quality results in the milling process have been used.

The rest of the document is structured as follows. Section 2 details various concepts and related works in the field of predicting surface quality and machining; it also details predictive models and, in particular, focuses on Gradient Boosted Trees. Section 3 details the materials and methods used in this work, presenting the experimental description, the implementation and evaluation of the models, and their parametrization. Section 4 presents the results obtained with the implemented models alongside a comparison with other methods. Section 5 contains a discussion of the obtained results and the comparison with other state-of-the-art methods. Section 6 closes with the conclusions of this paper and potential future lines of work.

#### 2. Concepts and Related Works

##### 2.1. Surface Quality and Machining

The surface quality or surface roughness is intimately connected with the appearance of the machined or manufactured surface, which is normally expressed with a Ra value [3, 11]. In many cases, surface quality must comply with established standard values to be functional in certain industries (such as in the molds used to manufacture parts with plastic injection [16]). There are several parameters to establish the surface roughness value. The surface roughness average is the most widely used in the industry thanks to the ease with which it can be assessed (generally after processing) and the closeness with which it represents the surface texture of the mechanized part [32, 33].

The Ra value is usually calculated by integrating the arithmetic mean of the absolute values of ordinates *f*(*x*) within a sampling length (*L*). Each partial value of surface roughness can be measured using profilometers along the sampling length *L*. The standard ISO 4288 (1996) is the internationally used way to measure surface roughness in machining processes, and this standard is further complemented by the standard ISO 1302 (1992), which establishes 12 levels of surface roughness; these levels range from 0.006 to 50 nanometers (nm) [34].
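The definition above corresponds to the usual expression $Ra = \frac{1}{L}\int_{0}^{L} |f(x)|\,dx$, where $f(x)$ is the profile ordinate measured from the mean line. As a minimal sketch, assuming equally spaced profile samples, the integral can be approximated by a discrete mean of absolute deviations. The function name `roughness_average` and the sample values are illustrative assumptions, not part of the original experimental software.

```python
def roughness_average(heights):
    """Discrete estimate of Ra from equally spaced profile heights.

    The mean line is taken as the arithmetic mean of the samples, and Ra
    is the mean absolute deviation of the samples from that line.
    """
    if not heights:
        raise ValueError("at least one profile sample is required")
    mean_line = sum(heights) / len(heights)
    return sum(abs(z - mean_line) for z in heights) / len(heights)


# Hypothetical profile samples (arbitrary length units), not measured data.
profile = [0.5, -0.5, 1.5, -1.5]
print(roughness_average(profile))
```

A perfectly flat profile gives Ra = 0, and larger deviations from the mean line give a larger Ra.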

Although low-speed machining can provide better surface roughness, it reduces the efficiency of the industry, implies more machining time, and consequently increases production costs [5]. High-speed machining is one of the processes with the greatest economic impact in the metal fabrication industry [2], thanks to the high level of surface finish that can be obtained with it [4], which influences the functional behavior of the resulting piece when subjected to friction, abrupt temperature changes, and other factors that may affect its functioning [6].

Currently, there is plenty of face milling research aimed at predicting surface roughness, a parameter that will decrease with respect to changes in other parameters like increased tool wear or flank wear, cutting force, depth-cut, or feed per tooth. Several works have been presented in the last 5 years on the topic of predicting surface roughness based on artificial intelligence techniques. For example, in [22], probes of AISI 4340 steel were used, and cutting speed, feed, and cut-depth were considered as the governing parameters for surface roughness prediction, while workpiece surface temperature, machining forces, and tool flank wear were taken as measures to check the performance of the estimation. In [30], 100CrMoV5 steel molds were machined with minimum quantity lubrication (MQL), and tool lifetime, flank face, and cutting speed were used as predictive variables.

Fuzzy logic and regression analysis were used in [28] to build an empirical model of surface roughness, and this model was used to calculate the influence of the predicted surface roughness on the product profile of surface finish in a face milling process. Random forests, regression trees, and radial-basis functions were used in [35] to estimate the adequate threshold values required to avoid rapid tool wear and to predict the finish quality on a flat surface using face milling. In the same way, the relation between face milling (over steel-45 workpieces) and parameters like tooth flank, cutting speed, and feed was studied in [36]. Genetic algorithms and the Grey Wolf Optimizer algorithm were used in [24] to generate a prediction of surface roughness in ball-end milling over X210CR12 steel. Some of the works described above use postprocessing techniques to estimate the surface roughness (e.g., [22, 30, 37]), while others, such as [24, 28, 35], apply the techniques to estimate surface roughness in-process.

Furthermore, in [38] the authors present a comparative work of milling models on aluminum alloys A7075 with a diamond-like coating (DLC) tool and tool without DLC; the models were made with the aim of predicting the least wear of the tool. The results were experimentally better when using the tool model with DLC. Also, in [37] a model is presented that relates to the influence of factors such as tool wear and flank wear with the surface quality.

In spite of the high number of works, such as those described above, the in-process measurement of surface roughness is difficult and often unfeasible. Therefore, as it has been said in the introduction, having techniques to predict surface quality using postprocess data is a way of working that is gaining interest in the parts manufacturing industry. In this sense, predictive models have much to contribute.

##### 2.2. Predictive Models

In artificial intelligence, predictive tasks are one of the central topics of machine learning that involves inducing a model from training data (known as training instances), then this model can be applied to future instances to predict a target variable of interest [39]. There are several prediction algorithms, such as Logistic Regression, neural networks, decision trees, Bayesian networks, among others. These algorithms typically induce a model to learn to predict the best value of a target variable from training instances of a domain, with the aim of finding an optimal value of the target variable in all future instances of the said domain [40].

Many scientific articles in the literature work with predictive algorithms and particular training instances in a domain selected according to research interests. One advantage of working with training instances in a domain is that the predictive algorithm will find a more precise model that can generate good values of the target variable in the presence of new data.

There are several recent works in which artificial intelligence techniques are used to estimate the surface roughness in machining. In [26] a proposal for a semisupervised approach to the development of roughness prediction models, based on machine learning, is described; also, in [31] a modular road roughness classification system that operates with the vehicle’s transfer functions (according to ISO 8608) is described. In addition, [16] applies Bayesian networks in the context of turn-milling using steel to generate a prediction of surface roughness, and in [4] neural networks are used for the same purpose. In [6] genetic algorithms were used for the prediction of surface roughness in micromachining of Copper C360. These artificial intelligence techniques have been shown to be able to generate predictive models with excellent accuracy, even with a lack of available data in a domain, through the identification of patterns in the data [18].

##### 2.3. Decision Trees

Learning based on decision trees is a type of predictive model that uses a decision tree to go from observations of an object (represented as the branches of a tree) to a certain conclusion about a target value of the object (represented by the tree leaves). It is used in statistics, data mining, and machine learning and has had several applications, both at the academic level and in the industry [41].

Decision trees are one of the easiest modeling techniques to interpret thanks to their graphic representation; they are didactic and easy to understand. They base their predictions on inductive learning; that is, they consider the values that the different attributes or variables take, creating in this way a series of rules to determine what value the dependent variable will take in certain situations. It should be noted that the results delivered by the decision tree depend to a large extent on the volume of data contained in each category: the more data available for a given combination of features, the better the accuracy of the model with respect to reality.

Finally, in the industry of crafting pieces from machine-cutting, it is highly important to also obtain information about the factors that affect surface roughness and to also have the ability to influence such factors. In particular, decision trees are a useful technique to explain the aforementioned information, according to the works of [21, 42]. In this work, we have selected a boosting model based on decision trees, explained in the next section, to develop our models. This approach is appropriate since we are interested in both predicting surface roughness and explaining the relationships and influences among the different factors affecting the cutting process.

##### 2.4. Gradient Boosted Trees

The term boosting refers to a family of algorithms that convert weak learners into strong learners, understanding that a weak learner is only slightly better than a random choice, while a strong learner has an almost perfect performance [5]. The Gradient Boosting model is a machine learning technique that can be used in both regression and classification problems. This approach produces a predictive model in the form of an ensemble from weak predictive models, which normally correspond to decision trees as in the particular case used in this work (Gradient Boosted Trees (GBT)). This method builds the model in stages like other boosting methods but also generalizes these by optimizing an arbitrary differentiable loss function [43].

Gradient boosting techniques use an ensemble of weak models (trees, in the case of this work) that together form a stronger model. The ensemble is constructed in a stage-wise process by gradient descent in function space. The final model is a function that takes as input a vector of attributes $x$ to get a score $F(x)$ so that $F(x) = \sum_{i=1}^{M} \gamma_i h_i(x)$, where $M$ is the number of trees, each $h_i$ is a function that models a single tree, and $\gamma_i$ is the weight associated with the $i$-th tree, so that these two terms are learned during the training phase [43].
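The stage-wise construction described above can be sketched in a few lines. The following is a minimal illustration for regression with squared loss, not the implementation used in this work. It assumes single-split trees (stumps) as weak learners and gives every tree the same constant weight (the learning rate), which is a common simplification of a learned per-tree weight. The names `fit_stump` and `fit_gbt` and the toy data are hypothetical.

```python
def fit_stump(X, residuals):
    """Fit a single-split regression tree (stump) by minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a valid split needs samples on both sides
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, threshold, lmean, rmean)
    if best is None:  # no valid split: fall back to a constant tree
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, threshold, lmean, rmean = best
    return lambda row: lmean if row[j] <= threshold else rmean


def fit_gbt(X, y, n_trees=20, learning_rate=0.1):
    """Stage-wise boosting: each new tree is fitted to the current residuals,
    which are the negative gradient of the squared loss."""
    base = sum(y) / len(y)  # initial constant model
    trees = []
    predictions = [base] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(row)
                       for p, row in zip(predictions, X)]

    def model(row):
        # Final score: constant term plus the weighted sum of all trees.
        return base + sum(learning_rate * tree(row) for tree in trees)

    return model


# Hypothetical toy data: one feature, two target levels.
X = [[0.2], [0.4], [0.6], [1.0]]
y = [1.0, 1.0, 3.0, 3.0]
model = fit_gbt(X, y, n_trees=50)
print(model([0.2]), model([1.0]))
```

With more trees the training residuals shrink geometrically in this toy case, which illustrates how many weak learners combine into a stronger one.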

On the other hand, one of the reasons for using GBT in contrast to other predictive models is that ensemble methods, in general, are usually the classifiers/regressors that deliver the best *out-of-the-box* results. In addition, considering that the basic model used to study the problem has been a classic decision tree, it has been considered natural to use its extension by means of an ensemble method.

One of the advantages of ensemble models is that their tuning process is reliable and easy when compared to other approaches such as artificial neural networks [3]. On the other side, a potential disadvantage of ensemble methods is the lack of interpretability [42]; however, this problem is also shared by other models such as neural networks, and in the case of GBT this can be ameliorated because of the ability to easily extract the importance of each feature, as done in this paper.

#### 3. Materials and Methods

High-speed machining is a type of milling operation. Milling is defined in previous works [44] as the process in which the cutting tool rotates at a fixed speed while linearly moving in a perpendicular direction to the axis of the tool.

In the field of machining processes, there are many inputs and parameters that influence the resulting surface roughness; some of them can be controlled while others cannot be directly controlled [2]. The datasets from these processes are usually of limited size, mainly due to the high costs of machining tests [44]. This section describes the experimentation parameters and the sizes of the datasets used. For replicability, we first present an overview of our methodology:

(1) Experiment Preparation. Two different kinds of experiments (Exp-1 and Exp-2) were implemented. Accordingly, two types of probes were used. For Exp-1, we prepared test pieces of steel F-114 (F-114 is the Spanish notation for this steel (http://www.splav-kharkov.com/en/e_mat_start.php?name_id=87) with 175-220 Brinell of hardness) with dimensions of 190x100x20 mm. For Exp-2, we prepared test pieces of aluminum known as Planoxal (hardness 85-150 Brinell), with dimensions of 180x110x20 mm. Since aluminum is more malleable than steel, in Exp-2 it was important to know the hardness of the material, and so the hardness (HB) was measured before the process in each test specimen of Exp-2.

(2) Experimentation. This step consists of performing the milling for each one of the prepared experiments. The different setups of Exp-1 and Exp-2 were made using all the combinations of values for the predictor variables, which are described in Table 1. In all cases, the surface roughness was measured with the Karl Zeiss Surfcom 130 stylus profilometer.

(3) Preprocessing. In this step, the Exp-1 and Exp-2 datasets are prepared. In detail, this step consists of computing the average values of surface roughness (Ra) and associating this result with the experimental conditions (i.e., the values of the predictor variables).

(4) Model Generation. In this step, the models for Exp-1 and Exp-2 are generated through machine learning; the details are in Sections 3.2 to 3.4.

(5) Analysis of Results. This analysis is done in two dimensions: (a) checking the quality of the models and (b) validating their practical utility. For (a), classical machine learning metrics (e.g., recall, precision, and accuracy) were used to measure the performance obtained in the classification of surface roughness. For (b), the validation is done by comparing the results of the GBT models with other classifiers that have been used in the literature.

##### 3.1. Experimental Description

As mentioned before, two different experiments were designed to obtain the experimental data: one for machining slots on steel F-114 test specimens (Exp-1) and another for machining geometries using aluminum alloy (Exp-2). Each of these two experiments was performed initially in a machining center and later in a second machining center with different characteristics to the first, to validate the experimental design and the predictive model of surface roughness obtained in the first machining center. Also, in both experiments the machining centers were equipped with high-pressure coolant fluid: Houghton HOCUT b-750 cutting oil at 5%, a type of coolant fluid frequently used in the industry for its high quality and anticorrosive properties [45].

Predictive models have been used by authors on this domain in order to analyze the behavior of machines in particular cutting conditions. For example, Pimenov et al. [35] declare that a predictive model of surface roughness has the ability to understand new machining working conditions. In our case, predictive models help us analyze the value of surface roughness in new experimental cases. A descriptive model is not adequate in this case because it does not directly provide the ability to predict new cases.

Experimentation in HSM is generally expensive; thus, the experimentation described in this paper uses the experimental model described in [9, 23] and the same machining centers described in [9]. In detail, the first process of obtaining experimental data was performed with a Kondia HS1000 machining center (hereinafter referred to as M1) with a Siemens 840D CNC, 3 axes, a spindle speed of 24000 rpm, and a maximum power of 17.5 kW, and the second with a Versa model machine (variant 675004) (hereinafter referred to as M2) with a Heidenhain TNCi530 CNC, 5 axes, a spindle speed of 15000 rpm, and a maximum power of 50.0 kW.

In each of the experiments (Exp-1 and Exp-2) conditions were set related to the tool, machine (cutting conditions), and material to be machined or test specimen. In that order, said characteristics are described below for each experiment. Table 1 shows the information of the variables used on Exp-1 and Exp-2; the symbol, units, and admitted values are described for each variable.

Exp-1 was designed to generate linear cuts over steel probes and measure surface roughness on a linear surface; the tool characteristics were its diameter and the number of flutes. Karnasch tools (models 30.6455 and 30.6465) of different diameters were used. Considering standard machining of slots in the manufacture of molds, four different tool diameters were used (6, 8, 10, and 12 mm), and for each diameter, variations of 2 and 6 flutes per tool (flutes) were used. Two slot lengths were made for each diameter and flutes variation (2x4x2), for a total of 16 experimental-tool combinations.

The characteristics of the cutting conditions are the axial depth of cut (ap), advance speed/feed rate (F), and spindle rotation speed (n). The machining of slots was done with F = 1500 mm/min and n = 5000 rpm initially, and then increments of 25, 50, and 75% of the initial F and n were applied. For each of the experimental-tool combinations (described above), variations of ap (0.2, 0.4, 0.6, and 1.0), F (1500, 1875, 2250, and 2625), and n (5000, 6250, 7500, and 8750) were applied (see Table 1). Thus, the complete experimental set included 124 different conditions. All the tests were repeated on each cutting machine to increase the amount of data, yielding a dataset of 270 samples for M1 and 150 samples for M2; for each experiment the incomplete records were removed, leaving training datasets of 251 and 123 records, respectively.
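The levels of F and n listed above follow directly from the initial values and the 25, 50, and 75% increments. A short sketch of that arithmetic is given below; the helper name `levels` is an illustrative assumption.

```python
def levels(initial, increments=(0.0, 0.25, 0.50, 0.75)):
    """Return the initial value followed by its 25, 50, and 75% increments."""
    return [initial * (1 + inc) for inc in increments]


feed_rates = levels(1500)      # F in mm/min
spindle_speeds = levels(5000)  # n in rpm

print(feed_rates)      # [1500.0, 1875.0, 2250.0, 2625.0]
print(spindle_speeds)  # [5000.0, 6250.0, 7500.0, 8750.0]
```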

Examples of combinations of the variables described above (Table 1) are shown in Table 2; these values correspond to the parameters used in the tests and to the calculation of the surface roughness (postprocess) according to the way of calculating the Ra value described in the introduction. In summary, for Exp-1, five predictive variables and the surface roughness class were used. Both the class and the rest of the continuous variables were discretized as shown in Table 2, based on the experimental design and the discretization described in previous works such as [4, 9, 16, 23].

To calculate the surface roughness average (Ra), partial values of surface roughness in a slot were measured using contact profilometers, as described in [4, 23]. For each slot, four partial measurements of surface roughness were made; then a final value of surface roughness per slot was obtained by averaging. The continuous values of surface roughness were grouped in ranges; these ranges were created based on the following criterion: *next_value* = (*previous_value* + 60% × *previous_value*) + *dX*, where *dX* is a variation that considers the error margin from contact measurement (by manual profilometer).
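The two steps in this paragraph, averaging the partial measurements of a slot and generating successive range values with the 60% rule, can be sketched as follows. The function names, the starting value, and the value of `dx` are hypothetical. The source does not state the value of *dX*, so it is left as a caller-supplied parameter.

```python
def slot_roughness(partial_measurements):
    """Final roughness of a slot: the mean of its partial measurements."""
    return sum(partial_measurements) / len(partial_measurements)


def next_range_value(previous_value, dx):
    """next_value = (previous_value + 60% of previous_value) + dX."""
    return (previous_value + 0.60 * previous_value) + dx


def range_values(first_value, count, dx):
    """Generate `count` successive range values starting at `first_value`."""
    values = [first_value]
    while len(values) < count:
        values.append(next_range_value(values[-1], dx))
    return values


# Hypothetical inputs: four partial measurements and an assumed dX of 0.
print(slot_roughness([1.0, 2.0, 3.0, 2.0]))
print(range_values(1.0, 3, dx=0.0))
```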

In accordance with ISO: 4288 (1996) and ISO: 1302 (2002), there are several discrete values for Ra that are related to continuous values (all values in nanometers) [4, 23]: 0.10 = Mirror, 0.20 = Polished, 0.40 = Ground, 0.80 = Smooth, 1.60 = Fine, 3.20 = Semifine, 6.30 = Medium, 12.50 = Semirough, and 25.0 and 50.0 = Clear. Thus, in accordance with previous works such as [4, 9, 16, 23], the labels Smooth, Fine, Semifine, and Medium were created for surface roughness average as shown in Table 2. Ra values greater than 10.1 were discarded because these values are of little use for the metal parts manufacturing industry or the aerospace industry. Examples from the datasets and the discrete values of surface roughness average are shown in Table 2. Table 3 shows the ranges and discrete values for variables and for the surface roughness average in the case of the slots.

Exp-2 was designed to generate nonlinear cuts (geometries) to measure surface roughness on a radial surface. The tool characteristic was tool diameter (diam); Karnasch tools of different diameters (10, 12, 16, and 20 mm) were used, all with the same number of flutes (4 flutes). The characteristics of the cutting conditions for Exp-2 are ae, ap, f, and n (see Table 1). For Exp-2, circumferences with a radius of 3.5 cm and a height of 1.5 cm were initially considered, and two different radial cut-depths were made (1 mm and 0.50 mm).

The machining of geometries was performed with n = 8000 rpm initially; the remaining values of n were generated by applying increments of 20% for each subsequent value (see Table 1). Also, the initial value of F was 500, and two further representative values were selected based on the analysis of previous experimental results; together these form the set of F values considered (see Table 1).

As it has been said above, the surface roughness labels were assigned in accordance with the average roughness established in ISO: 4288 and ISO: 1302 (see Table 3). For each machined circumference, six partial measurements of surface roughness were made, then, a final value of surface roughness per geometry was obtained by averaging. Thus, the complete experimental set for geometries included 164 different conditions. As discussed in the previous description of Exp-1, all the tests were repeated on each cutting-machine to increase the amount of data. After excluding incomplete data, a total of 431 records were obtained for M-1 and a total of 242 records were obtained for M-2. Examples from the datasets and the discrete values of surface roughness are shown in Table 4.

A summary of both the class labels and the continuous variables discretization (described above) is shown in Table 5. As a summary, Table 6 shows values related to the class and the dataset in each case.

##### 3.2. Implementation of the Models

To obtain the models described above, RapidMiner Studio 7.6® (free version) was used. This tool allows for obtaining various machine learning models from the data [46]; in particular, we can use it to build GBT models. The workflow used with this tool is shown in Figure 1. This process is applied to each one of the four distinct datasets evaluated.

The datasets have been acquired following the experimental design described in [9, 23] and those datasets have been prepared as described in Section 3.1 of this document. Therefore, the workflow in this work starts in the Read Data task (see Figure 1).

##### 3.3. Evaluation of the Methods

In this research, a series of methods is evaluated by measuring the performance obtained in the classification of the variables of interest. The evaluation metrics used in this work are detailed below; in particular, the performance indicators used to compare methods are *recall*, *precision*, and *accuracy*. These metrics are standard in the machine learning literature, and the presentation made in the following paragraphs is based on the work of Sammut [47].

Based on the data obtained in the experiments, we obtain a confusion matrix. This matrix facilitates the analysis needed to determine where classification errors occur. The confusion matrix is a table that shows the distribution of errors in the different categories. The performance indicators necessary to evaluate the classifier, specifically *accuracy*, *recall*, and *precision*, are calculated using this matrix. An example of the structure of this matrix is shown in Table 7 for the case of two classes (positive and negative in this example).

*a* is the number of correct predictions for positive instances (true positives), *b* is the number of incorrect predictions for negative instances (false positives), *c* is the number of incorrect predictions for positive instances (false negatives), and *d* is the number of correct predictions for negative instances (true negatives). The simplest indicator to evaluate the performance of a classifier is *accuracy* ($Acc$), corresponding to the ratio of correctly classified examples to the total number of examples in the dataset [48]. This indicator can be calculated from the confusion matrix according to (1) (it is assumed that the dataset is not empty).

$$Acc = \frac{a + d}{a + b + c + d} \tag{1}$$

The other indicators, that is, *precision* and *recall*, are understood as measures of relevance. *Precision* ($P$) is the proportion of true positives (*a*) among the elements predicted as positive; a high *precision* value implies that few negative instances are wrongly predicted as positive. *Recall* ($R$) is the proportion of true positives among all elements that are actually positive, that is, the fraction of relevant instances that have been correctly classified. *Precision* and *recall* are calculated according to (2) and (3) (assuming $a + b > 0$ and $a + c > 0$, respectively).

$$P = \frac{a}{a + b} \tag{2}$$

$$R = \frac{a}{a + c} \tag{3}$$

Recall and precision are particularly well-suited for unbalanced datasets [10]. Thus, given the unbalanced nature of our data, these metrics are appropriate for the evaluation of the dataset. Meanwhile, accuracy is not necessarily useful in the case of unbalanced data (e.g., it is easy to create an “accurate” classifier by always choosing the most frequent class, but this would hardly be useful); however, it gives a general view of the performance of our models when taken in the context of the other metrics.
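As an illustration of these three indicators, the following sketch computes them from the four cells of a two-class confusion matrix. It assumes that *a* counts true positives, *b* false positives, *c* false negatives, and *d* true negatives. The function name `binary_metrics` and the counts in the example are hypothetical.

```python
def binary_metrics(a, b, c, d):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    Assumed meaning of the cells: a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    Precision or recall is None when its denominator is zero.
    """
    total = a + b + c + d
    if total == 0:
        raise ValueError("the dataset must not be empty")
    accuracy = (a + d) / total
    precision = a / (a + b) if (a + b) > 0 else None
    recall = a / (a + c) if (a + c) > 0 else None
    return accuracy, precision, recall


# Hypothetical unbalanced case: 10 positives, 90 negatives, and a classifier
# that always predicts the majority (negative) class.
print(binary_metrics(a=0, b=0, c=10, d=90))  # (0.9, None, 0.0)
```

The example shows why accuracy alone can mislead on unbalanced data: the majority-class predictor reaches 90% accuracy while its recall for the positive class is zero.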

##### 3.4. Parameterization of the Models

The original data were split into 80% for training and hyperparameter tuning and 20% for testing the final model, in order to obtain an unbiased estimate of its performance. The optimal model was found using a grid search with K-fold cross-validation (K = 3). For GBT the optimized parameters were (i) number of trees, (ii) maximal depth, and (iii) minimum rows.

Other parameters of this model were kept fixed for simplicity and were therefore not optimized: the number of bins (20), the learning rate (0.1), and the sample rate (1.0). The optimal hyperparameters, alongside their accuracy on cross-validation and on the test set, are shown in Table 8.
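The tuning procedure described in this section can be sketched as follows. This is a library-independent outline, not the implementation used in the experiments: the parameter names (`n_trees`, `max_depth`, `min_rows`), the candidate values in `HYPOTHETICAL_GRID`, and the `score_fn` callback (which stands in for training a GBT model on the training folds and returning its validation accuracy) are all assumptions introduced for illustration.

```python
import itertools
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and split them into train and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=3):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def grid_search(train_indices, grid, score_fn, k=3):
    """Return the parameters with the best mean cross-validated score."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in k_fold(train_indices, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical candidate values; the ranges actually searched are not shown here.
HYPOTHETICAL_GRID = {"n_trees": [20, 50, 100], "max_depth": [2, 3, 5], "min_rows": [1, 5]}
# Fixed settings as stated in the text (the key names are assumptions).
FIXED = {"n_bins": 20, "learning_rate": 0.1, "sample_rate": 1.0}

train_idx, test_idx = train_test_split_indices(100)  # 80 train, 20 test
```

The held-out 20% is used only once, after `grid_search` has selected the hyperparameters on the training portion, which is what keeps the final test estimate unbiased.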

#### 4. Results

##### 4.1. Gradient Boosted Trees for Slots

This section presents the results obtained with GBT for the slots dataset of both machines, including the confusion matrices and the models obtained in each case. Table 9 shows the results obtained with GBT for the slots dataset of M-1. The final accuracy is 78.00% on the test set. The results seem to be mostly balanced; however, the Semifine class presents the lowest precision and recall of all the classes.

Table 10 shows the importance of each of the variables with respect to the slots dataset M1. As can be seen here, the most important variable corresponds to axial cut-depth (ap), followed by the rotation speed (n). On the other hand, the diameter of the tool (diam), the feed rate (F), and the number of teeth (flutes) are considered less relevant for a prediction according to the analysis carried out in this dataset, since they take importance values of around 20% or less.

Table 11 shows the results obtained with GBT for the slots dataset of M-2. In particular, it should be noted that the final accuracy is 61.54%, the lowest of all the experiments performed. The failure of the model seems to occur with the Semifine class, which, judging by the confusion matrix, seems to be hard to distinguish from the Fine class.

Table 12 shows the importance of each of the variables with respect to the M2 slots dataset. As can be seen here, the most important variable is the axial cut-depth (ap), followed by the diameter of the tool (diam). All the other variables have an importance value lower than 20% and are therefore considered less important for the predictive capacity of this model.

##### 4.2. Gradient Boosted Trees for Geometries

The results obtained for the geometries dataset of both machines with the models given by GBT are shown below; both the confusion matrices and the models obtained in each case are presented.

Table 13 shows the results obtained with GBT for the geometries dataset of M-1. In particular, it should be noted that the final accuracy is 88.51%. The best classification was obtained for the label “Fine”, with 100% precision on 97.22% of the total cases for this class. Note that the metrics again show good performance for a four-class problem (compared to a random choice classifier). The results, in this case, seem to be mostly balanced, with no class bringing down the performance in a major way.

Table 14 shows the importance of each of the variables with respect to the M1 geometries dataset. As can be seen here, the most important variable is the hardness of the material (HB), followed by the diameter of the tool (diam). The rest of the variables seem to be less relevant according to the analysis carried out on this dataset, with importance values below 20% in this experiment.

Table 15 shows the results obtained with GBT for the geometries dataset of M-2. In particular, it should be noted that the final accuracy is 85.71%. As in M1, the best classification was obtained for the label “Fine”, with 93.33% precision on 100% of the total cases for this class. In general, the result for M2 is similar to that for M1. Again, no class seems to bring the classification results down in any major way: the precision of the Fine class is lower than that of the other classes, and the recall of the Smooth class is comparably lower than that of the other classes.

Table 16 shows the importance of each of the variables with respect to the M2 geometries dataset. As can be seen here, the most important variable corresponds to the diameter of the tool (diam) and then rotation speed (n). All the other variables have a lower than 20% importance, with “feed rate” and “ap” being particularly low. Note that “ap” is the variable of least importance in both machines, while the diameter seems to be important in both cases.

Figure 2 summarizes the importance of the variables according to the GBT models used for all experiments and machines. In particular, Figure 2 shows that in Exp-1 the axial depth of cut (ap) is the most important variable, whereas in Exp-2 the hardness of the material (HB) and the diameter of the tool (diam) are the most important variables, although arguably the rotation speed (n) could be considered important too, depending on the machine. These results highlight the importance of the particular characteristics of each of the machining centers: despite using the same working model in both machining centers (same forces, “ap”, etc.), the results are very different in each case.

##### 4.3. Comparison with Other Methods

We compare the GBT models with other classifiers that have been used in the literature, such as SVM with RBF kernels or other powerful classifiers like Random Forests. In order to have a fair comparison, we find the optimal parametrization for each one of the compared algorithms using the same strategy as before.

For SVM we used a 1-vs-1 scheme for multiclass classification and we tried two kinds of kernels (RBF and Linear) with the parameters C in the range (0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0, 10, 100, 1000, 10000, 100000) and gamma in the range (0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0, 10, 100, 1000, 10000, 100000). For Random Forests we considered as hyperparameters the Number of Trees (range: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) and Maximal Depth (range: 1, 2, 3, 4, 5, 6, 7, 8), which are analogous to the parameters of GBT.
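To give a sense of the size of these search spaces, the following sketch enumerates the grids exactly as listed above. The variable names are illustrative, and the remark on model fits assumes that the same 3-fold cross-validation as for GBT is applied to every configuration.

```python
import itertools

# C and gamma share the same eleven powers of ten, from 0.00001 to 100000.
POWERS_OF_TEN = [10.0 ** exponent for exponent in range(-5, 6)]

# SVM: every (kernel, C, gamma) combination.
svm_grid = list(itertools.product(["rbf", "linear"], POWERS_OF_TEN, POWERS_OF_TEN))

# Random Forests: every (number of trees, maximal depth) combination.
rf_grid = list(itertools.product(range(10, 101, 10), range(1, 9)))

print(len(svm_grid))  # 242 configurations
print(len(rf_grid))   # 80 configurations
```

Under the 3-fold assumption this amounts to 726 SVM fits and 240 Random Forest fits per dataset. In the usual formulation the linear kernel does not depend on gamma, so only 11 of the 121 linear configurations are actually distinct.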

The optimal hyperparameters are then used to train 10 classifiers using a hold-out strategy with randomly sampled data from the original dataset. The idea is now to find an estimate of the accuracy of each classifier on the test set and average the results of the 10 classifiers for the different samples. All methods were evaluated using accuracy. We perform a 1-factor (i.e., choice of method) Analysis of Variance (ANOVA) for each dataset and report the corresponding p-values.
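The one-factor ANOVA used here can be sketched in a few lines. The function below computes only the F statistic and its degrees of freedom; obtaining the p-value additionally requires the F distribution, for which a statistics library would normally be used (for example, `scipy.stats.f_oneway` returns both the statistic and the p-value). The toy numbers are hypothetical and are not accuracies from this study.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-factor ANOVA.

    `groups` holds one list of observations per level of the factor,
    e.g. one list of test accuracies per classification method.
    """
    k = len(groups)
    n = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n
    means = [sum(group) / len(group) for group in groups]
    # Variation explained by the factor (between groups).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation (within groups).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: three groups of three observations each.
f_stat, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

With three methods and ten runs per method, as in the comparison described above, the degrees of freedom would be 2 and 27.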

The results are shown in Table 17 alongside the optimal parametrization in each case. For SVM the optimal hyperparameters are given as the tuple (Kernel, C, gamma), for GBT as (Number of Trees, Maximal Depth, Minimum Rows), and for Random Forests as (Number of Trees, Maximal Depth).

The ANOVA reveals that there is no statistically significant difference for the slots datasets (M1 and M2) or for Geometries M2, while there is a potentially significant difference in the performance of the methods on Geometries M1. Further inspection reveals that there is a difference between RBF SVM and the other two methods, but there is no significant difference between GBT and Random Forests. This suggests that the results from GBT are competitive with other state-of-the-art classifiers.

#### 5. Discussion

Previous works generate surface roughness predictive models using several soft computing techniques; however, there are no standard models to predict surface roughness, the models being usually generated under specific conditions of machining, coolant, machine tool, and tool. In the literature, it is possible to find works in which artificial neural networks or Bayesian networks are applied to generate predictive models of surface roughness; also classic decision trees have been used as a technique for pattern identification in the behavior of variables that influence surface roughness in the industry [41], but not many works were found where techniques based on decision trees are used to predict surface roughness (such as, for example, Random Forest or Gradient Boosted Trees).

A recently published work [35] presents a comparative study of surface roughness prediction quality using random forest (RF), multilayer perceptron (MLP), regression trees (RT), and radial basis functions (RBF); RF provides the best results in terms of *accuracy*, followed by RT and MLP. In our work, we use GBT, obtaining better accuracy than similar works performed with the same experimental design, reported, for example, in [4, 23].

In this work, we use real training data and we have also obtained a graphical representation of knowledge using classic decision trees to complement the results obtained by GBT; in this way the joint result provides greater graphic expressivity regarding conditional influences and the values of the predictor variables on the class labels than, for example, Bayesian networks. This is important since it can be easily used to create a domain representation model and can also be interpreted to generate rules that contain dynamic knowledge of the machining process, which facilitates the construction of knowledge and inference bases for an eventual expert system of surface quality prediction in real time under concrete tool conditions, material to be machined, and machine tool.

All the pieces of knowledge derived from this research, together with others obtained from previous related works, can be used, for example, to generate inference rules that help to establish the impact of the values of predictor variables on the surface roughness class. Figure 3 shows the syntax diagram for such if-statements. For example, if_statement_1 shows that the axial depth of cut (ap) has the highest influence on workpiece roughness, but if the machine is M-1, then the least important variable is the number of teeth (flutes), while for M-2 the variable of least importance is the feed rate (F). The If-Then statement described above can be generalized for mechanical cutting with machines that have similar characteristics to M-1 or M-2, and some characteristics of the cutting process using steel F-114 can now be tested.
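As a rough illustration of how such an if-statement could be encoded in a rule base, the sketch below expresses the rule described for the slots experiment as a small function. The function name and the returned dictionary layout are assumptions made for this example; only the variable names and the machine-specific conclusions come from the text.

```python
def slots_importance_rule(machine):
    """Encode the if-then rule described for the slots experiment.

    IF the experiment is slots THEN ap has the highest influence;
    IF the machine is M-1 THEN flutes is the least important variable,
    IF the machine is M-2 THEN F is the least important variable.
    """
    if machine not in ("M-1", "M-2"):
        raise ValueError(f"no rule available for machine {machine!r}")
    least_important = "flutes" if machine == "M-1" else "F"
    return {"most_important": "ap", "least_important": least_important}


rule_m1 = slots_importance_rule("M-1")
rule_m2 = slots_importance_rule("M-2")
```

A rule-based system would hold many such statements, one per experiment and machine, and could refuse to answer (as the `ValueError` does here) for machines whose characteristics have not been studied.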

Finally, the ability of Gradient Boosted Trees to determine the importance of each variable with respect to the labels could be considered similar to the ability of Bayesian networks to model influences between these variables and the labels. This is very important in the domain of micro- and nanomachining, where precision in the machining heavily influences the final results. Again, the knowledge gained here can be combined with previous knowledge obtained by elicitation from experts, from the state of the art, or from analysis of results, among other sources, so that it can be represented in a knowledge base and used by a Rule-based System or Decision Support System, either to draw conclusions about surface roughness as influenced by the predictive variables or to predict surface roughness given a dataset or a particular data record.

#### 6. Conclusions

The integration of AI algorithms into computational solutions and techniques for analyzing large amounts of data are gaining interest in the modern industry. This integration is part of what some authors call the sixth technological revolution [27]. In keeping with this idea, this document provides a model of the surface roughness average (Ra) as a result of a predictive analysis using Gradient Boosted Trees. The models presented in this work evaluate the influence of cutting conditions, the characteristics of two different end-milling machines, and the material (steel and aluminum test pieces) on the surface quality of high-speed machining.

A first group of previous works, such as [1, 32], provides mathematical models to predict surface roughness considering operational characteristics of HSM. A second group provides surface roughness models using various soft computing and AI techniques; specifically, these works apply ANNs (for example, [17, 24]), Genetic Algorithms (for example, [1, 25]), or Bayesian networks (for example, [4, 13, 21]). Generally, these models consider aspects related to the tool, the machine, and the material to be machined, and some other works incorporate further aspects such as the lubricant or coolant (for example, the works in [15, 49]).

As a soft computing and AI approach, this work falls in the second group, with the advantage that the resulting model in each experiment has a high accuracy value in comparison with other techniques such as decision trees, thanks to the accuracy of the GBT algorithm. A potential improvement to our model would be to incorporate the characteristics of the coolant used and to study the behavior of the conditional influences on the surface roughness.

The main contributions of this work are the predictive model itself and the subsequent analysis of variable importance. In particular, our results show accuracies ranging from 61.54% to 88.51% on the datasets, which are competitive results when compared with the other approaches shown in this paper. An important advantage of applying this model is that we have been able to analyze, in a natural way, which variables have the most impact on the predictive ability of our model. In this context, we find that the axial cut-depth is the most influential feature for the slots datasets. The axial cut-depth is the axial contact length between the cutting tool and the workpiece [44]; this means that when developing an experimental model for working with steel F-114, the influence of the cut-depth and the hardness of the material must be carefully considered. On the other hand, the hardness and the diameter of the cutting tool are the most influential features for the geometries datasets. Thus, similar considerations apply to these variables in this case.

As a potential future line of work and for practical applications, the results of this study could form the basis of a decision support system. Such a system could use the knowledge contained in the predictive models as a core knowledge base. Since the GBT predictive model is based on decision trees, it would be relatively easy to extract the information from the tree branches as a knowledge base. These branches could be used as rule sets for technicians and other users to interpret and understand the current behavior of the machining systems. Furthermore, this knowledge base could be enriched with more data as it becomes available, although this avenue of work would eventually raise concerns about the scalability of such a decision support system.
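A minimal sketch of how tree branches could be turned into rules is given below. It assumes a simple nested-dictionary representation of a single decision tree; the tree itself, including its thresholds, is entirely hypothetical and does not come from the trained models. Only the variable names (ap, n) and the class labels follow the paper.

```python
def extract_rules(node, conditions=()):
    """Yield one (conditions, label) pair per root-to-leaf branch."""
    if "label" in node:  # leaf node: emit the accumulated conditions
        yield list(conditions), node["label"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    yield from extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))


# Hypothetical tree: the thresholds are placeholders, not fitted values.
toy_tree = {
    "feature": "ap", "threshold": 0.5,
    "left": {"label": "Fine"},
    "right": {
        "feature": "n", "threshold": 15000,
        "left": {"label": "Semifine"},
        "right": {"label": "Smooth"},
    },
}

for conditions, label in extract_rules(toy_tree):
    print("IF " + " AND ".join(conditions) + " THEN " + label)
```

Because a GBT model is an ensemble of many trees, this extraction would have to be applied tree by tree, and the resulting rule set grows with the number of trees, which relates to the scalability concern mentioned above.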

Finally, as has been said in the discussion, the application of the GBT algorithm (derived from decision trees) is not very common in the industry, even though decision trees are one of the most widely used machine learning techniques because of the ease they provide in generating clear production rules and being easily understood by end users. This brings confidence to the predictive models presented here in the presence of new cases that might be taken as input to predict surface roughness.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors want to thank the Centro de Automática y Robótica at CSIC (Spain) for the use of the Kondia machine tool, where a part of the experimentation was carried out. The authors also acknowledge the collaboration of Nicolas Correa S.A. and thank Dr. Andrés Bustillo from Nicolas Correa, who enabled the rest of the experimentation in this company, using the Versa machining center.