Abstract

A set of designed experiments, involving the use of a pulsed Nd:YAG laser system milling 316L Stainless Steel, serve to study the laser-milling process of microcavities in the manufacture of drug-eluting stents (DES). Diameter, depth, and volume error are considered to be optimized as functions of the process parameters, which include laser intensity, pulse frequency, and scanning speed. Two different DES shapes are studied that combine semispheres and cylinders. Process inputs and outputs are defined by considering the process parameters that can be changed under industrial conditions and the industrial requirements of this manufacturing process. In total, 162 different conditions are tested in a process that is modeled with the following state-of-the-art data-mining regression techniques: Support Vector Regression, Ensembles, Artificial Neural Networks, Linear Regression, and Nearest Neighbor Regression. Ensemble regression emerged as the most suitable technique for studying this industrial problem. Specifically, Iterated Bagging ensembles with unpruned model trees outperformed the other methods in the tests. This method can predict the geometrical dimensions of the machined microcavities with relative errors related to the main average value in the range of 3 to 23%, which are considered very accurate predictions, in view of the characteristics of this innovative industrial task.

1. Introduction

Laser-milling technology has become a viable alternative to conventional methods for producing complex microfeatures on difficult-to-process materials. It is increasingly employed in the industry, because of its established advantages [1]. As a noncontact material-removal process, laser machining removes smaller and more precise amounts of material, applies highly localized heat inputs to the workpiece, minimizes distortion, involves no tool wear, and is not subject to certain constraints such as maximum tool force, buildup edge formation, and tool chatter. Micromanufacturing processes in the field of electronics and medical and biological applications are a growing area of research. High-resolution components, high precision, and small feature size are needed in this field, as well as real 3D fabrication. Thus, the use of laser machining to produce medical applications has become a growing area of research, one example of which is the fabrication of coronary stents. This research looks at the fabrication and performance of the DES. Some of these DES are metallic stents that include reservoirs that contain the polymer and the drug [2], such as the Janus TES stent [3] which incorporates microreservoirs cut into its abluminal side that are loaded with the drug. The selection of the laser system and the process parameters significantly affects the quality of the microfeature that is milled and the productivity of the process. Although there are several studies which deal with the effect of the process parameters on the quality of the final laser-milled parts, few of them study this effect on a microscale. The literature contains many examples of experimental research looking at the influence of scanning speed, pulse intensity, and pulse frequency on the quality and productivity of laser milling in different materials on a macroscale [47]. There are many works on microscale machining that have investigated laser-machining processes in laser microdrilling [810], laser microcutting [1113], and laser micromilling in 2D [14, 15]. However, there is little research on laser 3D micromilling. Pfeiffer et al. [16] studied the effects of laser-process parameters on the ablation behaviour of tungsten carbide hard metal and steel using a femtosecond laser for the generation of complex 3D microstructures. Karnakis et al. [17] demonstrated the laser-milling capacity of a picoseconds laser in different materials (stainless steel, alumina, and fused silica). Surface topology information was correlated with incident power density, in order to identify optimum processing. Qi and Lai [18] used a fiber laser to machine complex shapes. They developed a thermal ablation model to determine the ablated material volume and the dimensions and optimized the parameters to achieve maximum efficiency and minimum thermal effects. Finally, Teixidor et al. [19] studied the effects of scanning speed, pulse intensity, and pulse frequency on target width and depth dimensions and surface roughness for the laser milling of microchannels on tool steel. They presented a second-order model and a multiobjective process optimization to predict the responses and to find the optimum combinations of process parameters.

Although the manufacturing industry is interested in laser micromilling and some research has been done to understand the main physical and industrial parameters that define the performance of this process, the conclusions show that analytical approaches are necessary for all real cases due to their complexity. Data-mining approaches represent a suitable alternative to such tasks due to their capacity to deal with multivariate processes and experimental uncertainties. Data-mining is defined in [20] as “the process of discovering patterns in data” and “‘extracting’ or ‘mining’ knowledge from large amounts of data” [21]. “Useful patterns allow us to make nontrivial predictions on new data” [20] (e.g., predictions on laser-milling results based on experimental data). Many artificial intelligence techniques have been applied to macroscale milling [2224], but there are few examples of the application of such techniques to laser micromilling [25]. Artificial neural networks (ANNs) have been proposed to predict the pulse energy for a desired depth and diameter in micromilling [26] and the material removal rate (MRR) for a fixed ablation depth depending on scanning velocity, pulse frequency [27], and cut kerf quality, in terms of dross adherence during nonvertical laser cutting of 1 mm thick mild-steel sheets [28]. Finally, regression trees have been proposed to optimize geometrical dimensions in the micromanufacturing of microchannels [29]. Neither of these studies used ensembles for process modeling, a learning paradigm in which multiple learners (or regressors) are combined to solve a problem. A regressor ensemble can significantly improve the generalization ability of a single regressor and can provide better results than an individual regressor in many applications [3032]. Ensembles have demonstrated their suitability for modeling macroscale milling and drilling [3336], especially because they can achieve highly accurate prediction with lower tuning time of the model parameters [35]. In view of the lack of published research on modeling the laser milling of 3D microgeometries with ensembles, the objective of this work is to study the modeling capability of these data-mining techniques through the different inputs and outputs that may be considered priorities from an industrial point of view.

2. Experimental Procedure and Data Collection

The experimental apparatus described in this study gathered the data needed to create the models. The experimentation consisted of milling microcavities in a 316L Stainless Steel workpiece using a laser system. A Deckel Maho Nd:YAG Lasertec 40 machine with a 1,064 nm wavelength was used to perform the experiments. The system is a lamp-pumped solid-state laser that provides an ideal maximum (theoretically estimated) pulse intensity of 1.4 W/cm2 [14], due to the 100 W average power and 30μm beam spot diameter. The SS316L workpiece material was selected because it is a biocompatible material commonly used in biomedical applications and specifically for the fabrication of coronary stents. Two different geometries were used for the experiments. The first geometry consisted of a half-spherical shape defined by depth and diameter dimensions. The second geometry was a half-cylindrical shape with a quarter sphere on both sides, defined by depth, diameter, and length dimensions. Both geometries and an example of a laser-milled cavity are presented in Figure 1. The geometries were fabricated with different combinations of dimensions, while maintaining the same volume. Tables 1 and 2 present the three combinations of dimensions for the spherical and the cylindrical geometries, respectively. These geometries and dimensions were selected, because they provide sufficient space to machine the cavities of these cardiovascular drug-eluting stent struts, which is an important part of their manufacturing process.

A full factorial design of experiments was developed, in order to analyze the effects of pulse frequency (PF), scanning speed (SS), and pulse intensity levels (PI, percentage of the ideal maximum pulse intensity) on the responses. Some screening experiments were performed to determine the proper parametric levels. Three different levels were selected from the results of each input factor, which are presented in Table 3. This design of experiments resulted in a total of 162 experiments: 27 combinations for each geometry under study. All the experiments were machined from the same 316L SS blank under the same ambient conditions. The response variables were the cavity dimensions (depth and radius) and the volume of removed material. A confocal Axio CSM 700 Carl Zeiss microscope was used for the dimensional measurements and for characterization of the cavities. Moreover, negatives of some of the samples were obtained with surface replicant silicone, in order to obtain 3D SEM images.

Having performed the experimental tests, the inputs and outputs for the datasets had to be defined, to generate the data sets for the data-mining modeling. On the whole, the selection of the inputs is easy, because they are set by the specifications of the equipment: the inputs are the parameters that the process engineer can change in the machine. They are the same as those considered to define the experimental tests explained above. Table 4 summarizes all the selected inputs, their units, ranges, and the relationship that they have with other inputs.

The definition of the data set outputs takes different interests into account that relate to the industrial manufacturing of DES. In some cases, a productivity orientation will encourage the process engineer to optimize productivity (in terms of the MRR) keeping geometrical accuracy under certain acceptable thresholds (by fixing a maximum relative error in geometrical parameters). In other cases, the geometrical accuracy will be the main requirement and productivity will be a secondary objective. In yet other cases, only one geometrical parameter, for example, the depth of the DES, will be critical and the other geometrical parameters should be kept under certain thresholds, once again by fixing a maximum relative error for these geometrical parameters. Therefore, this work considers the geometrical dimensions and the MRR that is actually obtained as its output, their deviance from the programmed values, and the relative errors between the programmed and the real values. Table 5 summarizes all the calculated outputs, their units, ranges, and the relationship they have with other input or output variables. In summary, the 162 different laser conditions that were tested provided 14 data sets of 162 instances each with 9 attributes and one output variable to be predicted.

3. Data-Mining Techniques

In our study we consider the analysis of each output variable separately, by defining a one-dimensional regression problem for each case. A regressor is a data-mining model in which an output variable, , is modelled as a function, , of a set of independent variables, , called attributes in the conventional notation of data-mining, where is the number of attributes. The function is expressed as follows:

The aim of this work is to determine the most suitable regressor for this industrial problem. The selection is performed by comparing the root mean squared error (RMSE) of several regressors over the data set. Having a data collection of pairs of real values , the RMSE is an estimation of the expected difference between the real and the forecasted output by a regressor. It is expressed as the square root of the mean of the squares of the deviations, as shown in the following equation:

We tested a wide range of the main families of state-of-the-art regression techniques as follows.(i)Function-based regressors: we used two of the most popular algorithms, Support Vector Regression (SVR) [37] and ANNs [38], and also Linear Regression [39], which has an easier formulation that allows direct physical interpretation of the models. We should note the widespread use of SVR [40], while ANNs have been successfully applied to a great variety of industrial modeling problems [4143].(ii)Instance-based methods, specifically their most representative regressor, k-nearest neighbors regressor [44]: in this type of algorithm, it is not necessary to express an analytic relationship between the input variables and the output that is modeled, an aspect that makes this approach totally different from the other. Instead of using an explicit formulation to obtain a prediction, it is calculated from set values stored in the training phase.(iii)Decision-tree-based regressors: we have included these kinds of methods because they are used in the ensembles as regressors, as explained in Section 3.2.(iv)Ensemble techniques [45] are among the most popular in the literature. These methods have been successfully applied to a wide variety of industrial problems [34, 4649].

3.1. Linear Regression

One of the most natural and simplest ways of expressing relations between a set of inputs and an output is by using a linear function. In this type of regressor the variable to forecast is given by a linear combination of the attributes, with predetermined weights [39], as detailed in the following equation: where denotes the output of the th training instance and the th attribute of the th instance. The sum of the squares of the differences between real and forecasted output is minimized to calculate the most adequate weights, , following the expression given in the following equation:

We used an improvement to this original formulation, by selecting the attributes with the Akaike information criterion (AIC) [50]. In (5), we can see how the AIC is defined, where k is the number of free parameters of the model (i.e., the number of input variables considered) and is the probability of the estimation fitting the training data. The aim is to obtain models that fit the training data but with as few parameters as possible:

3.2. Decision-Tree-Based Regressors

The decision tree is a data-mining technique that builds hierarchical models that are easily interpretable, because they may be represented in graphical form, as shown in Figures 2 and 3, with an example of the output measured length as a function of the input attributes. In this type of model, all the decisions are organised around a single variable, resulting in the final hierarchical structure. This representation has three elements: the nodes, attributes taken for the decision (ellipses in the representation), the leaves, final forecasted values (squares), and these two elements being connected by arcs, with the splitting values for each attribute.

As base regressors, we have two types of regressors, the structure of which is based on decision trees: regression trees and model trees. Both families of algorithms are hierarchical models represented by an abstract tree, but they differ with regard to what their leaves store [51]. In the case of regression trees, a value is stored that represents the average value of the instances enclosed by the leaf, while the model trees have a linear regression model that predicts the output value for these instances. The intrasubset variation in the class values down each branch is minimized to build the initial tree [52].

In our experimentation, we used one representative implementation of the two families, reduced-error pruning tree (REPTree) [20], a regression tree, and M5P [51], a model tree. In both cases we tested two configurations, pruned and unpruned trees. In the case of having one single tree as a regressor for some typologies of ensembles, it is more appropriate to prune the trees to avoid overfitting the training data, while some ensembles can take advantage of having unpruned trees [53].

3.3. Ensemble Regressors

An ensemble regressor combines the predictions of a set of so-called base regressors using a voting system [54], as we can see in Figure 4. Probably the three most popular ensemble techniques are Bagging [55], Boosting [56], and Random Subspaces [53]. For Bagging and Boosting, the ensemble regressor is formed from a set of weak base regressors, trained by applying the same learning algorithm to different sets obtained from the training set. In Bagging, each base regressor is trained with a dataset obtained from random sampling with replacement [55] (i.e., a particular instance may appear repeated several times or may not be considered in any base regressor). As a result, the base regressors are independent. However, Boosting uses all the instances and a set of weights to train each base regressor. Each instance has a weight pointing out how important it is to predict that instance correctly. Some base regressors can take weights into account (e.g., decision trees). Boosting trains base regressors sequentially, because errors for training instances in the previous base regressor are used for reweighting. The new base regressors are focused on instances that previous base regressors have wrongly predicted. The voting system for Boosting is also weighted by the accuracy of each base regressor [57].

Random Subspaces follow a different approach: each base regressor is trained in a subset of fewer dimensions than the original space. This subset of features is randomly chosen for all regressors. This procedure is followed with the intention of avoiding the well-known problem of the curse of dimensionality that occurs with many regressors when there are many features as well as improving accuracy by choosing base regressors with low correlations between them. In total, we have used five ensemble regression techniques of the state of the art for regression, two variants of Bagging, one variant of Boosting, and Random Subspaces. The list of ensemble methods used in the experimentation is enumerated below.(i)Bagging is in its initial formulation for regression.(ii)Iterated Bagging combines several Bagging ensembles, the first one keeping to a typical construction and the others using residuals (differences between the real and the predicted values) for training purposes [58].(iii)Random Subspaces are in their formulation for regression.(iv)AdaBoost.R2 [59] is a boosting implementation for regression. Calculated from the absolute errors of each training example, , the so-called loss function, , was used to estimate the error of each base regressor and to assign a suitable weight to each one. Let Den be the maximum value of in the training set; then three different loss functions are used: linear, , square, , and exponential, .(v)Additive regression is this regressor that has a learning algorithm called Stochastic Gradient Boosting [60], which is a modification of Adaptive Bagging, a hybrid Bagging Boosting procedure intended for least squares fitting on additive expansions [60].

3.4. k-Nearest Neighbor Regressor

This regressor is the most representative algorithm among the instance-based learning. These kinds of methods forecast the output value using stored values of the most similar instances of the training data [61]. The estimation is the mean of the k most similar training instances. Two configuration decisions have to be taken:(i)how many nearest neighbors to use to forecast the value of a new instance?(ii)which distance function to use to measure the similarity between the instances?

In our experimentation we have used the most common definition of the distance function, the Euclidean distance, while the number of neighbors is optimized using cross-validation.

3.5. Support Vector Regressor

This kind of regressor is based on a parametric function, the parameters of which are optimized during the training process, in order to minimize the RMSE [62]. Mathematically, the goal is to find a function, , that has the most deviation, , from the targets that are actually obtained, , for all the training data, and at the same time is as flat as possible. The following equation is an example of SVR with the particular case of a linear function, called linear SVR, where denotes the inner product in the input space, :

The norms of have to be minimized to find a flat function, but in real data solving this optimization problem can be unfeasible. In consequence, Boser et al. [37] introduced three terms into the formulation: the slack variables , and , a trade-off parameter between the flatness and the deviations of the errors larger than . In the following equation, the optimization problem is shown that is associated with a linear SVR:

We have an optimization problem of the convex type that is solved in practice using the Lagrange method. The equations are rewritten using the primal objective function and the corresponding constraints, a process in which the so-called dual problem (see (8)) is obtained as follows:

From the expression in (8), it is possible to generalize the formulation of the SVR in terms of a nonlinear function. Instead of calculating the inner products in the original feature space, a kernel function, , that computes the inner product in a transformed space was defined. This kernel function has to satisfy the so-called Mercer’s conditions [63]. In our experimentation, we have used the two most popular kernels in the literature [40]: linear and radial basis.

3.6. Artificial Neural Networks

We used the multilayer perceptron (MLP), the most popular ANN variant [64], in our experimental work. It has been demonstrated to be a universal approximator of functions [65]. ANNs are a particular case of neural networks, the mathematical formulation of which is inspired by biological functions, as they aim to emulate the behavior of a set of neurons [64]. This network has three layers [66], one with the network inputs (features of the data set), a hidden layer, and an output layer where the prediction is assigned to each input instance. Firstly the output of the hidden layer, , is calculated from the inputs, and then the output, , is obtained according to the expressions shown in the following equation [66]: where is the weight matrix of the hidden layer, is the bias of the hidden layer, is the weight matrix of the output layer (e.g., the identity matrix in the configurations tested), is the bias of the output layer, is the activation function of the hidden layer, and is the activation function of the output layer. These two functions depend on the chosen structure but are typically the identity for the hidden layer and for the output layer [66].

4. Results and Discussion

We compared the RMSE obtained for the regressors in a 10 10 cross-validation, in order to choose the method that is most suited to model this industrial problem. The experiments were completed using the WEKA [20] implementation of the methods described above.

All the ensembles under consideration have 100 base regressors. The methods that depend on a set of parameters are optimized as follows:(i)SVR with linear kernel: the trade-off parameter, , in the range 2–8;(ii)SVR with radial basis kernel: from 1 to 16 and the parameter of the radial basis function, gamma, from to ;(iii)multilayer perceptron: the training parameters momentum, learning rate, and number of neurons are optimized in the ranges 0.1–0.4, 0.1–0.6, and 5–15;(iv)kNN: the number of neighbours is optimized from 1 to 10.

The notation used to describe the methods is detailed in Table 6.

Regarding the notation, two abbreviations have been used besides those that are indicated in Table 6. On the one hand, we used the suffixes “L,” “S,” and “E” for the linear, square, and exponential loss functions of Adaboost.R2, and, on the other hand, the trees that are either pruned or unpruned appear between brackets. Tables 7, 8, and 9 set out the RMSE of each of the 14 outputs for each method.

Finally, a summary table with the best methods per output is shown. The indexes of the 39 methods that were tested are explained in Tables 10 and 11, and in Table 12 the method with minimal RMSE is indicated, according to the notation for these indexes. In the third column, we also indicate those methods that have larger RMSE, but, using a corrected resampled t-test [67], the differences are not statistically significative at a confidence level of 95%. Analyzing the second column of Table 12, among the 39 configurations tested, only 4 methods obtained the best RMSE for one of the 14 outputs: Adaboost.R2 with exponential loss and pruned M5P as base regressors—index 26—(5 times), SVR with radial basis function kernel—index 6—(4 times), Iterated Bagging with unpruned M5P as base regressors—index 15—(4 times), and Bagging with unpruned RP as the base regressor - index 9—(1 time).

The performance of each method may be ranked. Table 13 presents the number of significative weaker performances of each method, considering the 14 outputs modeled. The most robust method is Iterated Bagging with unpruned M5P as base regressors - index 15 -, as in none of the 14 outputs was it outperformed by other methods. Besides selecting the best method, it is possible to obtain some additional conclusions from this ranking table.(i)Linear models like SVR linear—index 5— and LR—index 3—do not fit the datasets in the study very well. Both methods are ranked together at the middle of the table. The predicted variables therefore need methods that can operate with nonlinearities.(ii)SVR with Radial Basis Function Kernel—index 6—is the only nonensemble method with competitive results, but it needs to tune 2 parameters. MLP—index 7—is not a good choice. It needs to tune 3 parameters and is not a well-ranked method.(iii)For some ensemble configurations, there are differences in the number of statistically significative performances between the results from pruned and unpruned trees, while in other cases these differences do not exist; in general, though, the results of unpruned trees are more accurate, specially with the top-ranked methods. In fact, the only regressor which is not outperformed by other methods in any output has unpruned trees. Unpruned trees are more sensitive to changes in the training set. So, the predictions of unpruned trees, when their base regressors are trained in an ensemble, are more likely to output diverse predictions. If the predictions of all base regressors agreed there would be little benefit in using ensembles. Diversity balances faulty predictions by some base regressors with correct predictions by others.(iv)The top-ranked ensembles use the most accurate base regressor (i.e., M5P). All M5P configurations have fewer weaker performances than the corresponding RP configuration. In particular, the lowest rank was assigned to the AR - RP configurations, while AR M5P U came second best.(v)Ensembles that lose a lot of information, such as RS, are ranked at the bottom of the table. The table shows that the lower the percentage of features RS use, the worse they perform. In comparison to other ensemble methods, RS is a method that is very insensitive to noise, so it can point to data that are not noisy.(vi)In R2 M5P ensembles, the loss function does not appear to be an important configuration parameter.(vii)IB M5P U is the only configuration that never had significant losses when compared with the other methods.

Once the best data-mining technique for this industrial task is identified, the industrial implementation of these results can follow the procedure outlined below.(1)The best model is run to predict one output, by changing two input variables of the process in small steps and maintaining a fixed value for the other inputs.(2)3D plots of the output related to the two varied inputs should be generated. The process engineer can extract information from these 3D plots on the best milling conditions.

As an example of this methodology, the following case was built. Two Iterated Bagging ensembles with unpruned M5P as their base regressors are built for two outputs, the width and the depth errors, respectively. Then, the models were run by varying two inputs in small steps across the test range: pulse intensity (PI) and scanning speed (SS). This combination of inputs and outputs presents an immediate interest from the industrial point of view because, in a workshop, the DES geometry is fixed by the customer and the process engineer can only change three parameters of the laser milling process (pulse frequency (PF), scanning speed, and pulse intensity). In view of these restrictions, the engineer will wish to know the expected errors for the DES geometry depending on the laser parameters that can be changed. The rest of the inputs for the models (DES geometry) are fixed at 70μm depth, 65μm width, and 0μm length, and PF is fixed at 45 KHz. Figure 5 shows the 3D plots obtained from these calculations, allowing the process engineer to choose the following milling conditions: SS in the range of 325–470 mm/s and PI in the range of 74–91% for minimum errors in DES depth and width.

5. Conclusions

In this study, extensive modeling has been presented with different data-mining techniques for the prediction of geometrical dimensions and productivity in the laser milling of microcavities for the manufacture of drug-eluting stents. Experiments on 316L Stainless Steel have been performed to provide data for the models. The experiments vary most of the process parameters that can be changed under industrial conditions: scanning speed, laser pulse intensity, and laser pulse frequency; moreover 2 different geometries and 3 different sizes were manufactured within the experimental test to obtain informative data sets for this industrial task. Besides, a very extensive analysis and characterization of the results of the experimental test were performed to cover all the possible optimization strategies that industry might require for DES manufacturing: from high-productivity objectives to high geometrical accuracy in just one geometrical axis. By doing so, 14 data sets were generated, each of 162 instances.

The experimental test clearly outlined that the geometry of the feature to be machined will affect the performance of the milling process. The test also shows that it is not easy to find the proper combination of process parameters to achieve the final part, which makes it clear that the laser micromilling of such geometries is a complex process to control. Therefore the use of data-mining techniques is proposed for the prediction and optimization of this process. Each variable to predict was modelled by regression methods to forecast a continuous variable.

The paper shows an exhaustive test covering 39 regression method configurations for the 14 output variables. A cross-validation was used in the test to identify the methods with a relatively better RMSE. A corrected resampled t-test was used to estimate significative differences. The test showed that ensemble regression techniques using M5P unpruned trees gave a better performance than other well-established soft computing techniques such as ANNs and Linear Regression techniques. SVR with Radial Basis Function Kernel was also a very competitive method but required parameter tuning. We propose the use of an Iterated Bagging technique with an M5P unpruned tree as a base regressor, because its RMSE was never significantly worse than the RMSE of any of the other methods for any of the 14 variables.

Future work will consider applying the experimental procedure to different polymers, magnesium, and other biodegradable and biocompatible elements, as well as to different geometries of industrial interest other than DES, such as microchannels. Moreover, as micromachining is a complex process where many variables play an important role in the geometrical dimensions of the machined workpiece, the application of visualization techniques, such as scatter plot matrices and start plots, to evaluate the relationships between inputs and outputs will help us towards a clearer understanding of this promising machining process at an industrial level.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to express their gratitude to the GREP research group at the University of Girona and the Tecnológico de Monterrey for access to their facilities during the experiments. This work was partially funded through Grants from the IREBID Project (FP7-PEOPLE-2009-IRSES-247476) of the European Commission and Projects TIN2011-24046 and TECNIPLAD (DPI2009-09852) of the Spanish Ministry of Economy and Competitiveness.