Parametric and Nonparametric Empirical Regression Models: Case Study of Copper Bromide Laser Generation
To model the output laser power of a copper bromide laser operating at wavelengths of 510.6 and 578.2 nm, we applied two regression techniques: multiple linear regression and multivariate adaptive regression splines. The models were constructed on the basis of PCA factors for historical data. The influence of first- and second-order interactions between predictors has been taken into account. The models are easily interpreted and have good prediction power, as established by the results of their validation. A comparison of the derived models shows that those based on multivariate adaptive regression splines have an advantage over the others. The results clarify the relationships between laser generation and the observed laser input variables and help determine their influence on laser generation, with a view to improving the experimental setup and laser production technology. They can be useful for the evaluation of known experiments as well as for the prediction of future experiments. The developed modeling methodology is also applicable to a wide range of similar laser devices, namely metal vapor lasers and gas lasers.
The object of research of this paper is a low-temperature copper bromide (CuBr) vapor laser, with wavelengths of 510.6 nm and 578.2 nm. This type of laser is one of the most promising of the group of metal vapor lasers. It is characterized as the most efficient laser in the visible spectrum, which allows for many practical applications [1, 2]. Analytical and numerical modeling of metal vapor lasers, including the CuBr laser in question, has been developing rapidly in the last few decades [3–5]. The main results concern the modeling of discharge kinetics processes and are based on a wide range of experimental data. In recent years, complex kinetic models which include tens and hundreds of coupled differential equations have been created, describing the physical phenomena occurring in different laser devices [4, 5].
Another, fundamentally different approach to metal vapor laser research is the use of accumulated experimental data to construct statistical models and to design experiments. This is a fairly new approach in the field of lasers. In principle, the physical processes and phenomena connected with lasers are considered deterministic. In practice, actual experimental measurements do not provide readings for all related physical and technical parameters and phenomena; they do not take into account specific internal and external conditions; furthermore, the measurements themselves are subject to error. Consequently, experimental data contains a number of random components, which makes it suitable for analysis using statistical methods. What is more, the complexity of the processes, the great number of independent parameters, and the difficulty in determining their interrelations (part of which have not been properly researched) also support the idea of applying a statistical approach. This is especially true when studying significant laser output characteristics such as laser generation, laser efficiency, lifetime, and laser beam quality.
In [7–9], multivariate statistical techniques were applied for the first time in the field of metal vapor lasers, using the copper bromide vapor laser as the object of study. The effect of ten basic input laser variables on laser efficiency and generation was studied. It was established that only six input laser variables have a significant effect on output characteristics. In order to deal with multicollinearity, Principal Component Analysis (PCA) factors were used to construct simple linear regression models. The classification of variables based on hierarchical agglomerative cluster analysis for samples from the same set of data, confirming the relevance of the factor models, was obtained in [9, 10]. Statistical methods for the design of the experiment, for some laser components, for example, for the electric power circuits, have been included in .
The objective of this paper is the construction and comparison of several types of parametric and nonparametric regression models for estimation and prediction of the output laser power (laser generation) of CuBr laser devices. This is achieved using the PCA factors of the data and the following statistical methods: multiple linear regression (MLR) and multivariate adaptive regression splines (MARS).
Modeling was based on experimental data obtained at the Laboratory of Metal Vapor Lasers at the Georgi Nadjakov Institute of Solid State Physics, Bulgarian Academy of Sciences. For this purpose we used the statistical package SPSS, Mathematica, and the MARS predictive software [12–14].
2. Data Description
This study includes experimental data for various CuBr lasers, published in [15–22]. According to their geometry, the CuBr lasers studied herein can be divided into three basic groups: small-bore lasers of inside diameter mm, medium-bore lasers of = 20–40 mm, and large-bore lasers of mm. From the available data for about 300 experiments with the three general types of lasers, a random sample of 109 experiments has been used. Since over 60% of all data concerns small-bore lasers, the sample is partially stratified in order to avoid the imbalance of the available data. Each experiment includes data on six independent basic laser characteristics and one dependent variable, laser generation (see also [7, 8]). The data is of historical type. It should be noted that each experiment is complex, long in duration, and costly to conduct.
The independent input laser variables included in the analysis are as follows.
(mm) is the inside diameter of the laser tube; (mm) is the inside diameter of the internal rings; (cm) is the length of the active area (electrode separation); (kW) is the input electric power; (kW) is the input electric power per unit length; (Torr) is the hydrogen gas pressure.
The response variable is laser generation, (W).
It has to be added that laser generation is also affected by other quantities such as pulse repetition frequency, neon gas pressure, capacity of the capacitor bank, and temperature of the CuBr reservoirs. Their values for the lasers being studied have been experimentally optimized and exhibit statistical nonsignificance (see also [7–9]). For this reason they have not been included in the analysis.
3. Factor Analysis and Selection of Predictors
3.1. Calculation of PCA Factors
Both for the full data set and for the sample, obtaining regression models based directly on the input variables is impeded by their multicollinearity. For this reason the first step is multiple factor analysis, which yields mutually orthogonal factor variables describing the data cloud. Using the SPSS software for our data sample we obtained the Kaiser-Meyer-Olkin measure of sampling adequacy KMO = 0.660 and Bartlett’s test of sphericity with a significance level equal to 0.000. The respective measures of sampling adequacy (MSA) are also significant for each variable. This indicates that factor analysis of the sample is adequate and can be carried out. The factors have been extracted using PCA. Usually the number of factors chosen is equal to the number of eigenvalues of the correlation matrix greater than 1. However, as shown in , the low-variance principal components may also be important. In our case, although there is only one eigenvalue greater than one, we have chosen the number of factors to be three. When the variables are grouped in three factors, the subsequent rotation using the Varimax method clearly reveals the following orthogonal factors: (including , , , ), (including ), and (including ). They account for 95.413% of the total variability of the data sample. The choice of three factors is justified as follows. Adding hydrogen leads to a twofold increase of , which is an indisputable fact proven by experimental results [1, 15], and so the factor must not be overlooked. The variable (factor ) also plays a special role: during experiments it has been observed to noticeably affect laser generation. Omitting this variable leads to regression models which do not provide sufficiently good estimates.
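The extraction and rotation steps described above can be sketched in a few lines. This is only an illustration on synthetic data, not the SPSS procedure used in the study: the helper names `pca_loadings` and `varimax` and the array `X_demo` are assumptions introduced here, and the synthetic sample merely mimics the shape of the real one (109 cases, six variables, four of them strongly correlated).

```python
import numpy as np

def pca_loadings(X, n_factors):
    """Unrotated PCA loadings from the correlation matrix of X (rows = cases)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    explained = eigvals[:n_factors].sum() / eigvals.sum()
    return loadings, eigvals, explained

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix L (variables x factors)."""
    p, k = L.shape
    T = np.eye(k)                               # rotation matrix, starts as identity
    d_old = 0.0
    for _ in range(max_iter):
        LR = L @ T
        G = L.T @ (LR**3 - LR @ np.diag((LR**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                              # closest orthogonal matrix to G
        d_new = s.sum()
        if d_old != 0 and d_new / d_old < 1 + tol:
            break
        d_old = d_new
    return L @ T, T

# Synthetic stand-in for the laser data (NOT the measured values).
rng = np.random.default_rng(0)
latent = rng.normal(size=(109, 1))
X_demo = np.hstack([
    latent + 0.3 * rng.normal(size=(109, 4)),   # four correlated variables
    rng.normal(size=(109, 2)),                  # two roughly independent variables
])
loadings, eigvals, explained = pca_loadings(X_demo, n_factors=3)
rotated, T = varimax(loadings)
n_kaiser = int((eigvals > 1).sum())             # factors kept by the eigenvalue > 1 rule
```

Because the rotation is orthogonal, it leaves the communality of each variable and the total explained variance unchanged; it only redistributes the loadings so that each variable loads mainly on one factor.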
Table 1 shows the rotated component matrix with the factor loadings of the six observed input variables, obtained using PCA. For a sample of size 109 at the chosen level of significance, the statistically significant factor loadings are those over 0.5. The good quality of the factor model is confirmed by the calculated reproduced correlations matrix, for which there is only one nonredundant residual with absolute value greater than 0.05 (it is equal to 0.059).
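The residual check mentioned above can be expressed compactly: the reproduced correlations are the product of the loading matrix with its transpose, and the nonredundant residuals are the upper-triangle entries of the difference from the observed correlation matrix. The function names below are illustrative assumptions, not SPSS output fields.

```python
import numpy as np

def residual_correlations(R, loadings):
    """Nonredundant (upper-triangle) residuals: observed minus reproduced correlations."""
    reproduced = loadings @ loadings.T
    upper = np.triu_indices_from(R, k=1)
    return (R - reproduced)[upper]

def count_large_residuals(R, loadings, threshold=0.05):
    """Number of nonredundant residuals whose absolute value exceeds the threshold."""
    return int(np.sum(np.abs(residual_correlations(R, loadings)) > threshold))
```

A small count relative to the number of variable pairs (15 pairs for six variables) indicates that the retained factors reproduce the correlation structure well.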
The factor scores which are used in all methods of this study have also been calculated at this stage of the statistical calculations.
3.2. General Relationship between Factors and Laser Output Power
The resulting factors affect laser output power differently. Figures 1(a)–1(c) show the scatterplots of laser output power against each of the factors, together with the LOESS smoothing curves. They suggest that in addition to the linear terms we should also take into account second- and even third-degree interactions between factor variables. The 3D plots in Figures 2(a)–2(c) show the general relationships between pairs of factors with regard to laser output power.
Based on these graphical relationships, in an exploratory manner, we will later on construct regression models using three groups of variables as predictors: first group and second and third groups as follows:
The corresponding models will be noted as 0, 1, and 2 order models, respectively.
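As a rough illustration of how such predictor groups can be built from factor scores, the sketch below generates all monomials of the factors up to a chosen total degree. This is an assumption about the form of the groups, since the defining equations are not reproduced in this text; the function name `monomial_terms` and the labels `F1`, `F2`, `F3` are hypothetical. With three factors, degree 2 yields nine terms.

```python
from itertools import combinations_with_replacement
import numpy as np

def monomial_terms(F, max_degree, names=None):
    """Columns for every monomial of the factor scores F up to max_degree.

    F has one row per case and one column per factor. Squares appear as
    'F1*F1', cross products as 'F1*F2', and so on.
    """
    n_cases, n_factors = F.shape
    names = names or [f"F{j + 1}" for j in range(n_factors)]
    columns, labels = [], []
    for degree in range(1, max_degree + 1):
        for combo in combinations_with_replacement(range(n_factors), degree):
            columns.append(np.prod(F[:, list(combo)], axis=1))
            labels.append("*".join(names[j] for j in combo))
    return np.column_stack(columns), labels
```

For example, `monomial_terms(scores, 1)` returns the three factors themselves, while `monomial_terms(scores, 2)` adds the three squares and the three pairwise products.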
4. Multiple Linear Regression Models with PCA Factors
The results of the modeling are presented in this and the following sections. For parametric methods it is assumed that the data and population distribution are nearly normal. All calculations and analyses have been carried out at a level of significance of 0.05. The comparison between models has been conducted via the commonly used indices, such as the multiple correlation coefficient R, the coefficient of multiple determination R2 (RSquare), and adjusted R2.
4.1. Multiple Linear Models Employing the First Group of Predictors
With the help of the three orthogonal PCA factors (first group of predictors) and the stepwise or linear procedure we obtain the MLR-0th order models for estimation of the dependent variable : These equations refer, respectively, to the nonstandardized and standardized estimated value of . All coefficients as well as all subsequent estimates have been obtained at a significance level 0.000.
The basic statistics of the constructed models are presented in Table 2. In this table we include the commonly used indices, such as multiple correlation coefficient R, coefficient of multiple determination R2 (RSquare), adjusted R2, and standard error of the estimate.
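For reference, the indices listed above follow directly from an ordinary least squares fit. The sketch below is a generic illustration with NumPy, not the SPSS computation used in the study; the function name `fit_mlr` and the dictionary keys are assumptions.

```python
import math
import numpy as np

def fit_mlr(X, y):
    """Least squares fit with intercept; returns coefficients and summary indices."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])            # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return {
        "beta": beta,                                # intercept first, then slopes
        "R": math.sqrt(max(r2, 0.0)),                # multiple correlation coefficient
        "R2": r2,                                    # coefficient of determination
        "adj_R2": 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1),
        "std_error": math.sqrt(ss_res / (n - p - 1)),  # standard error of the estimate
        "residuals": residuals,
    }
```

Adjusted R2 penalizes the number of predictors p, which matters when comparing the 0th, 1st, and 2nd order models, since these differ in the number of terms.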
The histogram of the model residuals shows that the residuals of the MLR model (4.1)-(4.2) for the output laser power appear to be normally distributed and centered around zero. The normal Q-Q plot of the regression standardized residuals is shown in Figure 4. The normality of the residuals has also been tested formally with the Kolmogorov-Smirnov test (with Lilliefors correction); on the basis of the resulting p-value, the residuals can be considered normally distributed.
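A minimal sketch of the test statistic is given below, using only the Python standard library. It computes the Kolmogorov-Smirnov distance between the empirical distribution of the residuals and a normal distribution whose mean and standard deviation are estimated from the same sample, which is the situation the Lilliefors correction addresses. The critical value uses the commonly quoted large-sample approximation (about 0.886/sqrt(n) at the 0.05 level for n above 30); it is an approximation and does not replace the exact p-value reported by a statistical package.

```python
import math
from statistics import NormalDist, mean, stdev

def lilliefors_statistic(sample):
    """KS distance between the sample ECDF and a normal fitted to the sample."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the ECDF just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

def approx_critical_value(n, alpha=0.05):
    """Approximate Lilliefors critical value, intended for n > 30."""
    coefficient = {0.10: 0.805, 0.05: 0.886, 0.01: 1.031}[alpha]
    return coefficient / math.sqrt(n)
```

Normality is not rejected at the chosen level when the statistic stays below the critical value.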
4.2. Multiple Linear Models Employing the Second and Third Groups of Predictors
Using the nine predictors (3.1) a more precise model was constructed, marked as MLR-1st order. The corresponding equations are
For the second-order model the obtained equations are, respectively,
The basic statistics of these models are given in Table 2.
5. Nonparametric Models Using Multivariate Adaptive Regression Splines
5.1. Characteristics of MARS
The MARS method is a relatively new and adaptable instrument for the construction of nonparametric regression models. It was developed by Friedman and is applied using a software product named after it, MARS. MARS combines classical linear regression, mathematical construction of splines, and binary recursive partitioning to produce a local model where relationships between response and predictors are either linear or nonlinear. To do this, MARS approximates the underlying function through a set of adaptive piecewise linear regressions termed basis functions (BFs). The points at which changes in slope occur are called knots. Knots are determined by a forward/backward stepwise procedure. At first, a model which overfits the data is constructed. After that, the knots which contribute least to the effectiveness of the model are systematically removed. The best model is selected via the generalized cross validation (GCV) criterion [14, 26].
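The building blocks described above can be illustrated with a small sketch: a pair of mirrored hinge functions at a knot, a GCV score, and a search for the best single knot of one predictor. This is a simplified illustration, not the algorithm of the MARS software; the function names are assumptions, and the effective number of parameters is taken as the number of basis functions plus a penalty of 3 per knot, which is one common textbook convention.

```python
import numpy as np

def hinge(x, knot, direction=+1):
    """Piecewise linear basis function: max(0, x - knot) or max(0, knot - x)."""
    return np.maximum(0.0, direction * (x - knot))

def gcv(y, y_hat, n_basis, n_knots, penalty=3.0):
    """Generalized cross validation score with a per-knot complexity penalty."""
    n = len(y)
    effective_params = n_basis + penalty * n_knots
    mse = float(np.sum((y - y_hat) ** 2)) / n
    return mse / (1.0 - effective_params / n) ** 2

def best_single_knot(x, y, penalty=3.0):
    """Try each interior data value as a knot; keep the one with the lowest GCV."""
    best = None
    for knot in np.unique(x)[1:-1]:
        B = np.column_stack([np.ones_like(x), hinge(x, knot, +1), hinge(x, knot, -1)])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        score = gcv(y, B @ coef, n_basis=3, n_knots=1, penalty=penalty)
        if best is None or score < best[0]:
            best = (score, float(knot), coef)
    return best  # (GCV score, knot location, coefficients)
```

A full forward pass repeats this kind of search over all predictors and existing basis functions, and the backward pass then deletes the terms whose removal lowers the GCV score.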
An important advantage of MARS over the parametric approach is that it describes local changes in the data behavior. What is more, nonlinear relationships fit local interactions between generated basis functions in the respective subregions. In principle, we have to note the possibility of a problematic sudden increase in the number of possible interactions when dealing with a large number of (several thousand) BF and a large number of subregions. However, this is not the case with our data.
Another important advantage of MARS is that, being a nonparametric technique, it does not require the data to be normally distributed, which makes it applicable to a much broader range of problems. Furthermore, MARS can be applied to both large and small data samples, and its basis functions make the resulting models easy to interpret and subsequently utilize.
5.2. MARS Models Using PCA Factors with and without Interactions
Within this study only the best MARS models are calculated, for the same cases as the MLR models obtained in Section 4. The basic statistics of the models are given in Table 2. We will describe two of the models in more detail: the piecewise linear model with no interactions and the model with first-order interactions. All models have been calculated using the MARS software.
The first model (MARS-0th order), with three predictors and without interaction between them, includes the following six basis functions:
Their graphs are shown in Figures 5(a)–5(c). Although it is difficult to directly read the changes in the initial six laser input variables in relation to the factors, this figure shows piecewise linear changes in the behavior of the response for each factor when respective factor values change in the interval .
The estimated values of laser generation are calculated using the formula:
The model (5.1)-(5.2) is tested by generating the best MARS model, which is selected so as to avoid overfitting, as well as by using the algorithm for applying the least squares method. The obtained basic statistics are given in Table 2. The model is significant at level 0.000.
The relative factor variable importance for the model (5.1)-(5.2) is given in Table 3 (in MARS the most important variable always has a value of 100). To a degree this distribution matches the relationship between factors, given in (4.2), which is .
With the help of MARS model (5.1)-(5.2) it is easy to calculate the estimate of when predictor values are known. The same is valid for predicting a future response. For example, a maximum laser output power = 120 W has been measured at = 58, = 58, = 200, = 5, = 12.5, and = 0.6. Their respective factor values are , and . After substituting the latter in (5.1)-(5.2) we find the approximate estimate: .
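Evaluating a fitted MARS model for given factor values amounts to summing an intercept and weighted basis functions, each of which is a hinge function or a product of hinge functions. The sketch below shows that mechanism only. The intercept, coefficients, and knots in `EXAMPLE_TERMS` are hypothetical placeholders chosen for illustration; they are not the coefficients of the models in this paper.

```python
def hinge(x, knot, direction):
    """max(0, x - knot) for direction +1, max(0, knot - x) for direction -1."""
    return max(0.0, direction * (x - knot))

def mars_predict(factors, intercept, terms):
    """Sum of intercept and coefficient-weighted products of hinge functions.

    factors: dict mapping a factor name to its score.
    terms: list of (coefficient, [(factor name, knot, direction), ...]).
    """
    total = intercept
    for coefficient, hinges in terms:
        value = coefficient
        for name, knot, direction in hinges:
            value *= hinge(factors[name], knot, direction)
        total += value
    return total

# HYPOTHETICAL model, for illustration only (not taken from the paper).
EXAMPLE_INTERCEPT = 10.0
EXAMPLE_TERMS = [
    (5.0, [("F1", 0.0, +1)]),                     # single hinge on F1
    (-2.0, [("F1", 0.0, -1)]),                    # mirrored hinge on F1
    (3.0, [("F2", -0.5, +1)]),                    # single hinge on F2
    (1.5, [("F1", 0.0, +1), ("F3", 1.0, -1)]),    # first-order interaction term
]

estimate = mars_predict({"F1": 1.0, "F2": 0.5, "F3": 0.0},
                        EXAMPLE_INTERCEPT, EXAMPLE_TERMS)
```

In practice the factor scores themselves must first be computed from the six measured input variables, using the standardization and score coefficients of the factor analysis.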
The second MARS model which we will describe in more detail is the one which accounts for possible first order interactions. The resulting best model includes the following ten basis functions and a constant:
We can see that basis functions in the best model include only five predictors: .
The respective equation that can be used to calculate the estimates of is
The model (5.3)-(5.4), as well as the next model with two interactions, gives the best test estimates when compared to all other models, as can be seen from Table 2. The partial contributions of the separate predictors (PCA factors and some of their powers) in model (5.3)-(5.4) to the value of can be observed in Figure 6. The biggest contribution is made by the interaction between and , which increases sharply, reaching 115. In that case the maximum is achieved at values for close to 2 and values for from 2 to 1.5. Predictors and provide the second biggest contribution, which amounts to about 50 units. The other two interactions only have a corrective effect.
6. Validation of Models
In order to obtain a reliable estimate of the prediction power of each model, the following cross-validation technique is used. The initial data sample was split randomly into one training and one evaluation data set, containing approximately 70% and 30% of the total cases, respectively. The training data sets were used to generate the models, which were then tested with the independent evaluation data sets.
The following model is obtained using the MLR method with three predictors on the training data set (70% of the sample):
Using this model we have calculated the values for the 30% evaluation data set. The statistical indices for the three MLR models are given in Table 4.
For the MARS models the same validation technique is applied, utilizing the same two independent estimation data sets which were used for MLR, respectively, with 70% and 30% of the data sample. The results from the cross-validation of all MARS models are given in Table 4.
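The validation scheme described in this section can be sketched as follows. The code is a generic holdout illustration with an ordinary least squares model, not the SPSS or MARS workflow of the study; the function names and the fixed random seed are assumptions made so that the split is reproducible.

```python
import numpy as np

def holdout_split(n_cases, train_fraction=0.7, seed=0):
    """Random, disjoint index sets for training and evaluation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_cases)
    n_train = int(round(train_fraction * n_cases))
    return shuffled[:n_train], shuffled[n_train:]

def fit_ols(X, y):
    """Least squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(X, beta):
    return beta[0] + X @ beta[1:]

def r_squared(y, y_hat):
    return 1.0 - float(np.sum((y - y_hat) ** 2)) / float(np.sum((y - y.mean()) ** 2))

def holdout_r2(X, y, train_fraction=0.7, seed=0):
    """Fit on the training part only; report R2 on both parts."""
    train, evaluation = holdout_split(len(y), train_fraction, seed)
    beta = fit_ols(X[train], y[train])
    return (r_squared(y[train], predict_ols(X[train], beta)),
            r_squared(y[evaluation], predict_ols(X[evaluation], beta)))
```

Using the same index sets for every model, as done here through the seed, keeps the comparison between MLR and MARS models on an equal footing.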
The initial data set includes six independent input laser variables, some of which exhibit high multicollinearity. The problem of predictor multicollinearity is commonly encountered not only in engineering data but also in ecology, medicine, and many other fields. In order to solve this problem we first apply multiple factor analysis based on PCA. We obtained three mutually orthogonal factor variables. These factors are then used to construct regression models. In essence this is the well-known projection method which is also called Principal Component Regression. We have to note that the application of this technique usually limits the accuracy of models because the factors do not carry all the information contained in the sample. For this reason, all constructed models are to an extent "rough".
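The projection idea behind Principal Component Regression can be sketched as follows: standardize the predictors, project them onto the leading eigenvectors of their correlation matrix, and regress the response on the resulting uncorrelated scores. The sketch uses unrotated components for brevity, whereas the study works with Varimax-rotated factor scores; the function names and the model dictionary are assumptions.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Regress y on the first n_components principal component scores of X."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                                   # standardized predictors
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    keep = np.argsort(eigvals)[::-1][:n_components]     # largest eigenvalues first
    V = eigvecs[:, keep]
    scores = Z @ V                                      # mutually uncorrelated columns
    A = np.column_stack([np.ones(len(y)), scores])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mu": mu, "sd": sd, "V": V, "gamma": gamma}

def pcr_predict(model, X_new):
    """Apply the stored standardization, projection, and regression coefficients."""
    scores = ((X_new - model["mu"]) / model["sd"]) @ model["V"]
    return model["gamma"][0] + scores @ model["gamma"][1:]
```

Keeping fewer components than predictors removes the collinear directions but also discards part of the information in the sample, which is the accuracy limitation noted above.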
If we compare the obtained results from Table 2, we should note that the best, almost identical results have been achieved using MARS models employing first- and second-order interactions. These are preferable since they can be described using fewer basis functions. In addition, the same models provide better validation results, as shown in Table 4.
As a whole, the results obtained by modeling laser output power indicate that the parametric models, in this case MLR, have the lowest values of the considered indices. MARS models generally provide better indices and better descriptions of local interactions between predictors. Furthermore, the generated models are easily interpreted and have good prediction power, as is evident from the results of their validation.
Let us now consider the physical interpretation of the models. All models include the three basic factor variables, among which the first factor is dominantly influential. This is in full agreement with the general linear tendency towards an increase in laser output power as a result of an increase in the geometric and energy parameters included in the first factor (tube diameter, internal ring diameter, tube length, and supplied electric power). The relative influence of the other two factors also corresponds to the actual experiment. From a practical point of view, the constructed models are adequate and provide a relatively good description of the dependence between input laser variables and laser output power. What is more, to some extent the models can provide guidelines for experiments when designing new laser devices with increased laser output power.
The comparison between parametric and nonparametric methods for modeling the output laser power of a CuBr vapor laser shows that nonparametric models generally have slightly better characteristics. The constructed MARS models allowed for a more adequate description of the data in question, while overcoming the problems of multicollinearity and local nonlinearities and accounting for first- and second-order interactions between predictors. Although regression does not establish causation, the models can be of great use for the evaluation of known experiments as well as for the prediction and direction of future experiments. The presented methods are also applicable to experimental data of similar laser devices in the group of metal vapor and gas lasers.
This study was conducted with the financial support of the Scientific National Fund of Bulgarian Ministry of Education, Youth and Science, project no. VU-MI-205/2006, and the Scientific Fund of Plovdiv University “Paisii Hilendarski”-NPD, projects IS-M4 and RS2009-M-13.
N. V. Sabotinov, “Metal vapor lasers,” in Gas Lasers, M. Endo and R. F. Walter, Eds., pp. 449–494, CRC Press, Boca Raton, Fla, USA, 2006.
P. G. Foster, Industrial applications of copper bromide laser technology, Ph.D. dissertation, Department of Physics and Mathematical Physics, School of Chemistry and Physics, University of Adelaide, Adelaide, Australia, 2005.
I. P. Iliev and S. G. Gocheva-Ilieva, “Statistical techniques for examining copper bromide laser parameters,” in Proceedings of the International Conference on Numerical Analysis and Applied Mathematics (ICNAAM '07), vol. 936, pp. 267–270, Corfu, Greece, September 2007.
J. L. Lu and L. J. Wang, “The orthonormal design of experiments for the optimization of the parameters of the discharge circuit in the CuBr vapour lasers power supply,” Laser Technology, vol. 30, pp. 113–115, 2006.
D. N. Astadjov, N. V. Sabotinov, and N. K. Vuchkov, “Effect of hydrogen on CuBr laser power and efficiency,” Optics Communications, vol. 56, no. 4, pp. 279–282, 1985.
“NATO contract SfP, 97 2685, 50W Copper Bromide laser,” 2000.
D. N. Astadjov, K. D. Dimitrov, D. R. Jones et al., “Influence on operating characteristics of scaling sealed-off CuBr lasers in active length,” Optics Communications, vol. 135, no. 4-6, pp. 289–294, 1997.
K. D. Dimitrov and N. V. Sabotinov, “High-power and high-efficiency copper bromide vapor laser,” in Proceedings of the 9th International School on Quantum Electronics: Lasers, Physics and Applications, vol. 3052 of Proceedings of SPIE, pp. 126–130, Varna, Bulgaria, September 1996.
D. N. Astadjov, K. D. Dimitrov, D. R. Jones et al., “Copper bromide laser of 120-W average output power,” IEEE Journal of Quantum Electronics, vol. 33, no. 5, pp. 705–709, 1997.
N. P. Denev, D. N. Astadjov, and N. V. Sabotinov, “Analysis of the copper bromide laser efficiency,” in Proceedings of the 4th International Symposium on Laser Technologies and Lasers, pp. 153–156, Plovdiv, Bulgaria, 2006.
I. T. Jolliffe, “A note on the use of principal components in regression,” Journal of the Royal Statistical Society: Series C, vol. 31, pp. 300–303, 1982.
“Computation with solo power analysis,” BMDP Statistical Software Inc., LA, 1993.