An Improved Approach for Estimating the Hyperparameters of the Kriging Model for High-Dimensional Problems through the Partial Least Squares Method
During the last years, kriging has become one of the most popular methods in computer simulation and machine learning. Kriging models have been successfully used in many engineering applications, to approximate expensive simulation models. When many input variables are used, kriging is inefficient mainly due to an exorbitant computational time required during its construction. To handle high-dimensional problems (100+), one method is recently proposed that combines kriging with the Partial Least Squares technique, the so-called KPLS model. This method has shown interesting results in terms of saving CPU time required to build model while maintaining sufficient accuracy, on both academic and industrial problems. However, KPLS has provided a poor accuracy compared to conventional kriging on multimodal functions. To handle this issue, this paper proposes adding a new step during the construction of KPLS to improve its accuracy for multimodal functions. When the exponential covariance functions are used, this step is based on simple identification between the covariance function of KPLS and kriging. The developed method is validated especially by using a multimodal academic function, known as Griewank function in the literature, and we show the gain in terms of accuracy and computer time by comparing with KPLS and kriging.
During the last years, the kriging model [1–4], which is referred to as the Gaussian process model , has become one of the most popular methods in computer simulation and machine learning. It is used as a substitute of high-fidelity codes representing physical phenomena and aims to reduce the computational time of a particular process. For instance, the kriging model is used successfully in several optimization problems [6–11]. Kriging is not well adapted to high-dimensional problem, principally due to large matrix inversion problems. In fact, the kriging model becomes much time consuming when a large number of input variables are used since a large number of sampling points are required. Indeed, it is recommended in  to use sampling points, with the number of dimensions, for obtaining a good accuracy of the kriging model. As a result, we need to increase the size of the kriging covariance matrix which becomes computationally very expensive to invert. Moreover, this inversion’s problem induces difficulty in the classical hyperparameters estimation through the maximization of the likelihood function.
A recent method, called KPLS , is developed to reduce computational time which uses, during a construction of the kriging model, the dimensional reduction method “Partial Least Squares” (PLS). This method is able to reduce the number of hyperparameters of a kriging model, such that their number becomes equal to the number of principal components retained by the PLS method. The KPLS method is thus able to rapidly build a kriging model for high-dimensional problems (100+) while maintaining a good accuracy. However, it has been shown in  that the KPLS model is less accurate than the kriging model in many cases, in particular for multimodal functions.
In this paper, we propose an extra step that supplements  in order to improve its accuracy. Under hypothesis that kernels used for building the KPLS model are of exponential type with the same form (all Gaussian kernels, e.g.), we choose the hyperparameters found by the KPLS model as an initial point to optimize the likelihood function of a conventional kriging model. In fact, this approach is performed by identifying the covariance function of the KPLS model as a covariance function of a kriging model. The fact of considering the identified kriging model, instead of the KPLS model, leads to extending the search space where the hyperparameters are defined and thus to making the resulting model more flexible than the KPLS model.
This paper is organized in 3 main sections. In Section 2, we present a review of the KPLS model. In Section 3, we discuss our new approach under the hypothesis needed for its applicability. Finally, numerical results are shown to confirm the efficiency of our method followed by a summary of what we have achieved.
2. Construction of KPLS
In this section, we introduce the notation and describe the theory behind the construction of the KPLS model. Assume that we have evaluated a cost deterministic function of points () with , and we denote by the matrix . For simplicity, is considered to be a hypercube expressed by the product between intervals of each direction space; that is, , where with for . Simulating these inputs gives the outputs with , for .
2.1. Construction of the Kriging Model
For building the kriging model, we assume that the deterministic response is realization of a stochastic process [14–17]:The presented formula, with an unknown constant, corresponds to ordinary kriging  which is a particular case of universal kriging . The stochastic term is considered as realization of a stationary Gaussian process with and a covariance function, also called kernel function, given bywhere is the process variance and is the correlation function between and . However, the correlation function depends on hyperparameters which are considered to be known. We also denote the vector as and the correlation matrix as . We use to denote the prediction of the true function . Under the hypothesis above, the best linear unbiased predictor for , given the observations , iswhere denotes an -vector of ones andIn addition, the estimation of is given byMoreover, ordinary kriging provides an estimate of the variance of the prediction, which is given by
Note that the assumption of a known covariance function with known parameters is unrealistic in reality and they are often unknown. For this reason, the covariance function is typically chosen from among a parametric family of kernels. In this work, only the covariance functions of exponential type are considered, in particular the Gaussian kernel. Indeed, the Gaussian kernel is the most popular kernel in kriging metamodels of simulation models, which is given by We note that the parameters , for , can be interpreted as measuring how strongly the input variables , respectively, affect the output . If is very large, the kernel given by (7) tends to zero and thus leads to a low correlation. In fact, we see in Figure 1 how the correlation curve rapidly varies from a point to another when .
However, the estimator of the kriging parameters (, and ) makes the kriging predictor, given by (3), nonlinear and makes the estimated predictor variance, given by (6), biased. We note that the vector and the matrix should get hats above but it is ignored in practice .
2.2. Partial Least Squares
The PLS method is a statistical method which searches out the best multidimensional direction that explains the characteristics of the output . It finds a linear relationship between input variables and output variable by projecting input variables onto principal components, also called latent variables. The PLS technique reduces dimension and reveals how inputs depend on output. In the following, we use to denote the number of principal components retained which are a lot lower than (); does not generally exceed 4, in practice. In addition, the principal components can be computed sequentially. In fact, the principal component , for , is computed by seeking the best direction which maximizes the squared covariance between and :where , , and, for , and are the residual matrix from the local regression of onto the principal component and from the local regression of onto the principal component , respectively, such thatwhere (a vector) and (a coefficient) contain the regression coefficients. For more details of how PLS method works, please see [19–21].
The principal components represent the new coordinate system obtained upon rotating the original system with axes, . For , can be written asThis important relationship is mainly used for developing the KPLS model which is detailed in Section 2.3. The vectors , for , are given by the following matrix which is obtained by (for more details, see ) where and .
2.3. Construction of the KPLS Model
The hyperparameters , for , given by (7) are found by maximum likelihood estimation (MLE) method. Their estimation becomes more and more expensive when increases. The vector can be interpreted as measuring how strongly the variables affect the output , respectively. For building KPLS, coefficients given by vectors will be considered as measuring of the influence of the input variables on the output . By some elementary operations on the kernel functions, we define the KPLS kernel bywhere is an isotropic stationary kernel and
More details of such construction are given in . Considering the example of the Gaussian kernel given by (7), we obtainSince a small number of principal components are retained, the estimation of the hyperparameters is faster than the hyperparameters given by (7), where is very high (100+).
3. Transition from the KPLS Model to the Kriging Model Using the Exponential Covariance Functions
In this section, we show that if all kernels , for , used in (12) are of the exponential type with the same form (all Gaussian kernels, e.g.), then the kernel given by (12) will be of the exponential type with the same form as (Gaussian if all are Gaussian).
3.1. Proof of the Equivalence between the Kernels of the KPLS Model and the Kriging Model
Let us define, for , ; we haveIn the same way, we can show this equivalence for the other exponential kernels where :
However, we must caution that the above proof shows equivalence between the covariance functions of KPLS and kriging only on a subspace domain. More precisely, the KPLS covariance function is defined in a subspace from whereas the kriging covariance function is defined in the complete domain. Thus, our original idea is to extend the space where the KPLS covariance function is defined for the complete space .
3.2. A New Step during the Construction of the KPLS Model: KPLS+K
By considering the equivalence shown in the last section, we propose to add a new step during the construction of the KPLS model. This step occurs just after the -estimation, for . It involves making local optimization of the likelihood function of the kriging model equivalent to the KPLS model. Moreover, we use , for , as a starting point of the local optimization by considering the solution , for , found by the KPLS method. Thus, this optimization is done in the complete space, where the vector .
This approach, called KPLS+K, aims to improve the MLE of the kriging model equivalent to the associated KPLS model. In fact, the local optimization of the equivalent kriging offers more possibilities for improving the MLE, by considering a wider search space, and thus it will be able to correct the estimation of many directions. These directions are represented by for the th direction which is badly estimated by the KPLS method. Because estimating the equivalent kriging hyperparameters can be time consuming, especially when is large, we improve the MLE by a local optimization at the cost of a slight increase of computational time.
Figure 2 recalls the principal stages of building a KPLS+K model.
4. Numerical Simulations
We now focus on the performance of KPLS+K by comparing it with the KPLS model and the ordinary kriging model. For this purpose, we use the academic function, named Griewank, over the interval which is studied in . 20 and 60 dimensions are considered for this function. In addition, an engineering example, done at Snecma for a multidisciplinary optimization, is used. This engineering case is chosen since it was shown in  that KPLS is less accurate than ordinary kriging. The Gaussian kernel is used for all surrogate models used herein, that is, ordinary kriging, KPLS, and KPLS+K. For KPLS and KPLS+K using principal components, for , will be denoted by KPLS and KPLS+K, respectively, and this is varied from 1 to 3. The Python toolbox Scikit-learn v.014  is used to achieve the proposed numerical tests, except for ordinary kriging used on the industrial case, where the Optimus version is used. The training and the validation points used in  are reused in the following.
4.1. Griewank Function over the Interval [−5, 5]
The Griewank function [13, 24] is defined by Figure 3 shows the degree of complexity of such function which is very multimodal. As in , we consider and input variables. For each problem, ten experiments based on the random Latin-hypercube design are built with (number of sampling points) equal to 50, 100, 200, and 300. To better visualize the results, boxplots are used to show the CPU time and the relative error RE given by where represents the usual norm and and are the vectors containing the prediction and the real values of 5000 randomly selected validation points for each case. The mean and the standard error are given in Tables 2 and 3, respectively, in Appendix. However, the results of the ordinary kriging model and the KPLS model are reported from .
For 20 input variables and 50 sampling points, the KPLS models always give a more accurate solution than ordinary kriging and KPLS+K, as shown in Figure 4(a). Indeed, the best result is given by KPLS3 with a mean of RE equal to 0.51%. However, the KPLS+K models give more accurate models than ordinary kriging in this case (0.58% for KPLS2+K and KPLS3+K versus 0.62% for ordinary kriging). For the KPLS model, the rate of improvement with respect to the number of sampling points is less than for ordinary kriging and KPLS+K (see Figures 4(b)–4(d)). As a result, KPLS+K, for , and ordinary kriging give almost the same accuracy (0.16%) when 300 sampling points are used (as shown in Figure 4(d)), whereas the KPLS models give a RE of 0.35% as a best result, when .
(a) RE (%) for 20 input variables and 50 sampling points
(b) RE (%) for 20 input variables and 100 sampling points
(c) RE (%) for 20 input variables and 200 sampling points
(d) RE (%) for 20 input variables and 300 sampling points
Nevertheless, the results shown in Figure 5 indicate that the KPLS+K models lead to an important reduction in CPU time for the various number of sampling points compared to ordinary kriging. For instance, 20.49 s are required for building KPLS3+K when 300 training points are used, whereas ordinary kriging is built in 94.31 s; in this case, KPLS3+K is thus approximately 4 times cheaper than the ordinary kriging model. Moreover, the computational time required for building KPLS+K is more stable than the computational time for building ordinary kriging; standard deviations of approximately 3 s for KPLS+K and 22 s for the ordinary kriging model are observed.
(a) CPU time for 20 input variables and 50 sampling points
(b) CPU time for 20 input variables and 100 sampling points
(c) CPU time for 20 input variables and 200 sampling points
(d) CPU time for 20 input variables and 300 sampling points
For 60 input variables and 50 sampling points, a slight difference of the results occurs compared to the 20 input variables case (Figure 6(a)). Indeed, the KPLS models remain always better, with a mean of RE approximately equal to 0.92%, than KPLS+K and ordinary kriging. However, the KPLS+K models give more accurate results than ordinary kriging with an accuracy close to that of KPLS (0.99% versus 1.39%). Increasing the number of sampling points, the accuracy of ordinary kriging becomes better than the accuracy given by the KPLS models, but it remains less accurate than for the KPLS+K models, for or 3. For instance, we obtain a mean of RE with 0.60% for KPLS2+K against 0.65% for ordinary kriging (see Figure 6(d)), when 300 sampling points are used.
(a) RE (%) for 60 input variables and 50 sampling points
(b) RE (%) for 60 input variables and 100 sampling points
(c) RE (%) for 60 input variables and 200 sampling points
(d) RE (%) for 60 input variables and 300 sampling points
As we can observe from Figure 7(d), a very important reduction in terms of computational time is obtained. Indeed, a mean time of 2894.56 s is required for building ordinary kriging, whereas KPLS2+K is built in 23.03 s; KPLS2+K is approximately 125 times cheaper than ordinary kriging in this case. In addition, the computational time for building KPLS+K is more stable than ordinary kriging, except the KPLS3+K case; a standard deviation of approximately 0.30 s for KPLS1+K and KPLS2+K is observed, against 728.48 s for ordinary kriging. However, the relatively large standard of deviation of KPLS3+K (26.59 s) is probably due to the dispersion caused by KPLS3 (26.59 s). But, it remains too lower than the standard deviation of the ordinary kriging model.
(a) CPU time for 60 input variables and 50 sampling points
(b) CPU time for 60 input variables and 100 sampling points
(c) CPU time for 60 input variables and 200 sampling points
(d) CPU time for 60 input variables and 300 sampling points
For the Griewank function over the interval , the KPLS+K models are slightly more time consuming than the KPLS models, but they are more accurate, in particular when the number of observations is greater than the dimension , as is implied by the rule-of-thumb in . They seem to perform well compared to the ordinary kriging model with an important gain in terms of saving CPU time.
4.2. Engineering Case
In this section, let us consider the third output, , from problem studied in . This test case is chosen because the KPLS models, from 1 to 3 principal components, do not perform well (see Table 1). We recall that this problem contains 24 input variables. 99 training points and 52 validation points are used and the relative error (RE) given by (18) is considered.
As we see in Table 1, we improve the accuracy of KPLS by adding the step for building KPLS+K. This improvement is verified whatever the number of principal components used (1, 2, and 3 principal components). For these three models, a better accuracy is even found than the ordinary kriging model. The computational time required to build KPLS+k is approximately twice lower than the time required for ordinary kriging.
Motivated by the need to accurately approximate high-fidelity codes rapidly, we develop a new technique for building the kriging model faster than classical techniques used in literature. The key idea for such construction relies on the choice of the start point for optimizing the likelihood function of the kriging model. For this purpose, we firstly prove equivalence between KPLS and kriging when an exponential covariance function is used. After optimizing hyperparameters of KPLS, we then choose the solution obtained as an initial point to find the MLE of the equivalent kriging model. This approach will be applicable only if the kernels used for building KPLS are of the exponential type with the same form.
The performance of KPLS+K is verified in the Griewank function over the interval with 20 and 60 dimensions and an industrial case from Snecma, where the KPLS models do not perform well in terms of accuracy. The results of KPLS+K have shown a significant improvement in terms of accuracy compared to the results of KPLS, at the cost of a slight increase in computational time. We have also seen, in some cases, that accuracy of KPLS+K is even better than accuracy given by the ordinary kriging model.
Results of Griewank Function in 20D and 60D over the Interval [−5, 5]
Symbols and Notation (Matrices and Vectors Are in Bold Type)
|:||Set of real numbers|
|:||Set of positive real numbers|
|:||Number of sampling points|
|:||Number of principal components retained|
|:||The th element of a vector|
|:||A matrix containing sampling points|
|:||The true function performed on the vector|
|:||vector containing simulation of|
|:||The prediction of the true function|
|:||A stochastic process|
|:||The th training point for (a vector)|
|:||A vector containing -weights given by the th PLS-iteration for|
|:||Matrix containing residual of the inner regression of the th PLS-iteration for|
|:||A covariance function|
|:||Superscript denoting the transpose operation of the vector|
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors extend their grateful thanks to A. Chiplunkar from ISAE-SUPAERO, Toulouse, for his careful correction of the paper.
D. Krige, “A statistical approach to some basic mine valuation problems on the Witwatersrand,” Journal of the Chemical, Metallurgical and Mining Society, vol. 52, pp. 119–139, 1951.View at: Google Scholar
C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning, MIT Press, Cambridge, Mass, USA, 2006.View at: MathSciNet
A. Forrester, A. Sobester, and A. Keane, Engineering Design via Surrogate Modelling: A Practical Guide, John Wiley & Sons, New York, NY, USA, 2008.
J. Laurenceau, Kriging based response surfaces for aerodynamic shape optimization [Thesis], Institut National Polytechnique de Toulouse (INPT), Toulouse, France, 2008.
M. A. Bouhlel, N. Bartoli, A. Otsmane, and J. Morlier, “Improving kriging surrogates of high-dimensional design models by partial least squares dimension reduction,” Structural and Multidisciplinary Optimization, vol. 53, no. 5, pp. 935–952, 2016.View at: Publisher Site | Google Scholar | MathSciNet
M. Sasena, Flexibility and efficiency enhancements for constrained global design optimization with Kriging approximations [Ph.D. thesis], University of Michigan, 2002.
O. Roustant, D. Ginsbourger, and Y. Deville, “DiceKriging, DiceOptim: two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization,” Journal of Statistical Software, vol. 51, no. 1, pp. 1–55, 2012.View at: Google Scholar
J. P. Kleijnen, Design and Analysis of Simulation Experiments, vol. 230, Springer, Berlin, Germany, 2015.