Abstract

In recent years, kriging has become one of the most popular methods in computer simulation and machine learning. Kriging models have been successfully used in many engineering applications to approximate expensive simulation models. When many input variables are used, however, kriging becomes inefficient, mainly because of the exorbitant computational time required for its construction. To handle high-dimensional problems (100+), a method was recently proposed that combines kriging with the Partial Least Squares technique, the so-called KPLS model. This method has shown interesting results in terms of the CPU time saved while building the model, while maintaining sufficient accuracy, on both academic and industrial problems. However, KPLS can be significantly less accurate than conventional kriging on multimodal functions. To handle this issue, this paper proposes adding a new step during the construction of KPLS to improve its accuracy for multimodal functions. When exponential covariance functions are used, this step is based on a simple identification between the covariance functions of KPLS and kriging. The developed method is validated, in particular, on a multimodal academic function, known in the literature as the Griewank function, and we show the gain in terms of accuracy and computational time by comparing with KPLS and kriging.

1. Introduction

In recent years, the kriging model [1–4], which is also referred to as the Gaussian process model [5], has become one of the most popular methods in computer simulation and machine learning. It is used as a substitute for high-fidelity codes representing physical phenomena and aims to reduce the computational time of a particular process. For instance, the kriging model has been used successfully in several optimization problems [6–11]. Kriging is not well adapted to high-dimensional problems, principally due to large matrix inversion problems. In fact, building the kriging model becomes very time consuming when a large number of input variables are used, since a large number of sampling points are then required. Indeed, it is recommended in [12] to use $n = 10d$ sampling points, with $d$ the number of dimensions, to obtain a good accuracy of the kriging model. As a result, we need to increase the size of the kriging covariance matrix, which becomes computationally very expensive to invert. Moreover, this inversion problem makes the classical estimation of the hyperparameters, through the maximization of the likelihood function, difficult.

A recent method, called KPLS [13], was developed to reduce this computational time; it uses the dimension reduction method "Partial Least Squares" (PLS) during the construction of the kriging model. This method reduces the number of hyperparameters of a kriging model, so that their number becomes equal to the number of principal components retained by the PLS method. The KPLS method is thus able to rapidly build a kriging model for high-dimensional problems (100+) while maintaining a good accuracy. However, it has been shown in [13] that the KPLS model is less accurate than the kriging model in many cases, in particular for multimodal functions.

In this paper, we propose an extra step that supplements [13] in order to improve its accuracy. Under the hypothesis that the kernels used for building the KPLS model are of exponential type with the same form (e.g., all Gaussian kernels), we choose the hyperparameters found by the KPLS model as an initial point for optimizing the likelihood function of a conventional kriging model. This is achieved by identifying the covariance function of the KPLS model with a covariance function of a kriging model. Considering the identified kriging model, instead of the KPLS model, extends the search space over which the hyperparameters are defined and thus makes the resulting model more flexible than the KPLS model.

This paper is organized in three main sections. In Section 2, we present a review of the KPLS model. In Section 3, we discuss our new approach and the hypothesis needed for its applicability. Finally, numerical results are shown to confirm the efficiency of our method, followed by a summary of what we have achieved.

2. Construction of KPLS

In this section, we introduce the notation and describe the theory behind the construction of the KPLS model. Assume that we have evaluated a costly deterministic function at $n$ points $\mathbf{x}^{(i)}$ ($i = 1, \dots, n$) with $\mathbf{x}^{(i)} = [x_1^{(i)}, \dots, x_d^{(i)}] \in B \subset \mathbb{R}^d$, and we denote by $\mathbf{X}$ the matrix $[\mathbf{x}^{(1)\top}, \dots, \mathbf{x}^{(n)\top}]^\top$. For simplicity, $B$ is considered to be a hypercube expressed by the product between the intervals of each direction space; that is, $B = \prod_{j=1}^{d} [a_j, b_j]$, where $a_j, b_j \in \mathbb{R}$ with $a_j \le b_j$ for $j = 1, \dots, d$. Simulating these $n$ inputs gives the outputs $\mathbf{y} = [y^{(1)}, \dots, y^{(n)}]^\top$ with $y^{(i)} = y(\mathbf{x}^{(i)})$, for $i = 1, \dots, n$.

2.1. Construction of the Kriging Model

For building the kriging model, we assume that the deterministic response $y(\mathbf{x})$ is a realization of a stochastic process [14–17]:

$$ Y(\mathbf{x}) = \beta_0 + Z(\mathbf{x}). \qquad (1) $$

The presented formula, with $\beta_0$ an unknown constant, corresponds to ordinary kriging [8], which is a particular case of universal kriging [15]. The stochastic term $Z(\mathbf{x})$ is considered as a realization of a stationary Gaussian process with $E[Z(\mathbf{x})] = 0$ and a covariance function, also called a kernel function, given by

$$ k(\mathbf{x}, \mathbf{x}') = \mathrm{Cov}\left(Z(\mathbf{x}), Z(\mathbf{x}')\right) = \sigma^2 r(\mathbf{x}, \mathbf{x}'), \quad \forall \mathbf{x}, \mathbf{x}' \in B, \qquad (2) $$

where $\sigma^2$ is the process variance and $r(\mathbf{x}, \mathbf{x}')$ is the correlation function between $\mathbf{x}$ and $\mathbf{x}'$. However, the correlation function depends on hyperparameters $\boldsymbol{\theta}$, which are considered to be known. We also denote the $n \times 1$ correlation vector between a point $\mathbf{x}$ and the sampling points as $\mathbf{r}_{\mathbf{x}} = [r(\mathbf{x}, \mathbf{x}^{(1)}), \dots, r(\mathbf{x}, \mathbf{x}^{(n)})]^\top$ and the $n \times n$ correlation matrix as $\mathbf{R} = [r(\mathbf{x}^{(i)}, \mathbf{x}^{(j)})]_{i,j = 1, \dots, n}$. We use $\hat{y}(\mathbf{x})$ to denote the prediction of the true function $y(\mathbf{x})$. Under the hypothesis above, the best linear unbiased predictor for $y(\mathbf{x})$, given the observations $\mathbf{y}$, is

$$ \hat{y}(\mathbf{x}) = \hat{\beta}_0 + \mathbf{r}_{\mathbf{x}}^\top \mathbf{R}^{-1} \left(\mathbf{y} - \hat{\beta}_0 \mathbf{1}\right), \qquad (3) $$

where $\mathbf{1}$ denotes an $n$-vector of ones and

$$ \hat{\beta}_0 = \left(\mathbf{1}^\top \mathbf{R}^{-1} \mathbf{1}\right)^{-1} \mathbf{1}^\top \mathbf{R}^{-1} \mathbf{y}. \qquad (4) $$

In addition, the estimation of $\sigma^2$ is given by

$$ \hat{\sigma}^2 = \frac{1}{n} \left(\mathbf{y} - \hat{\beta}_0 \mathbf{1}\right)^\top \mathbf{R}^{-1} \left(\mathbf{y} - \hat{\beta}_0 \mathbf{1}\right). \qquad (5) $$

Moreover, ordinary kriging provides an estimate of the variance of the prediction, which is given by

$$ \hat{s}^2(\mathbf{x}) = \hat{\sigma}^2 \left(1 - \mathbf{r}_{\mathbf{x}}^\top \mathbf{R}^{-1} \mathbf{r}_{\mathbf{x}}\right). \qquad (6) $$

Note that the assumption of a covariance function with known parameters is unrealistic; in practice, these parameters are often unknown. For this reason, the covariance function is typically chosen from among a parametric family of kernels. In this work, only covariance functions of exponential type are considered, in particular the Gaussian kernel. Indeed, the Gaussian kernel is the most popular kernel in kriging metamodels of simulation models; it is given by

$$ k(\mathbf{x}, \mathbf{x}') = \sigma^2 \prod_{i=1}^{d} \exp\left(-\theta_i \left(x_i - x_i'\right)^2\right), \quad \forall \theta_i \in \mathbb{R}^+. \qquad (7) $$

We note that the parameters $\theta_i$, for $i = 1, \dots, d$, can be interpreted as measuring how strongly the input variables $x_1, \dots, x_d$, respectively, affect the output $y$. If $\theta_i$ is very large, the kernel given by (7) tends to zero and thus leads to a low correlation. In fact, we see in Figure 1 how the correlation curve varies rapidly from one point to another when $\theta$ is large.

However, estimating the kriging parameters ($\beta_0$, $\sigma^2$, and $\boldsymbol{\theta}$) makes the kriging predictor, given by (3), nonlinear and makes the estimated predictor variance, given by (6), biased. We note that the vector $\mathbf{r}_{\mathbf{x}}$ and the matrix $\mathbf{R}$ should carry hats, but this is ignored in practice [18].
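To make these formulas concrete, the following minimal Python sketch implements (3)–(5) with the Gaussian kernel (7) for fixed hyperparameters $\boldsymbol{\theta}$. It is our own illustration, not code from [13]; the function names and the small nugget regularization are assumptions we add for numerical stability.

```python
import numpy as np

def gaussian_corr(A, B, theta):
    """Correlation r(x, x') = prod_i exp(-theta_i * (x_i - x'_i)^2), eq. (7),
    evaluated between all rows of A (nA x d) and B (nB x d)."""
    diff2 = (A[:, None, :] - B[None, :, :]) ** 2   # (nA, nB, d)
    return np.exp(-diff2 @ np.asarray(theta))      # contract over the d axis

def fit_ordinary_kriging(X, y, theta, nugget=1e-10):
    """Estimate beta0 (eq. 4) and sigma2 (eq. 5) for fixed theta."""
    n = X.shape[0]
    R = gaussian_corr(X, X, theta) + nugget * np.eye(n)  # regularized R
    Ri = np.linalg.inv(R)
    ones = np.ones(n)
    beta0 = (ones @ Ri @ y) / (ones @ Ri @ ones)   # eq. (4)
    resid = y - beta0 * ones
    sigma2 = (resid @ Ri @ resid) / n              # eq. (5)
    return beta0, sigma2, Ri, resid

def kriging_predict(x, X, theta, beta0, Ri, resid):
    """Best linear unbiased predictor of eq. (3) at a new 1 x d point x."""
    r = gaussian_corr(x[None, :], X, theta).ravel()
    return beta0 + r @ (Ri @ resid)
```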

2.2. Partial Least Squares

The PLS method is a statistical method that searches out the best multidimensional direction that explains the characteristics of the output $y$. It finds a linear relationship between the input variables and the output variable by projecting the input variables onto principal components, also called latent variables. The PLS technique thus reduces the dimension and reveals how the inputs depend on the output. In the following, we use $h$ to denote the number of principal components retained, which is much lower than $d$ ($h \ll d$); in practice, $h$ does not generally exceed 4. The principal components can be computed sequentially. In fact, the $l$th principal component $\mathbf{t}^{(l)} = \mathbf{X}^{(l-1)} \mathbf{w}^{(l)}$, for $l = 1, \dots, h$, is computed by seeking the best direction $\mathbf{w}^{(l)}$ which maximizes the squared covariance between $\mathbf{t}^{(l)}$ and $\mathbf{y}^{(l-1)}$:

$$ \mathbf{w}^{(l)} = \arg\max_{\mathbf{w}^\top \mathbf{w} = 1} \; \mathbf{w}^\top \mathbf{X}^{(l-1)\top} \mathbf{y}^{(l-1)} \mathbf{y}^{(l-1)\top} \mathbf{X}^{(l-1)} \mathbf{w}, \qquad (8) $$

where $\mathbf{X}^{(0)} = \mathbf{X}$, $\mathbf{y}^{(0)} = \mathbf{y}$, and, for $l = 1, \dots, h$, $\mathbf{X}^{(l)}$ and $\mathbf{y}^{(l)}$ are the residuals of the local regression of $\mathbf{X}^{(l-1)}$ onto the principal component $\mathbf{t}^{(l)}$ and of the local regression of $\mathbf{y}^{(l-1)}$ onto $\mathbf{t}^{(l)}$, respectively, such that

$$ \mathbf{X}^{(l)} = \mathbf{X}^{(l-1)} - \mathbf{t}^{(l)} \mathbf{p}^{(l)}, \qquad \mathbf{y}^{(l)} = \mathbf{y}^{(l-1)} - c_l \mathbf{t}^{(l)}, \qquad (9) $$

where $\mathbf{p}^{(l)}$ (a $1 \times d$ vector) and $c_l$ (a coefficient) contain the regression coefficients. For more details on how the PLS method works, please see [19–21].
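For a single scalar output, the maximization (8) has the closed-form solution $\mathbf{w}^{(l)} = \mathbf{X}^{(l-1)\top} \mathbf{y}^{(l-1)} / \|\mathbf{X}^{(l-1)\top} \mathbf{y}^{(l-1)}\|$, so the components can be computed by simple deflation. The sketch below is our own illustration of this classical PLS1 iteration, not the authors' implementation:

```python
import numpy as np

def pls1(X, y, h):
    """Sequential PLS1: returns the X-weights W = [w^(1), ..., w^(h)] and the
    X-loadings P = [p^(1), ..., p^(h)], following eqs. (8)-(9)."""
    Xl, yl = X.astype(float).copy(), y.astype(float).copy()
    d = X.shape[1]
    W, P = np.zeros((d, h)), np.zeros((d, h))
    for l in range(h):
        w = Xl.T @ yl
        w /= np.linalg.norm(w)       # closed-form maximizer of (8)
        t = Xl @ w                   # l-th principal component
        p = (Xl.T @ t) / (t @ t)     # regression of X^(l-1) onto t
        c = (yl @ t) / (t @ t)       # regression of y^(l-1) onto t
        Xl -= np.outer(t, p)         # residual matrices of eq. (9)
        yl -= c * t
        W[:, l], P[:, l] = w, p
    return W, P
```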

The principal components represent the new coordinate system obtained upon rotating the original system with axes $x_1, \dots, x_d$ [21]. For $l = 1, \dots, h$, $\mathbf{t}^{(l)}$ can be written as

$$ \mathbf{t}^{(l)} = \mathbf{X} \mathbf{w}_*^{(l)}. \qquad (10) $$

This important relationship is mainly used for developing the KPLS model, which is detailed in Section 2.3. The vectors $\mathbf{w}_*^{(l)}$, for $l = 1, \dots, h$, are given by the $d \times h$ matrix $\mathbf{W}_* = [\mathbf{w}_*^{(1)}, \dots, \mathbf{w}_*^{(h)}]$, which is obtained by (for more details, see [22])

$$ \mathbf{W}_* = \mathbf{W} \left(\mathbf{P}^\top \mathbf{W}\right)^{-1}, \qquad (11) $$

where $\mathbf{W} = [\mathbf{w}^{(1)}, \dots, \mathbf{w}^{(h)}]$ and $\mathbf{P} = [\mathbf{p}^{(1)\top}, \dots, \mathbf{p}^{(h)\top}]$.
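In practice, (11) need not be implemented by hand: scikit-learn, which is used for the experiments of Section 4, exposes this rotation matrix directly. A short sanity check, assuming X, y, and h are already defined:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

pls = PLSRegression(n_components=h)
pls.fit(X, y.reshape(-1, 1))

W_star = pls.x_rotations_  # the matrix W* = W (P^T W)^{-1} of eq. (11)
W_manual = pls.x_weights_ @ np.linalg.inv(pls.x_loadings_.T @ pls.x_weights_)
assert np.allclose(W_star, W_manual)  # holds when P^T W is well conditioned
```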

2.3. Construction of the KPLS Model

The hyperparameters $\theta_i$, for $i = 1, \dots, d$, given by (7) are found by the maximum likelihood estimation (MLE) method. Their estimation becomes more and more expensive as $d$ increases. The vector $\boldsymbol{\theta} = [\theta_1, \dots, \theta_d]$ can be interpreted as measuring how strongly the variables $x_1, \dots, x_d$, respectively, affect the output $y$. For building KPLS, the coefficients given by the vectors $\mathbf{w}_*^{(l)}$ are considered as a measure of the influence of the input variables $x_1, \dots, x_d$ on the output $y$. By some elementary operations on the kernel functions, we define the KPLS kernel by

$$ k_{\mathrm{KPLS}}(\mathbf{x}, \mathbf{x}') = \prod_{l=1}^{h} k_l\left(F_l(\mathbf{x}), F_l(\mathbf{x}')\right), \qquad (12) $$

where $k_l$ is an isotropic stationary kernel and

$$ F_l : B \longrightarrow B, \qquad \mathbf{x} \longmapsto \left[w_{*1}^{(l)} x_1, \dots, w_{*d}^{(l)} x_d\right]. \qquad (13) $$

More details of this construction are given in [13]. Considering the example of the Gaussian kernel given by (7), we obtain

$$ k(\mathbf{x}, \mathbf{x}') = \sigma^2 \prod_{l=1}^{h} \prod_{i=1}^{d} \exp\left(-\theta_l \left(w_{*i}^{(l)} x_i - w_{*i}^{(l)} x_i'\right)^2\right), \quad \forall \theta_l \in \mathbb{R}^+. \qquad (14) $$

Since only a small number of principal components are retained, the estimation of the hyperparameters $\theta_1, \dots, \theta_h$ is faster than the estimation of the hyperparameters $\theta_1, \dots, \theta_d$ given by (7), where $d$ is very high (100+).
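A direct transcription of (12)–(14) for the Gaussian case could look as follows; the helper name kpls_gaussian_corr is hypothetical, theta is the length-$h$ vector of KPLS hyperparameters, and W_star is the $d \times h$ matrix of (11):

```python
import numpy as np

def kpls_gaussian_corr(A, B, theta, W_star):
    """KPLS correlation of eq. (14): a product of h isotropic Gaussian
    kernels, each applied to the projected inputs F_l(x) of eq. (13)."""
    r = np.ones((A.shape[0], B.shape[0]))
    for l, theta_l in enumerate(theta):
        Al = A * W_star[:, l]   # F_l(x) = [w*_{1l} x_1, ..., w*_{dl} x_d]
        Bl = B * W_star[:, l]
        d2 = ((Al[:, None, :] - Bl[None, :, :]) ** 2).sum(axis=2)
        r *= np.exp(-theta_l * d2)   # one isotropic Gaussian kernel per l
    return r
```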

3. Transition from the KPLS Model to the Kriging Model Using the Exponential Covariance Functions

In this section, we show that if all kernels $k_l$, for $l = 1, \dots, h$, used in (12) are of the exponential type with the same form (e.g., all Gaussian kernels), then the kernel $k_{\mathrm{KPLS}}$ given by (12) is also of the exponential type, with the same form as the $k_l$ (Gaussian if all $k_l$ are Gaussian).

3.1. Proof of the Equivalence between the Kernels of the KPLS Model and the Kriging Model

Let us define, for $i = 1, \dots, d$, $\eta_i = \sum_{l=1}^{h} \theta_l \left(w_{*i}^{(l)}\right)^2$; we have

$$ k_{\mathrm{KPLS}}(\mathbf{x}, \mathbf{x}') = \sigma^2 \prod_{l=1}^{h} \prod_{i=1}^{d} \exp\left(-\theta_l \left(w_{*i}^{(l)}\right)^2 \left(x_i - x_i'\right)^2\right) = \sigma^2 \prod_{i=1}^{d} \exp\left(-\sum_{l=1}^{h} \theta_l \left(w_{*i}^{(l)}\right)^2 \left(x_i - x_i'\right)^2\right) = \sigma^2 \prod_{i=1}^{d} \exp\left(-\eta_i \left(x_i - x_i'\right)^2\right). \qquad (15) $$

In the same way, we can show this equivalence for the other exponential kernels, with $0 < p \le 2$, by defining $\eta_i = \sum_{l=1}^{h} \theta_l \left|w_{*i}^{(l)}\right|^p$:

$$ \sigma^2 \prod_{l=1}^{h} \prod_{i=1}^{d} \exp\left(-\theta_l \left|w_{*i}^{(l)} x_i - w_{*i}^{(l)} x_i'\right|^p\right) = \sigma^2 \prod_{i=1}^{d} \exp\left(-\eta_i \left|x_i - x_i'\right|^p\right). \qquad (16) $$
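In code, this identification reduces to a single line. The helper below (our own naming) maps the $h$ KPLS hyperparameters to the $d$ hyperparameters of the equivalent kriging kernel; $p = 2$ recovers the Gaussian case (15):

```python
import numpy as np

def kpls_to_kriging(theta, W_star, p=2.0):
    """eta_i = sum_l theta_l * |w*_{il}|^p, the identification of (15)-(16)."""
    return np.abs(W_star) ** p @ np.asarray(theta)

# Sanity check against eq. (14) using the helpers sketched above:
# gaussian_corr(A, B, kpls_to_kriging(theta, W_star))
#     == kpls_gaussian_corr(A, B, theta, W_star)
```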

However, we must point out that the above proof shows the equivalence between the covariance functions of KPLS and kriging only on a subspace. More precisely, the hyperparameters $[\eta_1, \dots, \eta_d]$ of the KPLS covariance function are confined to an $h$-dimensional subspace of $\left(\mathbb{R}^+\right)^d$, parameterized by $\theta_1, \dots, \theta_h$, whereas the hyperparameters of the kriging covariance function range over the complete domain $\left(\mathbb{R}^+\right)^d$. Thus, our idea is to extend the space over which the KPLS covariance function is defined to the complete space $\left(\mathbb{R}^+\right)^d$.

3.2. A New Step during the Construction of the KPLS Model: KPLS+K

Considering the equivalence shown in the last section, we propose to add a new step during the construction of the KPLS model. This step occurs just after the estimation of the $\theta_l$, for $l = 1, \dots, h$. It consists of a local optimization of the likelihood function of the kriging model that is equivalent to the KPLS model. We use $\eta_i = \sum_{l=1}^{h} \hat{\theta}_l \left(w_{*i}^{(l)}\right)^2$, for $i = 1, \dots, d$, as the starting point of this local optimization, where $\hat{\theta}_l$, for $l = 1, \dots, h$, is the solution found by the KPLS method. Thus, the optimization is done in the complete space $\left(\mathbb{R}^+\right)^d$ where the vector $\boldsymbol{\eta} = [\eta_1, \dots, \eta_d]$ is defined.

This approach, called KPLS+K, aims to improve the MLE of the kriging model equivalent to the associated KPLS model. In fact, the local optimization of the equivalent kriging model offers more possibilities for improving the MLE, by considering a wider search space, and it is thus able to correct the estimation of directions, represented by $\eta_i$ for the $i$th direction, that are badly estimated by the KPLS method. Because estimating the equivalent kriging hyperparameters from scratch can be time consuming, especially when $d$ is large, we improve the MLE by a local optimization only, at the cost of a slight increase in computational time.
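The sketch below illustrates this step under assumptions of our own: a concentrated log-likelihood with $\beta_0$ and $\sigma^2$ profiled out via (4)–(5), an optimization in $\log \eta$ to preserve positivity, and SciPy's L-BFGS-B as the local optimizer (the paper does not fix these implementation details):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(log_eta, X, y, nugget=1e-10):
    """Concentrated -log-likelihood (up to constants):
    (n/2) * log(sigma2_hat) + (1/2) * log det R."""
    eta = np.exp(log_eta)                          # keeps eta > 0
    n = X.shape[0]
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    R = np.exp(-diff2 @ eta) + nugget * np.eye(n)  # Gaussian kernel, eq. (15)
    L = np.linalg.cholesky(R)
    Li_y = np.linalg.solve(L, y)
    Li_1 = np.linalg.solve(L, np.ones(n))
    beta0 = (Li_1 @ Li_y) / (Li_1 @ Li_1)              # eq. (4)
    sigma2 = np.sum((Li_y - beta0 * Li_1) ** 2) / n    # eq. (5)
    return 0.5 * n * np.log(sigma2) + np.log(np.diag(L)).sum()

def kpls_plus_k(X, y, theta_kpls, W_star):
    """KPLS+K: local MLE of the equivalent kriging model, started from the
    eta identified in eq. (15)."""
    eta0 = (W_star ** 2) @ np.asarray(theta_kpls)
    res = minimize(neg_log_likelihood, np.log(eta0), args=(X, y),
                   method="L-BFGS-B")
    return np.exp(res.x)
```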

Figure 2 summarizes the main stages of building a KPLS+K model.

4. Numerical Simulations

We now focus on the performance of KPLS+K by comparing it with the KPLS model and the ordinary kriging model. For this purpose, we use the academic Griewank function over the interval [−5, 5], which is studied in [13]; 20 and 60 dimensions are considered for this function. In addition, an engineering example, provided by Snecma for a multidisciplinary optimization, is used. This engineering case is chosen because it was shown in [13] that KPLS is less accurate than ordinary kriging on it. The Gaussian kernel is used for all surrogate models herein, that is, ordinary kriging, KPLS, and KPLS+K. KPLS and KPLS+K using $h$ principal components are denoted by KPLS$h$ and KPLS$h$+K, respectively, and $h$ is varied from 1 to 3. The Python toolbox Scikit-learn v0.14 [23] is used to perform the proposed numerical tests, except for the ordinary kriging used on the industrial case, where the Optimus implementation is used. The training and validation points used in [13] are reused in the following.

4.1. Griewank Function over the Interval [−5, 5]

The Griewank function [13, 24] is defined by

$$ g(\mathbf{x}) = \sum_{i=1}^{d} \frac{x_i^2}{4000} - \prod_{i=1}^{d} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1. \qquad (17) $$

Figure 3 shows the degree of complexity of this function, which is highly multimodal. As in [13], we consider $d = 20$ and $d = 60$ input variables. For each problem, ten experiments based on random Latin hypercube designs are built with $n$ (the number of sampling points) equal to 50, 100, 200, and 300. To better visualize the results, boxplots are used to show the CPU time and the relative error RE given by

$$ \mathrm{RE} = \frac{\left\|\hat{\mathbf{y}} - \mathbf{y}\right\|_2}{\left\|\mathbf{y}\right\|_2} \times 100, \qquad (18) $$

where $\|\cdot\|_2$ represents the usual Euclidean norm and $\hat{\mathbf{y}}$ and $\mathbf{y}$ are the vectors containing the predictions and the real values of 5000 randomly selected validation points for each case. The means and standard deviations are given in Tables 2 and 3 in the Appendix. The results of the ordinary kriging model and the KPLS model are reported from [13].
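For reference, (17) and (18) translate directly into Python; the function names and the fixed random seed are illustrative only:

```python
import numpy as np

def griewank(X):
    """Griewank function of eq. (17), evaluated row-wise on an (n, d) array."""
    X = np.atleast_2d(X)
    i = np.arange(1, X.shape[1] + 1)
    return (X**2).sum(axis=1) / 4000.0 - np.cos(X / np.sqrt(i)).prod(axis=1) + 1.0

def relative_error(y_hat, y_true):
    """Relative error RE of eq. (18), in percent."""
    return 100.0 * np.linalg.norm(y_hat - y_true) / np.linalg.norm(y_true)

# e.g., 5000 random validation points in [-5, 5]^20
rng = np.random.default_rng(0)
X_val = rng.uniform(-5.0, 5.0, size=(5000, 20))
y_val = griewank(X_val)
```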

For 20 input variables and 50 sampling points, the KPLS models give more accurate solutions than both ordinary kriging and KPLS+K, as shown in Figure 4(a). Indeed, the best result is given by KPLS3, with a mean RE equal to 0.51%. The KPLS+K models are nevertheless more accurate than ordinary kriging in this case (0.58% for KPLS2+K and KPLS3+K versus 0.62% for ordinary kriging). For the KPLS models, the rate of improvement with respect to the number of sampling points is lower than for ordinary kriging and KPLS+K (see Figures 4(b)–4(d)). As a result, the KPLS+K models and ordinary kriging give almost the same accuracy (0.16%) when 300 sampling points are used (as shown in Figure 4(d)), whereas the best result of the KPLS models is an RE of 0.35%.

Nevertheless, the results shown in Figure 5 indicate that the KPLS+K models lead to an important reduction in CPU time, for the various numbers of sampling points, compared to ordinary kriging. For instance, 20.49 s are required for building KPLS3+K when 300 training points are used, whereas ordinary kriging is built in 94.31 s; KPLS3+K is thus approximately 4 times cheaper than the ordinary kriging model in this case. Moreover, the computational time required for building KPLS+K is more stable than that for building ordinary kriging; standard deviations of approximately 3 s for KPLS+K and 22 s for the ordinary kriging model are observed.

For 60 input variables and 50 sampling points, the results differ slightly from the 20-input-variable case (Figure 6(a)). Indeed, the KPLS models remain the most accurate, with a mean RE of approximately 0.92%, compared with KPLS+K and ordinary kriging. However, the KPLS+K models give more accurate results than ordinary kriging, with an accuracy close to that of KPLS (0.99% versus 1.39%). When the number of sampling points increases, the accuracy of ordinary kriging becomes better than that of the KPLS models, but ordinary kriging remains less accurate than the KPLS+K models for $h = 2$ or 3. For instance, we obtain a mean RE of 0.60% for KPLS2+K against 0.65% for ordinary kriging (see Figure 6(d)) when 300 sampling points are used.

As we can observe from Figure 7(d), a very important reduction in computational time is obtained. Indeed, a mean time of 2894.56 s is required for building ordinary kriging, whereas KPLS2+K is built in 23.03 s; KPLS2+K is approximately 125 times cheaper than ordinary kriging in this case. In addition, the computational time for building KPLS+K is more stable than that of ordinary kriging, except in the KPLS3+K case; a standard deviation of approximately 0.30 s is observed for KPLS1+K and KPLS2+K, against 728.48 s for ordinary kriging. The relatively large standard deviation of KPLS3+K (26.59 s) is probably due to the dispersion caused by KPLS3. Nevertheless, it remains much lower than the standard deviation of the ordinary kriging model.

For the Griewank function over the interval [−5, 5], the KPLS+K models are slightly more time consuming than the KPLS models, but they are more accurate, in particular when the number of observations is large relative to the dimension $d$ (cf. the $n = 10d$ rule of thumb of [12]). They compare well with the ordinary kriging model, with an important gain in terms of saving CPU time.

4.2. Engineering Case

In this section, let us consider the third output of the problem studied in [13]. This test case is chosen because the KPLS models, with 1 to 3 principal components, do not perform well on it (see Table 1). We recall that this problem contains 24 input variables; 99 training points and 52 validation points are used, and the relative error (RE) given by (18) is considered.

As we see in Table 1, adding the new step to build KPLS+K improves the accuracy of KPLS. This improvement is verified whatever the number of principal components used (1, 2, or 3). These three models are even more accurate than the ordinary kriging model. Moreover, the computational time required to build KPLS+K is approximately half the time required for ordinary kriging.

5. Conclusions

Motivated by the need to rapidly and accurately approximate high-fidelity codes, we have developed a new technique for building the kriging model faster than the classical techniques used in the literature. The key idea of this construction relies on the choice of the starting point for optimizing the likelihood function of the kriging model. For this purpose, we first prove the equivalence between KPLS and kriging when an exponential covariance function is used. After optimizing the hyperparameters of KPLS, we then choose the solution obtained as the initial point for finding the MLE of the equivalent kriging model. This approach is applicable only if the kernels used for building KPLS are of the exponential type with the same form.

The performance of KPLS+K is verified on the Griewank function over the interval [−5, 5], with 20 and 60 dimensions, and on an industrial case from Snecma where the KPLS models do not perform well in terms of accuracy. The results of KPLS+K show a significant improvement in accuracy compared to the results of KPLS, at the cost of a slight increase in computational time. We have also seen that, in some cases, the accuracy of KPLS+K is even better than that of the ordinary kriging model.

Appendix

Results of the Griewank Function in 20D and 60D over the Interval [−5, 5]

In Tables 2 and 3, the mean and the standard deviation (std) of the numerical experiments with the Griewank function are given for 20 and 60 dimensions, respectively.

Symbols and Notation (Matrices and Vectors Are in Bold Type)

$|\cdot|$: Absolute value
$\mathbb{R}$: Set of real numbers
$\mathbb{R}^+$: Set of positive real numbers
$n$: Number of sampling points
$d$: Dimension
$h$: Number of principal components retained
$\mathbf{x}$: A $1 \times d$ vector
$x_i$: The $i$th element of a vector $\mathbf{x}$
$\mathbf{X}$: An $n \times d$ matrix containing the sampling points
$y(\mathbf{x})$: The true function evaluated at the vector $\mathbf{x}$
$\mathbf{y}$: An $n \times 1$ vector containing the simulations of $\mathbf{X}$
$\hat{y}(\mathbf{x})$: The prediction of the true function $y(\mathbf{x})$
$Z(\mathbf{x})$: A stochastic process
$\mathbf{x}^{(i)}$: The $i$th training point, for $i = 1, \dots, n$ (a $1 \times d$ vector)
$\mathbf{w}^{(l)}$: A vector containing the $X$-weights given by the $l$th PLS iteration, for $l = 1, \dots, h$
$\mathbf{X}^{(0)}$: $\mathbf{X}$
$\mathbf{X}^{(l)}$: Matrix containing the residual of the inner regression of the $l$th PLS iteration, for $l = 1, \dots, h$
$k(\cdot, \cdot)$: A covariance function
$\top$: Superscript denoting the transpose operation of a vector
$\approx$: Approximately sign.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors extend their grateful thanks to A. Chiplunkar from ISAE-SUPAERO, Toulouse, for his careful correction of the paper.