Research Article  Open Access
Kernel Parameter Optimization for Kriging Based on Structural Risk Minimization Principle
Abstract
An improved kernel parameter optimization method based on the Structural Risk Minimization (SRM) principle is proposed to enhance the generalization ability of the traditional Kriging surrogate model. This article first analyses the importance of generalization ability as an assessment criterion of surrogate models from the perspective of statistics and proves its applicability to Kriging. A kernel parameter optimization method is used to improve the fitting precision of the Kriging model. With the smoothness measure of the generalization ability and an anisotropic kernel function, the modified Kriging surrogate model and its analysis process are established. Several benchmarks are tested to verify the effectiveness of the modified method under two different sampling states: uniform distribution and nonuniform distribution. The results show that the proposed Kriging has better generalization ability and adaptability, especially for nonuniform distribution sampling.
1. Introduction
Computer-intensive optimization problems become more and more common with the increasing requirement for high-fidelity models in industry, especially in aerospace engineering, so improving computational efficiency is urgent [1, 2]. The surrogate model is a comprehensive application of Design of Experiments (DOE), mathematical statistics, and optimization techniques. It approximates a complicated and time-consuming physical model by building an analytical mathematical model, which shortens the analysis process and smooths the design space. It has become one of the most effective methods for computer-intensive problems and has been widely applied to high-fidelity design and optimization [3].
The fitting precision of a surrogate model with respect to the physical model is one of its most important indicators. Because of computational resource limitations, it is not practical to assess the entire design space of the physical model directly; using the error between the surrogate model and the physical model at the sample points is the most effective way to estimate the fitting precision. Fitting precision is a function of the sample capacity: it converges to the true precision only when the sample capacity satisfies the law of large numbers. In practice, sample capacity and fitting precision are in conflict: we want fewer sample points with higher fitting precision. Vapnik proposed statistical learning theory (SLT) in the late 1970s [4, 5], and in the 1990s it matured into a machine learning framework for limited sample sets. Its core idea is to control the generalization ability of the learning machine by controlling the capacity of the machine, that is, to improve the fitting precision of the surrogate model by applying the SRM principle.
There are many surrogate models, such as the Response Surface Model (RSM), Kriging, the Radial Basis Function (RBF), and Support Vector Regression (SVR). Kriging is an interpolation method based on statistical theory; the main idea is to evaluate an approximate function of the object, based on the dynamic construction of the design space, to predict the information of unknown points [6, 7]. Kriging predicts an unknown point by a linear weighted combination of the information of nearby points, with the weights determined by minimizing the variance of the estimation error, which makes Kriging a best linear unbiased estimator [8]. The Kriging surrogate model is simple and stable compared to other methods and has been widely used in many fields [9–12]. Some improved Kriging models have also been studied, such as Gradient-Enhanced Kriging [13], Co-Kriging [14], and Hierarchical Kriging [15].
Kriging is sensitive to sample points and predicts poorly when few sample points are available. The quantity of information in the correlation matrix and the parameters of the basis function have an obvious influence on fitting precision, so optimizing the parameters of the basis function is necessary to improve fitting capacity. SRM is the most important theory for evaluating the generalization ability of a surrogate model and has been widely used in kernel parameter optimization of SVM [16], fuzzy models [17], and chaotic systems [18], but there are few studies for other surrogate models. Zhu [19] proposed an automatic method to optimize the basis function of RBF using the SRM principle. Chen [20] applied the SRM principle to RBF network learning to improve the generalization ability. These previous studies indicate that using the SRM principle to optimize the parameters of a surrogate model really can improve its fitting precision.
This paper proposes an anisotropic basis function parameter optimization method for Kriging based on the SRM principle. The parameters of the correlation functions of Kriging are optimized using the smoothness measure as the objective function. Benchmarks of different scales are used to verify the proposed method, and the influence of the distribution pattern of sample points, with uniform and nonuniform distributions, is also studied.
2. Assessment Criteria of Surrogate Model
The purpose of machine learning is to find internal dependencies by learning from given data, in order to predict unknown data or estimate model characteristics. From the perspective of statistics, machine learning can be viewed as predicting the unknown output relation from some given training samples as accurately as possible. To minimize the risk functional, we should find the optimal function $f(\mathbf{x}, w^{*})$ in the function set $\{f(\mathbf{x}, w)\}$:
$$R(w) = \int \big(y - f(\mathbf{x}, w)\big)^{2} \, \mathrm{d}P(\mathbf{x}, y), \tag{1}$$
where $\{f(\mathbf{x}, w)\}$ is the prediction function set and $w$ is the parameter of the prediction function. So machine learning is designed to minimize the risk functional of the squared error loss function.
2.1. Empirical Risk Minimization Principle
Usually we cannot know the specific form of the probability distribution; we only know that there is a proper probability distribution $P(\mathbf{x}, y)$ which can fit the training samples. The expected risk depends on $P(\mathbf{x}, y)$, so it is hard to minimize the risk functional directly by constructing a predictive function of minimized expected risk from a few samples. Therefore, the Empirical Risk Minimization (ERM) principle is used to minimize the risk functional in traditional machine learning methods:
$$R_{\mathrm{emp}}(w) = \frac{1}{l} \sum_{i=1}^{l} \big(y_i - f(\mathbf{x}_i, w)\big)^{2}, \tag{2}$$
where $l$ is the number of training samples.
ERM has been widely used in the least squares method for regression problems, the maximum likelihood method for probability density estimation, and neural network learning [21]. But ERM is based on the law of large numbers and depends completely on the sum of squared errors over the samples. With few samples, a small empirical risk cannot guarantee a small expected risk. RBF models and neural networks, which are based on the ERM principle, may overfit [22]: little training error at the sample points but large testing error at other points.
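The overfitting gap described above is easy to reproduce. The following minimal NumPy sketch (the test function, sample size, and polynomial degree are illustrative choices, not from this paper) minimizes the empirical risk with an interpolating model and contrasts it with the error on unseen points:

```python
import numpy as np

rng = np.random.default_rng(0)

# A few noisy observations of an underlying function (illustrative choice).
x_train = np.linspace(0.0, 1.0, 8)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(8)

# ERM: pick the model minimizing the empirical risk. A degree-7 polynomial
# interpolates all 8 points, so the empirical risk is driven to ~0.
coeffs = np.polyfit(x_train, y_train, deg=7)

def emp_risk(x, y):
    """Empirical risk: mean squared error of the model over a sample set."""
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

train_risk = emp_risk(x_train, y_train)

# Proxy for the expected risk: error against the true function on unseen points.
x_test = np.linspace(0.0, 1.0, 200)
test_risk = emp_risk(x_test, np.sin(2 * np.pi * x_test))
```

The training risk is essentially zero while the test risk is not, which is exactly the gap between empirical and expected risk that the SRM principle addresses.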
2.2. Structural Risk Minimization Principle
Generalization ability means the capacity of a learning machine to predict or estimate unknown phenomena. The ERM principle cannot guarantee minimizing the expected risk. Therefore, the SRM principle is proposed based on statistical learning theory. It provides the relationship between the empirical risk and the real risk, which is known as the bound of the generalization ability. The SRM principle divides the real risk of machine learning into the empirical risk and a confidence interval [4]:
$$R(w) \leq R_{\mathrm{emp}}(w) + \sqrt{\frac{h\big(\ln(2l/h) + 1\big) - \ln(\eta/4)}{l}}, \tag{3}$$
which holds with probability $1 - \eta$, where $l$ is the number of samples and $h$ is the Vapnik-Chervonenkis (VC) dimension, the most important theoretical basis of statistical learning theory. The VC dimension defines the capacity of the function set and reflects the generalization ability of the machine learning; it is the best descriptive indicator known so far for the capacity of a learning function set.
From (3), we know the ERM principle is not reliable with limited samples. When $h$ becomes larger, the empirical risk decreases, but the real risk does not always become lower. When the number of samples $l$ is fixed, a reduced VC dimension $h$ decreases the confidence interval, so the empirical risk comes close to the real risk; in this case, less empirical risk stands for less expected risk. Increasing the sample capacity also decreases the confidence interval and so reduces the real risk, but it is difficult to obtain plenty of samples because of the limitation of computational cost.
Therefore, decreasing the VC dimension is the most suitable way to reduce the real risk. For a given observation set, the SRM principle chooses, from the subset with minimized risk bound, the function that minimizes the empirical risk. So the SRM principle is a compromise that balances the fitting precision and the complexity of the fitting function.
2.3. SRM Index Based on Smoothness Measure
The smoothness measure is used to evaluate the generalization ability. In the projection space formed by the kernel function, the smoothness measure has a more direct and natural definition: it can be described as the norm in the Hilbert space of the kernel function, which is defined as follows.
Every function $f$ in the Hilbert space of the kernel function can be expressed as
$$f(\mathbf{x}) = \sum_{i} a_i \phi_i(\mathbf{x}). \tag{4}$$
Then the norm of $f$ in the Hilbert space is
$$\|f\|_{H}^{2} = \sum_{i} \frac{a_i^{2}}{\lambda_i}, \tag{5}$$
where $\lambda_i$ is the eigenvalue of the kernel $K$. The generalization ability can be measured by evaluating $\|f\|_{H}^{2}$. The optimal regression problem becomes
$$\min \ \|f\|_{H}^{2} \quad \text{s.t.} \quad f(\mathbf{x}_i) = y_i, \quad i = 1, \dots, l, \tag{6}$$
where $\|f\|_{H}^{2}$ is the smoothness measure of the sample points.
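As a numerical illustration (not from the paper), for a Gaussian-kernel interpolant $f = \sum_i \gamma_i K(\cdot, \mathbf{x}_i)$ the smoothness measure reduces to $\boldsymbol{\gamma}^{\mathrm{T}} \mathbf{K} \boldsymbol{\gamma}$, where $\boldsymbol{\gamma}$ solves the interpolation conditions. A minimal sketch, with an illustrative sample function and kernel width:

```python
import numpy as np

def gauss_kernel(X1, X2, theta):
    """Isotropic Gaussian kernel: K[i, j] = exp(-theta * ||x1_i - x2_j||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-theta * d2)

def smoothness_measure(X, y, theta):
    """||f||_H^2 = gamma^T K gamma for the interpolant f = sum_i gamma_i K(., x_i).

    Since K gamma = y at the sample points, this also equals gamma^T y.
    """
    K = gauss_kernel(X, X, theta)
    gamma = np.linalg.solve(K, y)   # interpolation conditions K gamma = y
    return float(gamma @ K @ gamma)

X = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
sm = smoothness_measure(X, y, theta=10.0)
```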
2.4. Mercer Theorem
The criterion to determine whether a function is a kernel function is as follows.

Let $X$ be the domain of the independent variables $\mathbf{x}$ and $\mathbf{x}'$, and let $K(\mathbf{x}, \mathbf{x}')$ be a symmetric real function continuous in $X \times X$. The sufficient and necessary condition for $K$ to be expressed as a series
$$K(\mathbf{x}, \mathbf{x}') = \sum_{i=1}^{\infty} \lambda_i \phi_i(\mathbf{x}) \phi_i(\mathbf{x}') \tag{7}$$
is that every function $g$ satisfying $\int_{X} g^{2}(\mathbf{x})\,\mathrm{d}\mathbf{x} < \infty$ also satisfies the following condition:
$$\iint_{X \times X} K(\mathbf{x}, \mathbf{x}')\, g(\mathbf{x})\, g(\mathbf{x}')\, \mathrm{d}\mathbf{x}\, \mathrm{d}\mathbf{x}' \geq 0, \tag{8}$$
where the series (7) converges absolutely and uniformly in $X \times X$; $\lambda_i$ ($\lambda_i \geq 0$) is the eigenvalue of $K$ and $\phi_i$ is the eigenfunction of $K$. A function satisfying the above definition is known as a Mercer kernel and can be expressed as an inner product in Hilbert space,
$$K(\mathbf{x}, \mathbf{x}') = \langle \Phi(\mathbf{x}), \Phi(\mathbf{x}') \rangle, \tag{9}$$
where $\langle \cdot, \cdot \rangle$ is the inner product operator and $\Phi$ is the nonlinear transformation induced by the kernel function.
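A finite-sample consequence of the Mercer condition is that the kernel matrix built on any point set is symmetric positive semidefinite. A short check for the Gaussian kernel (the point set is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((20, 3))          # arbitrary points in the domain

# Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-d2)

# Finite-sample Mercer check: K is symmetric with nonnegative eigenvalues,
# hence g^T K g >= 0 for every vector g.
eigvals = np.linalg.eigvalsh(K)
```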
3. Kriging Based on SRM Principle
3.1. Kriging Model
The Kriging model includes two parts:
$$y(\mathbf{x}) = f(\mathbf{x})^{\mathrm{T}}\boldsymbol{\beta} + z(\mathbf{x}), \tag{10}$$
where $f(\mathbf{x})^{\mathrm{T}}\boldsymbol{\beta}$ is a polynomial regression model and $z(\mathbf{x})$ is a deviation model. The covariance matrix of $z(\mathbf{x})$ is
$$\operatorname{Cov}\big[z(\mathbf{x}_i), z(\mathbf{x}_j)\big] = \sigma^{2} R\big(\theta, \mathbf{x}_i, \mathbf{x}_j\big), \tag{11}$$
where $\mathbf{R} = \big[R(\theta, \mathbf{x}_i, \mathbf{x}_j)\big]$ is the correlative coefficient matrix and $R(\theta, \mathbf{x}_i, \mathbf{x}_j)$ is the spatial correlation function of any two sample points $\mathbf{x}_i$ and $\mathbf{x}_j$ in the sample set. It plays a decisive role in fitting precision. $\theta$ is the parameter vector of $R$, and the common correlation functions include the following [23].
With $d_k = x_{i,k} - x_{j,k}$ and $R(\theta, \mathbf{x}_i, \mathbf{x}_j) = \prod_{k} R_k(\theta_k, d_k)$, the common one-dimensional correlation functions are as follows.

Exponential function:
$$R_k(\theta_k, d_k) = \exp\big(-\theta_k |d_k|\big). \tag{12}$$

Gaussian function:
$$R_k(\theta_k, d_k) = \exp\big(-\theta_k d_k^{2}\big). \tag{13}$$

Linear function:
$$R_k(\theta_k, d_k) = \max\{0,\, 1 - \theta_k |d_k|\}. \tag{14}$$

Spherical function:
$$R_k(\theta_k, d_k) = 1 - 1.5\xi_k + 0.5\xi_k^{3}, \qquad \xi_k = \min\{1,\, \theta_k |d_k|\}. \tag{15}$$

Cube function:
$$R_k(\theta_k, d_k) = 1 - 3\xi_k^{2} + 2\xi_k^{3}, \qquad \xi_k = \min\{1,\, \theta_k |d_k|\}. \tag{16}$$
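For reference, the five correlation functions can be transcribed into Python in their one-dimensional DACE forms (an illustrative transcription, not the paper's code):

```python
import numpy as np

# One-dimensional correlation functions R_k(theta_k, d_k); the full
# correlation over m dimensions is the product of the per-dimension factors.

def corr_exp(theta, d):
    return np.exp(-theta * np.abs(d))

def corr_gauss(theta, d):
    return np.exp(-theta * d ** 2)

def corr_lin(theta, d):
    return np.maximum(0.0, 1.0 - theta * np.abs(d))

def corr_spherical(theta, d):
    xi = np.minimum(1.0, theta * np.abs(d))
    return 1.0 - 1.5 * xi + 0.5 * xi ** 3

def corr_cubic(theta, d):
    xi = np.minimum(1.0, theta * np.abs(d))
    return 1.0 - 3.0 * xi ** 2 + 2.0 * xi ** 3
```

All five equal 1 at zero distance and decay (to zero or a floor of zero) as the scaled distance grows.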
The response at any untried point is predicted by a linear combination of the responses of the known samples:
$$\hat{y}(\mathbf{x}) = \sum_{i=1}^{n} \lambda_i\, y(\mathbf{x}_i) = \boldsymbol{\lambda}^{\mathrm{T}} \mathbf{Y}. \tag{17}$$
The interpolation coefficients of Kriging are obtained from the condition of unbiasedness and the principle of minimum variance: the mean value of the error must be zero, with minimized variance, to guarantee the unbiasedness of the fitting process. The corresponding constrained minimization problem, solved with a Lagrange multiplier, is
$$\min_{\boldsymbol{\lambda}} \ \operatorname{Var}\big[\hat{y}(\mathbf{x}) - y(\mathbf{x})\big] \quad \text{s.t.} \quad \mathrm{E}\big[\hat{y}(\mathbf{x}) - y(\mathbf{x})\big] = 0. \tag{18}$$
The parameter $\theta$ of the correlative coefficient matrix can be obtained by solving the maximum likelihood estimation (MLE) problem $\min_{\theta} |\mathbf{R}|^{1/n} \hat{\sigma}^{2}$ [23].
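Putting the pieces together, a minimal ordinary Kriging fit-and-predict sketch with a constant regression term and Gaussian correlation might look as follows (the fixed θ, the nugget, and the test function are illustrative assumptions of this sketch; the paper instead obtains θ by MLE or SRM):

```python
import numpy as np

def kriging_fit_predict(X, y, X_new, theta=10.0):
    """Minimal ordinary Kriging: constant regression, Gaussian correlation."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    R = np.exp(-theta * d2) + 1e-10 * np.eye(n)   # small nugget for conditioning
    F = np.ones((n, 1))
    # Generalized least squares solution beta* = (F^T R^-1 F)^-1 F^T R^-1 Y.
    beta = (F.T @ np.linalg.solve(R, y)).item() / (F.T @ np.linalg.solve(R, F)).item()
    gamma = np.linalg.solve(R, y - beta)          # deviation-model weights
    r = np.exp(-theta * ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return beta + r @ gamma                       # y_hat = beta* + r^T gamma*

X = np.linspace(0.0, 1.0, 9).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
y_hat = kriging_fit_predict(X, y, X)   # Kriging interpolates its training data
```

Because Kriging is an interpolator, predictions at the training points reproduce the training responses (up to the tiny nugget perturbation).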
3.2. Kriging Process
The output of the Kriging model can be expressed as
$$\hat{y}(\mathbf{x}) = f(\mathbf{x})^{\mathrm{T}}\boldsymbol{\beta}^{*} + r(\mathbf{x})^{\mathrm{T}}\boldsymbol{\gamma}^{*}, \qquad \boldsymbol{\gamma}^{*} = \mathbf{R}^{-1}\big(\mathbf{Y} - \mathbf{F}\boldsymbol{\beta}^{*}\big), \tag{19}$$
where $r(\mathbf{x}) = \big[R(\theta, \mathbf{x}_1, \mathbf{x}), \dots, R(\theta, \mathbf{x}_n, \mathbf{x})\big]^{\mathrm{T}}$ is the correlation vector between $\mathbf{x}$ and the sample points.
From Section 2.4, if $R$ satisfies the Mercer theorem, it can be expressed as an inner product in Hilbert space,
$$R(\theta, \mathbf{x}_i, \mathbf{x}) = \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}) \rangle. \tag{20}$$
Equation (19) is replaced with
$$\hat{y}(\mathbf{x}) = f(\mathbf{x})^{\mathrm{T}}\boldsymbol{\beta}^{*} + \sum_{i=1}^{n} \gamma_i^{*}\, \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}) \rangle \tag{21}$$
$$= f(\mathbf{x})^{\mathrm{T}}\boldsymbol{\beta}^{*} + \Big\langle \sum_{i=1}^{n} \gamma_i^{*}\, \Phi(\mathbf{x}_i),\, \Phi(\mathbf{x}) \Big\rangle. \tag{22}$$
Equation (22) is simplified as
$$\hat{y}(\mathbf{x}) = f(\mathbf{x})^{\mathrm{T}}\boldsymbol{\beta}^{*} + \langle \mathbf{w}, \Phi(\mathbf{x}) \rangle, \tag{23}$$
where
$$\mathbf{w} = \sum_{i=1}^{n} \gamma_i^{*}\, \Phi(\mathbf{x}_i).$$
The transformed equation (23) has the same form as the kernel expansion of SVR. It constructs a fixed nonlinear mapping $\Phi$ from the input space to a feature space: the input variable is mapped into the feature space, where a linear model is generated for linear learning. The correlative coefficient matrix of Kriging thus has the same meaning as the kernel matrix of SVR, so it is also called the kernel matrix in this paper. Then $\boldsymbol{\gamma}^{\mathrm{T}} \mathbf{R} \boldsymbol{\gamma} = \|\mathbf{w}\|^{2}$ can be used to evaluate the smoothness of the fitting function; it is called the smoothness measure. From the above derivation, we know that the SRM principle can also be used to minimize the VC dimension of the Kriging model to obtain minimized expected risk [4].
The kernel matrix must be a positive definite symmetric matrix or a conditionally positive definite symmetric matrix. The kernel matrix formed by the Gaussian basis function satisfies the Mercer theorem; many other admissible functions can be found in [24].
3.3. Anisotropic Kernel Function
The basis function of Kriging can be expressed as
$$R(\mathbf{x}, \mathbf{x}_i) = \exp\Big(-\theta_i \|\mathbf{x} - \mathbf{x}_i\|^{2}\Big). \tag{24}$$
The parameter $\theta_i$ of the basis function represents the contribution rate of each sample point to the Kriging model. For a problem with $n$ samples, the number of parameters is also $n$; as the number of samples increases, the scale of the kernel parameter optimization problem grows with it, which may remarkably increase the Kriging modeling time. Usually the parameter is taken as the same value for all samples, so the basis function reduces to an isotropic model; this simplification shrinks the design space of the Kriging model and weakens the fitting effect. The following anisotropic kernel function, with a different contribution for each dimension, is used to balance fitting effect and computational cost:
$$R(\mathbf{x}_i, \mathbf{x}_j) = \exp\bigg(-\sum_{k=1}^{m} \theta_k \big(x_{i,k} - x_{j,k}\big)^{2}\bigg), \tag{25}$$
where $m$ is the dimension of sample $\mathbf{x}$ and $\theta_k$ is the contribution of the $k$th dimension. This improved kernel function has the advantage that the scale of the kernel parameter optimization problem stays constant for a given problem with any number of samples.
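A sketch of the anisotropic Gaussian kernel (the point set and θ values are illustrative); note that the parameter vector has length m regardless of the sample count n:

```python
import numpy as np

def aniso_gauss(X1, X2, theta):
    """Anisotropic Gaussian kernel with one theta_k per input dimension:

    K[i, j] = exp(-sum_k theta_k * (x1[i, k] - x2[j, k])^2)
    """
    theta = np.asarray(theta, dtype=float)
    diff2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-(diff2 * theta).sum(axis=-1))

# The kernel parameter vector has length m (input dimension), independent of
# the sample count n, so the optimization problem keeps a constant scale.
X = np.random.default_rng(2).random((50, 3))   # n = 50, m = 3
K = aniso_gauss(X, X, theta=[1.0, 5.0, 0.1])   # only m = 3 parameters
```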
3.4. SRM-Kriging
The kernel parameter optimization problem of the Kriging model with the smoothness measure is
$$\min_{\theta} \ W(\theta), \tag{26}$$
where $\theta$ is the kernel parameter and $W(\theta)$ is the smoothness measure as objective function:
$$W(\theta) = \boldsymbol{\gamma}^{\mathrm{T}} \mathbf{R} \boldsymbol{\gamma} = \big(\mathbf{Y} - \mathbf{F}\boldsymbol{\beta}^{*}\big)^{\mathrm{T}} \mathbf{R}^{-1} \big(\mathbf{Y} - \mathbf{F}\boldsymbol{\beta}^{*}\big), \tag{27}$$
where $\boldsymbol{\beta}^{*} = \big(\mathbf{F}^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{F}\big)^{-1}\mathbf{F}^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{Y}$ is the generalized least squares solution of the polynomial problem $\mathbf{F}\boldsymbol{\beta} \approx \mathbf{Y}$. The derivative of $W$ with respect to $\theta_k$ is
$$\frac{\partial W}{\partial \theta_k} = -\big(\mathbf{Y} - \mathbf{F}\boldsymbol{\beta}^{*}\big)^{\mathrm{T}} \mathbf{R}^{-1} \frac{\partial \mathbf{R}}{\partial \theta_k} \mathbf{R}^{-1} \big(\mathbf{Y} - \mathbf{F}\boldsymbol{\beta}^{*}\big). \tag{28}$$

Let $\boldsymbol{\gamma} = \mathbf{R}^{-1}\big(\mathbf{Y} - \mathbf{F}\boldsymbol{\beta}^{*}\big)$; the derivative can be written as
$$\frac{\partial W}{\partial \theta_k} = -\boldsymbol{\gamma}^{\mathrm{T}} \frac{\partial \mathbf{R}}{\partial \theta_k} \boldsymbol{\gamma}. \tag{29}$$

The derivative of each element $R_{ij}$ with respect to $\theta_k$ is, for the Gaussian kernel,
$$\frac{\partial R_{ij}}{\partial \theta_k} = -\big(x_{i,k} - x_{j,k}\big)^{2} R_{ij}. \tag{30}$$

Then we get the gradient
$$\nabla W(\theta) = \left[\frac{\partial W}{\partial \theta_1}, \dots, \frac{\partial W}{\partial \theta_m}\right]^{\mathrm{T}}, \tag{31}$$
where $\partial \mathbf{R}/\partial \theta_k$ is the matrix composed of the derivative value of the basis function, with respect to the $k$th component of the sample distance, at every sample pair. Based on this derivative information, the efficient Sequential Quadratic Programming (SQP) method is used to solve the kernel parameter optimization problem of the Kriging model.
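As an illustrative sketch of this optimization (not the paper's implementation), the smoothness measure and its analytic gradient can be passed to a gradient-based optimizer; SciPy's SLSQP is used here as a stand-in SQP solver, with θ optimized in log space to keep it positive (an assumption of this sketch, as are the sample function and nugget):

```python
import numpy as np
from scipy.optimize import minimize

def smoothness_and_grad(log_theta, X, y):
    """Objective W = gamma^T R gamma and its gradient for the Gaussian kernel."""
    theta = np.exp(log_theta)                     # keep theta > 0
    n = len(X)
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2  # shape (n, n, m)
    R = np.exp(-(diff2 * theta).sum(axis=-1)) + 1e-10 * np.eye(n)
    F = np.ones((n, 1))                           # constant regression
    beta = (F.T @ np.linalg.solve(R, y)).item() / (F.T @ np.linalg.solve(R, F)).item()
    gamma = np.linalg.solve(R, y - beta)
    W = float(gamma @ (y - beta))                 # equals gamma^T R gamma
    # dW/dtheta_k = -gamma^T (dR/dtheta_k) gamma, with dR_ij/dtheta_k = -d_k^2 R_ij.
    dW = np.array([gamma @ ((diff2[:, :, k] * R) @ gamma) for k in range(X.shape[1])])
    return W, dW * theta                          # chain rule for log-theta variables

X = np.linspace(0.0, 1.0, 12).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
res = minimize(smoothness_and_grad, x0=[np.log(10.0)], args=(X, y),
               jac=True, method="SLSQP")
theta_opt = np.exp(res.x)
```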
4. Test Cases
Nine standard test functions with different dimensions are used to validate the effectiveness of the proposed method. The improved Kriging model is evaluated with sample points under both uniform and nonuniform distributions.
4.1. Standard Test Functions
The nine standard test functions, of different dimensions, are as follows.
Fun1 is
Fun2 is
Fun3 is
Fun4 is
Fun5 is
Fun6 is
Fun7 is
Fun8 is
Fun9 is
These functions range from one dimension to thirty dimensions with differing degrees of nonlinearity, which allows the performance of the surrogate model to be evaluated under different scales of variables.
These test functions are divided into three scales according to their dimension: small scale, middle scale, and large scale. The sampling information of the above functions is in Table 1.

Kriging with SRM (SRM-Kriging), Kriging with MLE (MLE-Kriging), and Kriging with a constant kernel parameter (CON-Kriging) are the three methods tested on the above cases. The Gaussian function is chosen as the kernel function for all three methods. The multiple correlation coefficient $R^{2}$ and the root-mean-square error (RMSE) are used to evaluate the fitting precision.
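Both metrics can be computed directly; a standard implementation is included here only for completeness:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Multiple correlation coefficient R^2 (1.0 is a perfect fit)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error (0.0 is a perfect fit)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```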
4.2. Samples with Uniform Distribution
The optimal Latin hypercube sampling method is used to generate uniformly distributed samples. Figure 1 and Table 2 compare the generalization ability across the test functions with uniform distribution sampling. It is observed that, as the dimension of variables increases, the fitting precision of CON-Kriging decreases rapidly, especially for the large-scale cases (Fun7~Fun9). MLE-Kriging achieves better fitting precision than CON-Kriging by using an optimized kernel parameter, but it still fits the large-scale cases poorly. SRM-Kriging has better fitting precision than both CON-Kriging and MLE-Kriging; even for the large-scale cases it has the smallest RMSE.
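A plain Latin hypercube sample can be generated with SciPy's `qmc` module (this sketch uses plain LHS; optimal LHS variants additionally optimize a space-filling criterion, e.g. via the `optimization` argument of `LatinHypercube`):

```python
import numpy as np
from scipy.stats import qmc

# Latin hypercube sample: each dimension's range is divided into n equal
# strata, and each stratum receives exactly one point.
sampler = qmc.LatinHypercube(d=2, seed=0)
unit = sampler.random(n=20)                              # points in [0, 1)^2
X = qmc.scale(unit, l_bounds=[-5, -5], u_bounds=[5, 5])  # map to design domain
```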

Figure 2 shows the fitting results of Fun1 and Fun2 under uniform distribution sampling. From the fitting result of Fun1 we see that CON-Kriging and MLE-Kriging have poor fitting precision; CON-Kriging in particular oscillates badly. This illustrates that kernel parameter optimization is very important for improving the fitting precision of a surrogate model. Compared with CON-Kriging and MLE-Kriging, SRM-Kriging is almost identical to the original function; it has better generalization ability than the others.
(a) Fitting result of Fun1 with uniform distribution
(b) Fitting result of Fun2 with uniform distribution
4.3. Samples with Nonuniform Distribution
In practical applications, the samples often do not satisfy a uniform distribution, which affects surrogate model building. A typical example is the sequential surrogate model [25]: as points are added, the distribution of the samples changes continuously and an aggregated distribution of samples appears. The performance of a surrogate model with nonuniformly distributed samples is therefore also important, yet at present there are few studies on it [26]. In this section, the normal distribution is used to imitate and generate nonuniformly distributed samples, and the above three surrogate models are tested on these points. The mean and standard deviation of the normal distribution for each test function are listed in Table 3.
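Nonuniform samples of this kind can be generated by drawing from a normal distribution and clipping to the design domain. The sketch below uses illustrative mean and deviation values, not those of Table 3:

```python
import numpy as np

rng = np.random.default_rng(3)

# Aggregated (nonuniform) samples: draw from a normal distribution and clip
# to the design domain, so points cluster around the mean.
mean, std = 0.25, 0.15      # illustrative values, not those of Table 3
lo, hi = 0.0, 1.0
x = np.clip(rng.normal(mean, std, size=50), lo, hi)
```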

Thirty rounds of random normal distribution sampling are generated to test the three methods, and the average values of $R^{2}$ and RMSE are used for comparison. Figure 3 and Table 4 compare the generalization ability across the test functions with nonuniform distribution sampling.

It is observed that, for the test cases with nonuniform distribution sampling, the fitting precision of all methods decreases remarkably, especially for the large-scale problems. CON-Kriging performs very poorly, even for small-scale problems. MLE-Kriging is better than CON-Kriging, and SRM-Kriging still has the best fitting precision of the three. Although all three methods fail for the large-scale problems, SRM-Kriging still has the smallest RMSE.
Figure 4 shows the fitting results of Fun1 and Fun2 with nonuniform distribution sampling. From the fitting results we see that all fitting curves for Fun1 and Fun2 are distorted. For Fun1, the curves generated by CON-Kriging and MLE-Kriging cannot even capture the basic trend; SRM-Kriging is clearly better than the others and follows the basic trend except in the left region, where there are few samples. This supports the claim that generalization ability is the more suitable criterion for evaluating the comprehensive performance of a surrogate model. For Fun2, all three methods are distorted in the right region, where there are fewer samples, but the fitting performance of SRM-Kriging and MLE-Kriging is much better than that of CON-Kriging, which again demonstrates the importance of kernel parameter optimization.
(a) The fitting result of Fun1 with nonuniform distribution
(b) The fitting result of Fun2 with nonuniform distribution
5. Conclusion
More and more attention is being paid to improving fitting precision as surrogate models spread into wider application. This paper studies the assessment criteria of surrogate models and discusses the importance of generalization ability. On this basis, a kernel function optimization method is proposed to improve the fitting precision. Standard benchmarks are tested to verify the effectiveness of the improved method with both uniformly and nonuniformly distributed sample points. In conclusion, we have the following.
(1) The SRM-Kriging method based on anisotropic kernel function optimization is proposed, replacing the coefficient of each sample with a coefficient per component of the samples. The computational cost is thereby reduced.
(2) A comparison is carried out among CON-Kriging, MLE-Kriging, and SRM-Kriging. The results show that kernel function optimization is very important for improving the fitting precision of a surrogate model. The SRM principle provides a more effective evaluation of generalization ability, and SRM-Kriging performs better than the others, especially for problems with nonuniform distribution sampling.
(3) From the Kriging process we know that Kriging can be regarded as a special case of SVR with zero empirical risk. By setting the insensitive loss function of SVR to zero, the SVR optimization problem over regularization and kernel parameters is transformed into a simple optimization problem over the kernel parameters alone, plus a deterministic problem of evaluating the coefficients. This reduces the design dimension and simplifies the implementation.
The distribution of samples may become more and more complicated in practice. This paper has also studied the fitting precision of the Kriging model with uniform and nonuniform sampling. The results show that Kriging with the SRM principle has better fitting performance but still low precision for high-dimensional problems. Further research may analyze surrogate models under different nonuniform sampling distributions in detail, to find a more suitable selection mechanism for the kernel function and a more efficient kernel parameter optimization method.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by funding from the National Natural Science Foundation of China (no. 51505385), the Shanghai Aerospace Science Technology Innovation Foundation (no. SAST2015010), and the Defense Basic Research Program (no. JCKY2016204B102 and no. JCKY2016208C001). The authors are also thankful to Shaanxi Aerospace Flight Vehicle Design Key Laboratory of NPU.
References
[1] M. Ma, C. Wang, J. Zhang, and Z. Huang, "Multidisciplinary design optimization for complex product review," Chinese Journal of Mechanical Engineering, vol. 44, no. 6, pp. 15–26, 2008.
[2] O. de Weck, J. Agte, J. Sobieszczanski-Sobieski, P. Arendsen, A. Morris, and M. Spieck, "State-of-the-art and future trends in multidisciplinary design optimization," in Proceedings of the 48th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Honolulu, Hawaii, USA, 2007.
[3] D. G. Ullman, "Toward the ideal mechanical engineering design support system," Research in Engineering Design, vol. 13, no. 2, pp. 55–64, 2002.
[4] V. N. Vapnik, Statistical Learning Theory, Wiley, 1998.
[5] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[6] W. C. M. van Beers, "Kriging metamodeling in discrete-event simulation: an overview," in Proceedings of the 2005 Winter Simulation Conference, pp. 202–208, Orlando, Fla, USA, December 2005.
[7] T. W. Simpson, T. M. Mauery, J. J. Korte, and F. Mistree, "Kriging models for global approximation in simulation-based multidisciplinary design optimization," AIAA Journal, vol. 39, no. 12, pp. 2233–2241, 2001.
[8] N. Cressie, "Spatial prediction and ordinary kriging," Mathematical Geology, vol. 21, no. 4, pp. 493–494, 1989.
[9] J. P. C. Kleijnen, "Kriging metamodeling in simulation: a review," European Journal of Operational Research, vol. 192, no. 3, pp. 707–716, 2009.
[10] S. Jeong, M. Murayama, and K. Yamamoto, "Efficient optimization design method using kriging model," Journal of Aircraft, vol. 42, no. 2, pp. 413–420, 2005.
[11] J. P. C. Kleijnen, "Simulation-optimization via Kriging and bootstrapping: a survey," Journal of Simulation, vol. 8, no. 4, pp. 241–250, 2014.
[12] Z. Chen, H. Qiu, L. Gao, X. Li, and P. Li, "A local adaptive sampling method for reliability-based design optimization using Kriging model," Structural and Multidisciplinary Optimization, vol. 49, no. 3, pp. 401–416, 2014.
[13] S. Ulaganathan, I. Couckuyt, T. Dhaene, J. Degroote, and E. Laermans, "Performance study of gradient-enhanced Kriging," Engineering with Computers, vol. 32, no. 1, pp. 15–34, 2016.
[14] P. Goovaerts, "Ordinary cokriging revisited," Mathematical Geology, vol. 30, no. 1, pp. 21–40, 1998.
[15] Z.-H. Han and S. Görtz, "Hierarchical kriging model for variable-fidelity surrogate modeling," AIAA Journal, vol. 50, no. 9, pp. 1885–1896, 2012.
[16] J. Xiao, L. Yu, and Y. F. Bai, "Survey of the selection of kernels and hyper-parameters in support vector regression," Journal of Southwest Jiaotong University, vol. 43, no. 3, pp. 297–303, 2008.
[17] X. Liu, S. Zhou, Z. Xiong, L. Chen, and C. Yan, "Research on identifying T-S fuzzy model based on structural risk minimization," Journal of Guizhou University, 2016.
[18] D. Berestin, M. Zimin, T. Gavrilenko, and N. Chernikov, "Determination of an object as chaotic system on the basis of structural risk minimization," Complexity. Mind. Postnonclassic, vol. 3, no. 4, pp. 73–87, 2014.
[19] X. Zhu, W. Luo, Y. Wei, and X. Chen, "Structural risk minimization based radial basis function surrogate model," Journal of Projectiles, Rockets, Missiles and Guidance, vol. 31, no. 5, pp. 169–173, 2011.
[20] C. Wei, L. Qian, and T. Wei, "Radial basis function networks based on structural risk minimization principle," Control and Instruments in Chemical Industry, vol. 36, no. 3, pp. 34–37, 2009.
[21] A. I. J. Forrester, A. Sóbester, and A. J. Keane, Engineering Design via Surrogate Modelling, Wiley, 2008.
[22] X. F. Zhu, Surrogate Model Theory and Application in the Aircraft MDO, National University of Defense Technology, 2010.
[23] S. N. Lophaven, H. B. Nielsen, and J. Søndergaard, "DACE: a MATLAB Kriging toolbox," Tech. Rep., Technical University of Denmark, 2002.
[24] Z. C. Xiong, Some Problems in Approximation Theory with Radial Basis Function, Fudan University, 2007.
[25] B. Glaz, Active/Passive Optimization of Helicopter Rotor Blades for Improved Vibration, Noise, and Performance Characteristics, University of Michigan, 2008.
[26] D. Zhang, Z. Gao, J. Li, and L. Huang, "Study of metamodel sampling criterion," Acta Aerodynamica Sinica, vol. 29, no. 6, pp. 719–725, 2011.
Copyright
Copyright © 2017 Hua Su et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.