Journal of Applied Mathematics
Volume 2012, Article ID 949654, 12 pages
http://dx.doi.org/10.1155/2012/949654
Research Article

Improving the Solution of Least Squares Support Vector Machines with Application to a Blast Furnace System

College of Science, China University of Petroleum, Qingdao 266580, China

Received 4 May 2012; Revised 23 August 2012; Accepted 20 September 2012

Academic Editor: Chuanhou Gao

Copyright © 2012 Ling Jian et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The solution of least squares support vector machines (LS-SVMs) is characterized by a specific linear system, that is, a saddle point system. Approaches for its numerical solution, such as conjugate gradient methods (Suykens and Vandewalle, 1999) and null space methods (Chu et al., 2005), have been proposed. To speed up the solution of LS-SVM, this paper employs the minimal residual (MINRES) method to solve the above saddle point system directly. Theoretical analysis indicates that the MINRES method is more efficient than the conjugate gradient method and the null space method for solving the saddle point system. Experiments on benchmark data sets show that, compared with mainstream algorithms for LS-SVM, the proposed approach significantly reduces the training time while keeping comparable accuracy. Furthermore, the LS-SVM based on the MINRES method is used to tackle a practical problem originating from the blast furnace iron-making process: predicting the changing trend of the silicon content in hot metal. The MINRES method-based LS-SVM can effectively perform feature reduction and model selection simultaneously, so it is a practical tool for the silicon trend prediction task.

1. Introduction

As a kernel method, SVM works by embedding the input data into a Hilbert space $\mathcal{H}$ by a high-dimensional mapping $\phi(\cdot)$, and then trying to find a linear relation among the high-dimensional embedded data points [1, 2]. This process is implicitly performed by specifying a kernel function which satisfies $k(x, z) = \langle \phi(x), \phi(z) \rangle$, that is, the inner product of the embedded points. Given observed samples $\{(x_i, y_i)\}_{i=1}^{n}$ with size $n$, SVM formulates the learning problem as a variational problem of finding a decision function $f$ that minimizes the regularized risk functional [3, 4]

$$\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \lambda \|f\|_{\mathcal{H}}^{2}, \tag{1.1}$$

where $L(\cdot, \cdot)$ is called a loss function and $\lambda$ is the so-called regularization parameter, which trades off the empirical risk against the complexity of $f$, that is, $\|f\|_{\mathcal{H}}$, the norm in a reproducing kernel Hilbert space $\mathcal{H}$. By the representer theorem [3, 5], the optimal decision function satisfying (1.1) has the form

$$f(x) = \sum_{i=1}^{n} \alpha_i k(x, x_i) + b, \tag{1.2}$$

where $\alpha_i \in \mathbb{R}$ for $i = 1, \ldots, n$, $b \in \mathbb{R}$. This equation can be easily used to tackle a practical problem once the kernel function is specified.

To overcome the high computational complexity of traditional SVM, an interesting variant of the standard SVM, least squares support vector machines, has been proposed by Suykens and Vandewalle [6]. In the case of LS-SVM, the inequality constraints in soft margin SVM are converted into equality constraints. The model training process of LS-SVM is performed by solving a specific system of linear equations, that is, a saddle point system, which can be efficiently solved by iterative methods, instead of a quadratic programming problem. Besides this computational superiority, extensive empirical studies have shown that LS-SVM is comparable to SVM in terms of generalization performance [7]; these features make LS-SVM an attractive algorithm and a successful alternative to SVM.

For the training of the LS-SVM, Van Gestel et al. [7] proposed to reformulate the $(n+1)$-order saddle point system into two $n$-order symmetric positive definite systems which can be solved in turn by the conjugate gradient (CG) algorithm. To speed up the training of LS-SVM, Chu et al.
[8] employed the null space method to transform the saddle point system into a reduced $(n-1)$-order symmetric positive definite system, which was also solved with the CG algorithm.

The minimal residual (MINRES) method proposed by Paige and Saunders is a specialized method for solving a nonsingular symmetric system [9]. This method avoids the LU factorization and does not suffer from breakdown, so it is an efficient numerical method for solving symmetric but indefinite systems. The Karush-Kuhn-Tucker system of LS-SVM is a specific linear system, that is, a saddle point system. In view of this, to speed up the solution of the LS-SVM model we employ the MINRES method to solve the linear system directly.

The main contribution of this paper is to provide a potential alternative for the solution of the LS-SVM model. Theoretical analysis of the three numerical algorithms for the solution of the LS-SVM model indicates that the MINRES method is the optimal choice. Experiments on benchmark data sets show that, compared with the CG method proposed by Suykens et al. and the null space method proposed by Chu et al., the MINRES solver significantly improves the computational efficiency while keeping almost the same generalization performance as these two methods. Furthermore, the MINRES method-based LS-SVM model is constructed and employed to identify the blast furnace (BF) iron-making process, a complex nonlinear system. Practical application to a typical real BF indicates that the established MINRES method-based LS-SVM model is a good candidate for predicting the changing trend of the silicon content in BF hot metal with low time cost. A possible application of this work is to help BF operators judge in good time whether the inner state of the BF is getting hot or chilling, which can guide them in determining the direction of BF control in advance.

The rest of this paper is organized as follows. In Section 2, we give a review of LS-SVM. Section 3 presents three numerical solutions for LS-SVM. It is followed by extensive experimental validations of the proposed method in Section 4. Section 5 concludes the paper and points out possible future research.

2. Formulation of LS-SVM

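The primal problem of LS-SVM can be formulated in the following unified format:

$$\min_{w, b, e} \; \frac{1}{2} w^{T} w + \frac{C}{2} \sum_{i=1}^{n} e_i^{2} \quad \text{s.t.} \quad y_i = w^{T} \phi(x_i) + b + e_i, \quad i = 1, \ldots, n, \tag{2.1}$$

for both regression analysis and pattern classification. In (2.1), $n$ is the total number of training samples, $x_i$ is the $i$th input vector, $y_i$ is the $i$th output value/label for the regression/classification problem, $e_i$ is the $i$th error variable, $C$ is the regularization parameter, and $b$ is the bias term. The Lagrangian of (2.1) is given below:

$$\mathcal{L}(w, b, e; \alpha) = \frac{1}{2} w^{T} w + \frac{C}{2} \sum_{i=1}^{n} e_i^{2} - \sum_{i=1}^{n} \alpha_i \bigl( w^{T} \phi(x_i) + b + e_i - y_i \bigr), \tag{2.2}$$

where $\alpha_i$ is the $i$th Lagrange multiplier. For the convex program (2.1), it is obvious that the Slater constraint qualification holds. Therefore, the optimal solution of (2.1) satisfies its Karush-Kuhn-Tucker system

$$w = \sum_{i=1}^{n} \alpha_i \phi(x_i), \quad \sum_{i=1}^{n} \alpha_i = 0, \quad \alpha_i = C e_i, \quad w^{T} \phi(x_i) + b + e_i - y_i = 0, \quad i = 1, \ldots, n. \tag{2.3}$$

After eliminating the variables $w$ and $e$, the Karush-Kuhn-Tucker system (2.3) can be reformulated as the following saddle point system [10]:

$$\begin{bmatrix} K + I/C & \mathbf{1}_n \\ \mathbf{1}_n^{T} & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}, \tag{2.4}$$

where $K_{ij} = k(x_i, x_j)$, $I$ stands for the unit matrix, $\mathbf{1}_n$ denotes an $n$-dimensional vector of all ones, and $\alpha = (\alpha_1, \ldots, \alpha_n)^{T}$, $y = (y_1, \ldots, y_n)^{T}$.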
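The assembly of the saddle point system can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' MATLAB code: it assumes the standard LS-SVM block form with $H = K + I/C$ in the upper-left block, a Gaussian RBF kernel scaled as $\exp(-\|x - z\|^2/\sigma)$ (the exact scaling is an assumption), synthetic data, and a dense direct solver. The helper names `rbf_kernel` and `build_saddle_system` are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / sigma); this scaling is an assumption."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / sigma)

def build_saddle_system(X, y, C, sigma):
    """Assemble [[K + I/C, 1], [1^T, 0]] and the right-hand side [y; 0]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = rbf_kernel(X, X, sigma) + np.eye(n) / C
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    return A, np.append(y, 0.0)

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

A, rhs = build_saddle_system(X, y, C=10.0, sigma=3.0)
sol = np.linalg.solve(A, rhs)   # dense direct solve, used here only as a reference
alpha, b = sol[:-1], sol[-1]
```

The assembled matrix is symmetric but indefinite (it has exactly one negative eigenvalue), which is why CG cannot be applied to it directly.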

3. Solution of LS-SVM

In this section, we give a brief review and some analysis of the three numerical algorithms mentioned above for the solution of LS-SVM.

3.1. Conjugate Gradient Methods

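The kernel matrix $K$ is a symmetric positive semidefinite matrix and the diagonal term $1/C$ is positive, so the matrix $H = K + I/C$ is symmetric and positive definite. Through the following matrix transformation

$$T^{T} \begin{bmatrix} H & \mathbf{1}_n \\ \mathbf{1}_n^{T} & 0 \end{bmatrix} T = \begin{bmatrix} H & 0 \\ 0 & -s \end{bmatrix}, \quad T = \begin{bmatrix} I & -H^{-1} \mathbf{1}_n \\ 0 & 1 \end{bmatrix}, \tag{3.1}$$

where

$$s = \mathbf{1}_n^{T} H^{-1} \mathbf{1}_n > 0, \tag{3.2}$$

the saddle point system (2.4) can be factorized into a positive definite system [11]

$$\begin{bmatrix} H & 0 \\ 0 & s \end{bmatrix} \begin{bmatrix} \alpha + H^{-1} \mathbf{1}_n b \\ b \end{bmatrix} = \begin{bmatrix} y \\ \mathbf{1}_n^{T} H^{-1} y \end{bmatrix}. \tag{3.3}$$

Suykens et al. suggested the use of the CG method for the solution of (3.3) and proposed to solve two $n$-order positive definite systems. More exactly, their algorithm can be described as follows.

Step 1. Employ the CG algorithm to solve the linear equations $H \eta = \mathbf{1}_n$ and get the intermediate variable $\eta$.

Step 2. Solve the intermediate variable $\nu$ from $H \nu = y$ by the CG method.

Step 3. Obtain the bias term $b = \mathbf{1}_n^{T} \nu / \mathbf{1}_n^{T} \eta$ and the Lagrange dual variables $\alpha = \nu - b \eta$.
The output for any new data point $x$ can subsequently be deduced by computing the decision function $f(x) = \sum_{i=1}^{n} \alpha_i k(x, x_i) + b$.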
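The three steps can be sketched with SciPy's `scipy.sparse.linalg.cg`. This is a minimal sketch under stated assumptions rather than the original implementation: it takes $H = K + I/C$ as the positive definite block, assumes the intermediate systems are $H\eta = \mathbf{1}$ and $H\nu = y$, and uses synthetic two-class data with an RBF kernel whose scaling is chosen arbitrarily. The function name `lssvm_two_cg` is hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import cg

def lssvm_two_cg(H, y):
    """Solve the LS-SVM system via two CG solves with the positive definite block H."""
    ones = np.ones(len(y))
    eta, _ = cg(H, ones, maxiter=1000)   # Step 1: H eta = 1
    nu, _ = cg(H, y, maxiter=1000)       # Step 2: H nu = y
    b = (ones @ nu) / (ones @ eta)       # Step 3: bias term
    alpha = nu - b * eta                 # Step 3: dual variables
    return alpha, b

# Synthetic two-class data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
H = np.exp(-d2 / 2.0) + np.eye(30) / 1.0   # K + I/C with C = 1 (assumed values)

alpha, b = lssvm_two_cg(H, y)
```

Both solves use the same coefficient matrix, so the cost is roughly twice that of a single CG run of order $n$.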

3.2. Null Space Methods

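As mentioned previously, to get the intermediate variables $\eta$ and $\nu$, two $n$-order positive definite systems need to be solved by CG methods. Chu et al. [8] proposed an interesting method for the numerical solution of LS-SVM that solves one reduced system of linear equations of order $n-1$. The improved method suggested by Chu et al. can be seen as one kind of null space method. With $H = K + I/C$, the saddle point system (2.4) can be written as

$$H \alpha + \mathbf{1}_n b = y, \quad \mathbf{1}_n^{T} \alpha = 0.$$

Chu et al. specified a particular solution of $\mathbf{1}_n^{T} \alpha = 0$ as $\hat{\alpha} = 0$ and the null space of $\mathbf{1}_n^{T}$ as the column space of

$$Z = \begin{bmatrix} I_{n-1} \\ -\mathbf{1}_{n-1}^{T} \end{bmatrix} \in \mathbb{R}^{n \times (n-1)}.$$

Through solving the following reduced system of order $n-1$ for the auxiliary unknown $v$,

$$Z^{T} H Z v = Z^{T} y,$$

the solution of the saddle point system (2.4) can be obtained as $\alpha = Z v$ and $b = \mathbf{1}_n^{T} (y - H \alpha) / n$.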
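A null space reduction of this kind can be sketched as follows. The sketch assumes the constraint $\mathbf{1}^T\alpha = 0$, picks the basis $Z = [I_{n-1}; -\mathbf{1}^T]$ for its null space (one common choice, not necessarily the one used in [8]), and solves the reduced system with SciPy's `cg`. The data, kernel scaling, and the name `lssvm_null_space` are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import cg

def lssvm_null_space(H, y):
    """Reduce the saddle point system to one SPD system of order n - 1."""
    n = len(y)
    # Columns of Z span {alpha : 1^T alpha = 0}; this basis is an assumed choice.
    Z = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])
    H_red = Z.T @ H @ Z                      # reduced SPD matrix
    v, _ = cg(H_red, Z.T @ y, maxiter=1000)  # single CG solve
    alpha = Z @ v
    b = np.mean(y - H @ alpha)               # from H alpha + 1 b = y
    return alpha, b

# Synthetic two-class data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
H = np.exp(-d2 / 2.0) + np.eye(30) / 1.0   # K + I/C with C = 1 (assumed values)

alpha, b = lssvm_null_space(H, y)
```

Only one CG solve is needed, but the product $Z^T H Z$ has to be formed (or applied implicitly), and its conditioning can differ noticeably from that of $H$.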

3.3. Minimal Residual Methods

The vector sequences in the CG method correspond to a factorization of a tridiagonal matrix similar to the coefficient matrix. Therefore, a breakdown of the algorithm, corresponding to a zero pivot, can occur if the matrix is indefinite. Furthermore, for indefinite matrices the minimization property of the CG method is no longer well defined. The MINRES method proposed by Paige and Saunders [9] is a variant of the CG method that avoids the LU factorization and does not suffer from breakdown. It minimizes the residual in the 2-norm and is an efficient numerical algorithm for solving symmetric but indefinite systems; the corresponding convergence behavior of the MINRES method for indefinite systems has been analyzed by van der Vorst [12]. The purpose of this paper is to employ the MINRES method to solve the saddle point system (2.4) directly. Next we give a brief review of the MINRES algorithm. Let $z_0$ be an initial guess for the solution of the symmetric indefinite linear system $A z = d$. One can obtain the iterative sequence $z_k$, $k = 1, 2, \ldots$, such that

$$\|r_k\|_2 = \min_{z \in z_0 + \mathcal{K}_k(A, r_0)} \|d - A z\|_2,$$

where $r_k = d - A z_k$ is the $k$th residual for $k = 0, 1, 2, \ldots$, and $\mathcal{K}_k(A, r_0) = \operatorname{span}\{r_0, A r_0, \ldots, A^{k-1} r_0\}$ is the $k$th Krylov subspace. Lanczos methods can be used to generate an orthonormal basis of $\mathcal{K}_k(A, r_0)$, and then only two basis vectors are needed to compute $z_k$; see, for example, [12]. The detailed implementation of the MINRES algorithm can be found in [12].

It has been shown that rounding errors are propagated to the approximate solution with a factor proportional to the square of the condition number of the coefficient matrix [12], so one should be careful when using the MINRES method for ill-conditioned systems.
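Applying MINRES directly to the saddle point system can be sketched with SciPy's `scipy.sparse.linalg.minres`, an implementation of the Paige and Saunders algorithm. The sketch assumes the block form $[H, \mathbf{1}; \mathbf{1}^T, 0]$ with $H = K + I/C$, synthetic data, and SciPy's default stopping tolerance; it is not the authors' MATLAB implementation, and the helper name `saddle_system` is hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import minres

def saddle_system(H, y):
    """Assemble the symmetric indefinite matrix [[H, 1], [1^T, 0]] and rhs [y; 0]."""
    n = len(y)
    A = np.block([[H, np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    return A, np.append(y, 0.0)

# Synthetic two-class data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
H = np.exp(-d2 / 2.0) + np.eye(30) / 1.0   # K + I/C with C = 1 (assumed values)

A, rhs = saddle_system(H, y)
sol, info = minres(A, rhs, maxiter=1000)   # one solve of order n + 1, no reformulation
alpha, b = sol[:-1], sol[-1]
```

Because the solver only needs matrix-vector products with the original symmetric matrix, no auxiliary system has to be formed.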

3.4. Some Analysis on These Three Numerical Algorithms

The properties of short recurrences and optimality [12] make the CG method the first choice for the solution of a symmetric positive definite system. Suykens et al. transformed the $(n+1)$-order saddle point system (2.4) into two $n$-order positive definite systems which are solved by CG methods. However, it is time consuming to solve two $n$-order positive definite systems when $n$ is large. To overcome this shortcoming, Chu et al. [8] equivalently transformed the original $(n+1)$-order system into an $(n-1)$-order symmetric positive definite system, to which the CG method can then be applied. This method can be seen as a null space method. Unfortunately, the transformation may heavily destroy the sparse structure and greatly increase the condition number of the original system, which can severely slow down the convergence of the CG algorithm. A theoretical analysis of the influence of the transformation on the condition number is indispensable but rather difficult; we leave it as an open problem. In this paper, the MINRES method is directly applied to solve the original saddle point problem of order $n+1$. Similar to the CG method, the MINRES method also has the properties of short recurrences and optimality.

In light of the above analysis, the MINRES method should be the first choice for the solution of the LS-SVM model, since it neither requires solving two linear systems nor destroys the sparse structure of the original saddle point system.

4. Numerical Implementations

4.1. Experiments on Benchmark Data Sets

In this section we give the experimental results on the accuracy and efficiency of our method. For comparison purposes, we implement the CG method proposed by Suykens and Vandewalle [6] and the null space method suggested by Chu et al. [8]. All experiments are implemented in the MATLAB 7.8 programming environment running on an IBM-compatible PC under the Windows XP operating system, configured with an Intel Core 2.1 GHz CPU and 2 GB RAM. The commonly used Gaussian RBF kernel is selected as the kernel function. We use the default setting for the kernel width $\sigma$, that is, the kernel width is set to the dimension of the inputs.

We first compare the three algorithms on three benchmark data sets, Boston, Concrete, and Abalone, which are downloaded from the UCI repository [13]. Each data set is randomly partitioned into 70% training and 30% test sets. We also list the condition numbers of the coefficient matrices solved by the three methods in order to analyze their computing efficiency. As shown in Tables 1–3, the condition number for the CG method is the smallest, and the condition number for the null space method is significantly larger.

Table 1: Experimental results of three methods on Boston data set.
Table 2: Experimental results of three methods on Concrete data set.
Table 3: Experimental results of three methods on Abalone data set.

The Cond columns in Tables 1, 2, and 3 show that, compared with the CG method, the condition number for the MINRES method increases slightly, but it remains much smaller than the condition number for the null space method. The orders of the linear systems solved by the CG method, the null space method, and the MINRES method are $n$, $n-1$, and $n+1$, respectively. The condition numbers for the CG method and the MINRES method are very close, but the CG approach has to solve two systems of order $n$. Hence, the running time of the MINRES method should be less than that of the CG method. The CPU columns in Tables 1–3 show that the MINRES method-based LS-SVM model costs much less running time than the CG method-based and the null space method-based LS-SVM models for all settings of $C$. So the MINRES method is a preferable algorithm for solving the LS-SVM model. In the next subsection, we will employ the MINRES method-based LS-SVM model to solve a practical problem.
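The kind of condition number comparison discussed above can be sketched as follows. The data here are synthetic Gaussian samples rather than the UCI sets, the kernel scaling and the values of $C$ and $\sigma$ are assumptions, and the null space basis $Z = [I_{n-1}; -\mathbf{1}^T]$ is one possible choice, so the printed values illustrate the procedure only and do not reproduce the entries of Tables 1–3.

```python
import numpy as np

# Synthetic inputs; n, C and sigma are assumed values for illustration.
rng = np.random.default_rng(0)
n, C, sigma = 50, 10.0, 4.0
X = rng.normal(size=(n, 4))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

H = np.exp(-d2 / sigma) + np.eye(n) / C                # order n, used by the CG scheme
Z = np.vstack([np.eye(n - 1), -np.ones((1, n - 1))])   # assumed null space basis
H_red = Z.T @ H @ Z                                    # order n - 1, null space method
A = np.block([[H, np.ones((n, 1))],
              [np.ones((1, n)), np.zeros((1, 1))]])    # order n + 1, MINRES

conds = {
    "CG (order n)": np.linalg.cond(H),
    "null space (order n-1)": np.linalg.cond(H_red),
    "MINRES (order n+1)": np.linalg.cond(A),
}
for name, value in conds.items():
    print(f"{name}: {value:.3e}")
```

For this particular basis, $Z^T Z = I + \mathbf{1}\mathbf{1}^T$ has eigenvalues between 1 and $n$, so the condition number of the reduced matrix is at most $n$ times that of $H$.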

4.2. Application on Blast Furnace System

The blast furnace (BF) is a metallurgical reactor used for producing pig iron, which is often called hot metal. Chemical reactions and heat transport phenomena take place throughout the furnace as the solid materials move downwards and hot combustion gases flow upwards. The main principle involved in the BF iron-making process is the thermochemical reduction of iron oxide ore by carbon monoxide. During the iron-making period, a great deal of heat energy is produced, which can raise the BF temperature to nearly 2000°C. The end products, consisting of slag and hot metal, sink to the bottom and are tapped periodically for subsequent refining. A cycle of iron-making takes about 6–8 h [11]. The BF iron-making process is a highly complex nonlinear process characterized by high temperature, high pressure, and the concurrence of transport phenomena and chemical reactions. The complexity of the BF and the occurrence of a variety of process disturbances have been obstacles to the adoption of modeling and control in the process. Generally speaking, to control a BF system often means to keep the hot metal temperature and composition, such as the silicon, sulfur, and carbon contents in hot metal, within acceptable bounds. Among these indicators, the silicon content often acts as the chief indicator of the thermal state of the BF: an increasing silicon content means that the BF is heating, while a decreasing silicon content indicates that the BF is cooling [11, 14]. Thus, the silicon content is a reliable measure of the thermal state of the BF, and predicting the silicon content becomes a key step in regulating the thermal state of the BF. Therefore, building silicon prediction models has been an active research issue in recent decades, including numerical prediction models [15] and trend prediction models [11].

In this subsection, the tendency prediction of the silicon content in hot metal is cast as a binary classification problem. Samples with increasing silicon content are labeled +1, whereas samples with decreasing silicon content are labeled −1. In the present work, the experimental data are collected from a medium-sized BF with an inner volume of about 2500 m³. The variables closely related to the silicon content are measured as the candidate inputs for modeling. Table 4 presents the variable information from the studied BF. In total, 801 data points are collected, with the first 601 points as the training set and the remaining 200 points as the testing set. The sampling interval is about 1.5 h for the current BF. Figure 1 illustrates the evolution of the silicon content in hot metal.

Table 4: A list of input variables.
Figure 1: Evolution of silicon content in hot metal.

There are in total 15 candidate variables listed in Table 4 from which to select model inputs. Generally, too many input variables will increase the complexity of the model, while too few inputs will reduce its accuracy. A tradeoff has to be made between model complexity and accuracy when selecting the inputs. Therefore, it is necessary to screen out the less important variables from these 15 candidates. Here, the inputs are screened in an integrative way that combines the $F$-score method [16] for variable ranking with the cross-validation method for variable and model parameter selection.
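The $F$-score ranking of [16] measures, for each variable, the squared deviations of the two class means from the overall mean relative to the sum of the two within-class sample variances. A minimal NumPy sketch is given below; the function name `f_scores` and the tiny two-variable data set are hypothetical and serve only to illustrate the computation.

```python
import numpy as np

def f_scores(X, y):
    """F-score of each column of X for labels y in {+1, -1}."""
    pos, neg = X[y == 1], X[y == -1]
    mean_all, mean_pos, mean_neg = X.mean(0), pos.mean(0), neg.mean(0)
    # Between-class term: squared distances of the class means to the overall mean.
    numer = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Within-class term: sample variances (divisor n - 1) of each class.
    denom = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return numer / denom

# Toy data: the first variable separates the classes, the second does not.
X = np.array([[1.0, 5.0], [2.0, 1.0], [3.0, 4.0],
              [7.0, 2.0], [8.0, 5.0], [9.0, 1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

scores = f_scores(X, y)
ranking = np.argsort(-scores)   # variable indices, highest F-score first
```

A larger score indicates a more discriminative variable, so the first variable is ranked ahead of the second in this toy example.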

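The $F$-score is an effective tool for feature selection in data mining and can give a feature ranking by evaluating the discrimination of two sets with real values. For the 15 candidate variables in Table 4, the $F$-scores are defined as follows:

$$F(i) = \frac{\bigl(\bar{x}_i^{(+)} - \bar{x}_i\bigr)^2 + \bigl(\bar{x}_i^{(-)} - \bar{x}_i\bigr)^2}{\dfrac{1}{n_+ - 1} \sum_{k=1}^{n_+} \bigl(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\bigr)^2 + \dfrac{1}{n_- - 1} \sum_{k=1}^{n_-} \bigl(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\bigr)^2}, \quad i = 1, \ldots, 15,$$

where $\bar{x}_i$, $\bar{x}_i^{(+)}$, and $\bar{x}_i^{(-)}$ stand for the mean of the $i$th attribute over the whole training set, the positive examples, and the negative examples, respectively, $x_{k,i}^{(+)}$ and $x_{k,i}^{(-)}$ are the $i$th variable of the $k$th positive and negative instance, respectively, and $n_+$ and $n_-$ are the numbers of positive and negative instances. Hence, a variable ranking can be achieved through the $F$-score method. Table 4 gives the $F$-scores of all 15 variables, which are ranked according to their $F$-score values. Since LS-SVM is a kernel-based learning model, the kernel parameter $\sigma$ and the regularization parameter $C$ play an important role, so one should pay attention to selecting proper parameters. Grid search-based ten-fold cross-validation is executed on the training set to search for the optimal $(\sigma, C)$ over a predefined searching grid.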

Mean accuracy in Table 4 stands for the average accuracy of the LS-SVM model under ten-fold cross-validation at the grid point with the best performance. In the current work, we first select the variable with the highest $F$-score as the model input and then add variables one by one according to their $F$-scores. The mean accuracy for each set of input variables is obtained in this way, and the results are shown in Table 4. The mean accuracy column shows the following: at the beginning, the mean accuracy increases gradually as more candidate variables are taken as model inputs; the largest mean accuracy appears when CO2 is included in the input set; beyond this maximum, the mean accuracy fluctuates as the remaining variables are added in turn to the input set. These results indicate that, as far as the studied BF is concerned, the optimal input set is [Si, S, BI, FS, BV, CO2], together with the corresponding setting of the model parameters $(\sigma, C)$. Table 5 lists the accuracy of the LS-SVM model on the testing set with and without feature and model selection. In the version without feature and model selection, all candidate variables are selected as inputs, and we use the default setting for the LS-SVM model; that is, the kernel width is set equal to the dimension of the input variables and the regularization parameter is set to 1. An entry in the second row of this table such as 34/42 denotes that an ascending trend was predicted 42 times, of which 34 predictions were correct. The confidence level of the LS-SVM model without model and feature selection fluctuates severely between the ascending and descending predictions, from 80.95% to 58.86%. With model and feature selection, the difference between the confidence levels of the ascending and descending predictions is reduced to 2.19%, indicating that the model and feature selection procedure clearly enhances the stability of the LS-SVM model.
As the last column of Table 5 shows, the TSA of the LS-SVM model with the feature and model selection procedure is significantly improved compared with the LS-SVM model without feature and model selection, so the selection procedure is indispensable for the current practical application. Table 6 lists the running time of the three numerical algorithms when performing the feature and model selection procedure. The running time of the MINRES method is significantly lower than that of the other algorithms. In short, the feature and model selection procedure can be performed effectively with the MINRES method-based LS-SVM, which is meaningful for practical use.
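The incremental selection procedure described above (add variables in ranking order, score each input set by ten-fold cross-validation over a parameter grid) can be sketched as follows. Everything specific here is an assumption made for illustration: the data are synthetic, the ranking and the $(C, \sigma)$ grid are hypothetical, the kernel scaling is assumed, the folds are interleaved rather than random, and a dense direct solver stands in for the MINRES solver to keep the sketch short.

```python
import numpy as np

def rbf(X, Z, sigma):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma)   # kernel scaling is an assumption

def lssvm_fit_predict(Xtr, ytr, Xte, C, sigma):
    """Train an LS-SVM classifier and predict labels for Xte."""
    n = len(ytr)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = rbf(Xtr, Xtr, sigma) + np.eye(n) / C
    A[:n, n] = A[n, :n] = 1.0
    sol = np.linalg.solve(A, np.append(ytr, 0.0))   # direct solve as a stand-in
    return np.sign(rbf(Xte, Xtr, sigma) @ sol[:n] + sol[n])

def cv_accuracy(X, y, C, sigma, folds=10):
    """Mean accuracy over interleaved folds."""
    idx = np.arange(len(y))
    accs = []
    for k in range(folds):
        te = idx[k::folds]
        tr = np.setdiff1d(idx, te)
        pred = lssvm_fit_predict(X[tr], y[tr], X[te], C, sigma)
        accs.append(np.mean(pred == y[te]))
    return float(np.mean(accs))

def forward_selection(X, y, ranking, grid):
    """Add variables in ranking order; record the best CV accuracy on the grid."""
    results = []
    for m in range(1, len(ranking) + 1):
        cols = ranking[:m]
        best = max(cv_accuracy(X[:, cols], y, C, s) for C, s in grid)
        results.append((m, best))
    return results

# Synthetic data: only the first variable carries class information.
rng = np.random.default_rng(1)
y = np.where(rng.random(60) < 0.5, 1.0, -1.0)
X = rng.normal(size=(60, 3))
X[:, 0] += 3.0 * y
ranking = [0, 1, 2]                                        # hypothetical ranking
grid = [(C, s) for C in (1.0, 10.0) for s in (1.0, 3.0)]   # hypothetical grid
results = forward_selection(X, y, ranking, grid)
```

Each grid point requires one linear solve per fold, so the total cost of the selection procedure is dominated by the speed of the underlying linear solver.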

Table 5: Predictive results of LS-SVM model with/without feature and model selection.
Table 6: Running time of three numerical methods on model identification.

5. Conclusions and Points of Possible Future Research

In this paper, we have proposed an alternative, that is, the MINRES method, for the solution of the LS-SVM model, which is formulated as a saddle point system. Numerical experiments on UCI benchmark data sets show that the proposed numerical solution method for the LS-SVM model is more efficient than the algorithms proposed by Suykens and Vandewalle [6] and Chu et al. [8]. Furthermore, a MINRES method-based LS-SVM model, including feature selection from an extensive set of candidates and model parameter selection, is proposed and employed for the silicon content trend prediction task. The practical application to a typical real BF indicates that the proposed MINRES method-based LS-SVM model is a good candidate for predicting the trend of the silicon content in BF hot metal with low running time.

However, it should be pointed out that, although the MINRES method-based LS-SVM model has a low running time, the lack of metallurgical information may be the root cause of the limited accuracy of the current prediction model. So there is much work worth investigating in the future to further improve the model accuracy and increase the model transparency, such as constructing predictive models that integrate domain knowledge and extracting rules. The extracted rules can account for the output results with detailed and definite input information, which may further serve control purposes by linking the output results with the controlled variables. These investigations are expected to help further improve the efficiency of the predictive model.

Acknowledgment

This work was partially supported by National Natural Science Foundation of China under Grant no. 11126084, Natural Science Foundation of Shandong Province under Grant no. ZR2011AQ003, Fundamental Research Funds for the Central Universities under Grant no. 12CX04082A, and Public Benefit Technologies R&D Program of Science and Technology Department of Zhejiang Province under Grant No. 2011C31G2010136.

References

  1. V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, NY, USA, 2nd edition, 2000.
  2. B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, Mass, USA, 2002.
  3. T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks and support vector machines,” Advances in Computational Mathematics, vol. 13, no. 1, pp. 1–50, 2000.
  4. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University, Cambridge, UK, 2000.
  5. C. M. Bishop, Pattern Recognition and Machine Learning, vol. 4, Springer, New York, NY, USA, 2006.
  6. J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.
  7. T. Van Gestel, J. A. K. Suykens, B. Baesens et al., “Benchmarking least squares support vector machine classifiers,” Machine Learning, vol. 54, no. 1, pp. 5–32, 2004.
  8. W. Chu, C. J. Ong, and S. S. Keerthi, “An improved conjugate gradient scheme to the solution of least squares SVM,” IEEE Transactions on Neural Networks, vol. 16, no. 2, pp. 498–501, 2005.
  9. C. C. Paige and M. A. Saunders, “Solutions of sparse indefinite systems of linear equations,” SIAM Journal on Numerical Analysis, vol. 12, no. 4, pp. 617–629, 1975.
  10. L. Jian, C. Gao, L. Li, and J. Zeng, “Application of least squares support vector machines to predict the silicon content in blast furnace hot metal,” ISIJ International, vol. 48, no. 11, pp. 1659–1661, 2008.
  11. C. Gao, L. Jian, and S. Luo, “Modeling of the thermal state change of blast furnace hearth with support vector machines,” IEEE Transactions on Industrial Electronics, vol. 59, no. 2, pp. 1134–1145, 2012.
  12. H. A. van der Vorst, Iterative Krylov Methods for Large Linear Systems, vol. 13, Cambridge University Press, Cambridge, UK, 2003.
  13. C. Blake and C. Merz, “UCI repository of machine learning databases,” 1998.
  14. C. Gao, J. L. Chen, J. Zeng, X. Liu, and Y. Sun, “A chaos-based iterated multistep predictor for blast furnace ironmaking process,” AIChE Journal, vol. 55, no. 4, pp. 947–962, 2009.
  15. L. Jian, C. Gao, and Z. Xia, “A sliding-window smooth support vector regression model for nonlinear blast furnace system,” Steel Research International, vol. 82, no. 3, pp. 169–179, 2011.
  16. Y. W. Chen and C. J. Lin, “Combining SVMs with various feature selection strategies,” Studies in Fuzziness and Soft Computing, vol. 207, pp. 315–324, 2006.