Abstract

Spheroid disturbance of input data poses a great challenge to support vector regression, so it is essential to study robust regression models. This paper establishes a robust regression model that makes the regression function robust against disturbance of the data and the system parameters. Firstly, two theorems are given to show that the robust linear ε-support vector regression problem can be settled by solving its dual problem. Secondly, a robust support vector regression algorithm is developed and extended from the linear domain to the nonlinear domain. Finally, numerical experiments demonstrate the effectiveness of the proposed models and algorithms.

1. Introduction

As a new machine learning algorithm, the support vector machine (SVM) is based on the statistical learning theory proposed by Vapnik [1] and on Wolfe dual programming theory. Compared with other learning algorithms, SVM has a rigorous mathematical foundation, good generalization ability, and a global optimum, and it is therefore widely used in pattern recognition and function regression.

In general, the input data are assumed to be accurate. In practice, however, statistical and measurement errors in observed data are inevitable; that is, the data are disturbed. The effect of input perturbation on the solution of the regression problem is often neglected. Although support vector regression (SVR) is robust when its parameters are optimal, these parameters are usually quite sensitive to noisy data and outliers. The robust optimization method, which focuses on computational tractability when data points are disturbed within convex sets, was first proposed by Soyster [2] and developed further by Ben-Tal and Nemirovski [3, 4] and by El-Ghaoui et al. [5, 6]. Melvyn [7] later presented a more comprehensive collection of disturbance sets without increasing the computational difficulty, which enabled the rapid development of the robust optimization method. This research was aimed at solving programming problems with polyhedral or spheroid disturbance. Bertsimas and Sim [8] and Alizadeh and Goldfarb [9] pointed out that, under such disturbance, a robust linear programming problem becomes a second-order cone programming problem, a robust second-order cone programming problem becomes a semidefinite programming problem, and a robust semidefinite programming problem becomes an NP-hard problem. Zhixia et al. [10] applied this method to the classification problem and proposed robust support vector classification. This paper studies regression problems with spheroid disturbance by means of the robust optimization method and establishes a robust regression model that makes the regression function highly robust to perturbed data and system parameters.

2. Theory

2.1. Training Set

Assume that the training set is where

The input set can be viewed as a sphere of radius centered on the measured value ; is closely related to the measurement accuracy, and is the sample size.
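For orientation, a training set of this kind is often written as in the sketch below. The notation is an assumption made for illustration and is not taken from formulae (1)~(2): x_i is the measured input, r_i the radius of the sphere, u_i a disturbance direction of norm at most one, y_i the output, and l the sample size.

```latex
T = \{(\mathcal{X}_1, y_1), \ldots, (\mathcal{X}_l, y_l)\}, \qquad
\mathcal{X}_i = \{\, x_i + r_i u_i \;:\; \|u_i\| \le 1 \,\}, \quad
x_i \in \mathbb{R}^n,\; y_i \in \mathbb{R},\; r_i \ge 0,\; i = 1, \ldots, l .
```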

2.2. Original Problem

Starting from the original problem of linear ε-support vector regression, it is not difficult to derive the original problem of robust linear ε-support vector regression, which is a convex quadratic programming problem, by replacing the original input with the input collection :
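A robust problem of this type is commonly written as in the following sketch. All symbols are assumed notation rather than a reproduction of problem (3): w and b are the regression coefficients, ξ_i and ξ_i^* the slack variables, C the penalty parameter, and x_i + r_i u_i with ||u_i|| ≤ 1 an arbitrary point of the i-th input sphere.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\
\text{s.t.}\quad & (w \cdot (x_i + r_i u_i)) + b - y_i \le \varepsilon + \xi_i, && \forall\, \|u_i\| \le 1, \\
& y_i - (w \cdot (x_i + r_i u_i)) - b \le \varepsilon + \xi_i^*, && \forall\, \|u_i\| \le 1, \\
& \xi_i \ge 0,\; \xi_i^* \ge 0, && i = 1, \ldots, l .
\end{aligned}
```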

Theorem 1. The necessary and sufficient conditions for the solution of the original optimization problem (3) are that , , , and are the solutions for the following second-order cone programming: where is the dimensionality of the input data .

Proof. Since
The original problem (3) can be converted to
If variable is introduced, satisfying , the above problem can be shown to be a second-order cone programming problem:
Introduce variables and that satisfy the linear constraint and the second-order cone constraint , and replace the nonlinear terms in the objective function with ; problem (7) can then be written as problem (4). Hence, problem (3) is equivalent to problem (4).
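Reformulations of this kind typically rest on one fact: over a ball of radius r centred at c, the largest value of the inner product (w · x) is (w · c) + r ||w||. The minimal Python sketch below checks this numerically. The function names and the numbers are hypothetical and chosen only for illustration.

```python
import math
import random


def worst_case_inner_product(w, center, radius):
    """Closed-form maximum of w.x over the ball ||x - center|| <= radius."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return sum(wi * ci for wi, ci in zip(w, center)) + radius * norm_w


def sampled_max_inner_product(w, center, radius, n_samples=20000, seed=0):
    """Estimate the same maximum by sampling points on the surface of the ball."""
    rng = random.Random(seed)
    best = -math.inf
    for _ in range(n_samples):
        # A normalised Gaussian vector is uniformly distributed on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in w]
        norm_u = math.sqrt(sum(ui * ui for ui in u))
        x = [ci + radius * ui / norm_u for ci, ui in zip(center, u)]
        best = max(best, sum(wi * xi for wi, xi in zip(w, x)))
    return best


# Illustrative values only (not taken from the paper).
w = [3.0, -4.0]
center = [1.0, 2.0]
radius = 0.5

closed_form = worst_case_inner_product(w, center, radius)
sampled = sampled_max_inner_product(w, center, radius)
print(closed_form, sampled)
```

The sampled value approaches the closed-form value from below, which is why an infinite family of constraints over the sphere can be replaced by a single constraint containing r ||w||.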

2.3. Dual Problem

Introduce the Lagrange function: where are the multiplier vectors corresponding to the constraints. Eliminating variables , and , the following theorem can be obtained.

Theorem 2. The following second-order cone programming is the dual problem of the second-order cone programming (4).

Proof. The feasible region of the second-order cone programming problem (4) is The feasible region of the dual problem (9) is Assuming that the optimal value of the problem (4) is ,
Proceeding from the lower bound of the estimate of , the Lagrange function of the second-order cone programming problem (4) is introduced and written as : let thus
Formula (15) implies ; that is, is a lower bound of .
To obtain the optimal value, the maximal value of needs to be found. This yields the dual problem (9).
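The lower-bound argument used in this proof is the standard weak duality step. In generic notation that is assumed here for illustration (a conic problem with objective f, equality constraints Az = c, a cone K, and its dual cone K^*), it can be sketched as:

```latex
\begin{aligned}
& p^* = \min_{z} \{\, f(z) : Az = c,\; z \in K \,\}, \qquad
L(z, \lambda, \mu) = f(z) + \lambda^{\top}(c - Az) - \mu^{\top} z, \quad \mu \in K^*, \\
& z \text{ feasible} \;\Rightarrow\; \lambda^{\top}(c - Az) = 0,\; \mu^{\top} z \ge 0
\;\Rightarrow\; L(z, \lambda, \mu) \le f(z), \\
& g(\lambda, \mu) := \inf_{z} L(z, \lambda, \mu) \le p^*
\;\Rightarrow\; d^* = \sup_{\lambda,\; \mu \in K^*} g(\lambda, \mu) \le p^* .
\end{aligned}
```

The second-order cone is self-dual, so K^* = K for second-order cone constraints.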

2.4. Robust SVR Algorithm
2.4.1. Linear Robust SVR Algorithm

(1) Write the given training data in the form of formulae (1)~(2).

(2) Construct and solve the second-order cone programming problem (9) to obtain the solution . This requires selecting an appropriate penalty parameter .

(3) The solution of the original problem (4) is acquired in the following way:

Select a component of or a component of that lies in the open interval (0, C); if is chosen, it is computed as follows:

If is chosen, it is computed as follows:

(4) The regression function is then established:
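To make the effect of the sphere radius concrete, the following Python sketch fits a robust linear ε-SVR on toy data. It is a stand-in for illustration, not the algorithm above: instead of solving the dual second-order cone programming problem (9), it applies plain subgradient descent to a worst-case primal objective, 0.5 ||w||^2 + C Σ max(0, |y_i - (w · x_i) - b| + r_i ||w|| - ε). That objective, the function names, the step-size rule, and the data are all assumptions.

```python
import math


def robust_objective(w, b, xs, ys, radii, eps, C):
    """0.5*||w||^2 + C * sum of worst-case eps-insensitive losses over each ball."""
    sq_norm = sum(wj * wj for wj in w)
    norm_w = math.sqrt(sq_norm)
    total = 0.5 * sq_norm
    for x, y, r in zip(xs, ys, radii):
        residual = y - (sum(wj * xj for wj, xj in zip(w, x)) + b)
        total += C * max(0.0, abs(residual) + r * norm_w - eps)
    return total


def fit_robust_linear_svr(xs, ys, radii, eps=0.1, C=10.0, steps=2000, lr=0.01):
    """Subgradient descent on the robust primal; returns the best (w, b) seen."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    best = (robust_objective(w, b, xs, ys, radii, eps, C), list(w), b)
    for t in range(steps):
        norm_w = math.sqrt(sum(wj * wj for wj in w))
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for x, y, r in zip(xs, ys, radii):
            residual = y - (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if abs(residual) + r * norm_w - eps > 0.0:  # worst case leaves the tube
                s = 1.0 if residual >= 0.0 else -1.0
                for j in range(dim):
                    unit = w[j] / norm_w if norm_w > 0.0 else 0.0
                    grad_w[j] += C * (-s * x[j] + r * unit)
                grad_b += -C * s
        step = lr / math.sqrt(t + 1.0)  # diminishing step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
        value = robust_objective(w, b, xs, ys, radii, eps, C)
        if value < best[0]:
            best = (value, list(w), b)
    return best[1], best[2]


# Toy data (hypothetical, not the paper's Example 1): y = 2x + 1 on [0, 1].
xs = [[k / 9.0] for k in range(10)]
ys = [2.0 * x[0] + 1.0 for x in xs]
radii = [0.05] * len(xs)

w_fit, b_fit = fit_robust_linear_svr(xs, ys, radii)
print(w_fit, b_fit, robust_objective(w_fit, b_fit, xs, ys, radii, 0.1, 10.0))
```

A conic solver applied to problem (9) would be the faithful route; the subgradient version is used here only because it needs nothing beyond the standard library.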

2.4.2. Nonlinear Robust SVR Algorithm

By making the following modifications to the algorithm shown in Section 2.4.1, the nonlinear robust SVR algorithm can be constructed:

(1) Introduce a kernel function (common kernel functions include the Gaussian, polynomial, Cauchy, and Laplace kernel functions) and replace the inner products and appearing in the linear robust SVR algorithm of Section 2.4.1 with and .

(2) Substitute for appearing in the second-order cone programming problem (9).

To get , the mapping should be written as ; obviously, is equivalent to So .
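The distance between two mapped points can be computed from kernel values alone, since ||Φ(x) - Φ(z)||^2 = K(x, x) - 2K(x, z) + K(z, z). The Python sketch below illustrates this for a Gaussian kernel and shows how an input-space radius translates into a feature-space radius. The kernel parametrisation exp(-||x - z||^2 / (2σ^2)), the function names, and the sample points are assumptions.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); this parametrisation is assumed."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def feature_space_distance(kernel, x, z):
    """||Phi(x) - Phi(z)|| computed from kernel evaluations only."""
    sq = kernel(x, x) - 2.0 * kernel(x, z) + kernel(z, z)
    return math.sqrt(max(sq, 0.0))  # clip tiny negative rounding errors


def gaussian_feature_radius(r, sigma=1.0):
    """Feature-space radius of an input-space ball of radius r (Gaussian kernel)."""
    return math.sqrt(2.0 - 2.0 * math.exp(-r ** 2 / (2.0 * sigma ** 2)))


# Hypothetical points: x_edge lies at input-space distance 0.05 from x_center.
x_center = [0.3, -1.2]
x_edge = [0.3 + 0.05, -1.2]

d = feature_space_distance(gaussian_kernel, x_center, x_edge)
r_feature = gaussian_feature_radius(0.05)
print(d, r_feature)
```

For this kernel the feature-space distance grows monotonically with the input-space distance, so every point of the input ball is mapped to within the computed feature-space radius of the mapped centre.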

3. Numerical Experiments

3.1. Linear Numerical Experiment

Example 1. Let the objective function be , and write the training set in the form of formulae (1)~(2), where consists of 10 equidistant points in the interval and is taken to be the constant 0.5. The disturbance of the data is a random number in the interval , where . Standard SVR and the robust SVR of Section 2.4.1 are used to fit the objective function, and the errors at ten randomly selected test points are then analysed and compared. The results are shown in Table 1.

3.2. Nonlinear Numerical Experiments

Example 2. Let the objective function be , and write the training set in the form of formulae (1)~(2), where consists of 20 equidistant points in the interval and is taken to be the constant 0.05. The disturbance of the data is a random number in the interval , where . Standard SVR and the robust SVR of Section 2.4.2 are used to fit the objective function, and the errors at ten randomly selected test points are then analysed and compared. The results are given in Table 2.

3.3. Discussion

The results indicate that the system parameters are very robust. In addition, Table 1 shows that the forecast errors of the robust method are lower than those of the standard SVR method, and Table 2 supports a similar conclusion. In short, the robust SVR method is shown to be advantageous for regression analysis with perturbed data.

4. Conclusions

(1) A robust ε-SVR model has been proposed in which the input data are treated as convex sets of spheroid disturbance, and two theorems are given. Theorem 1 proves that the convex quadratic programming problem can be written as a second-order cone programming problem. Theorem 2 derives the dual problem on the basis of duality theory. By solving the dual problem, the optimal solution of the original problem is obtained.

(2) By virtue of the kernel function, the robust linear ε-SVR model can be extended to the nonlinear domain. Finally, the numerical experiments demonstrate the robustness and effectiveness of the proposed algorithm.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The theorems and algorithms proposed in this paper extend the robust support vector classification method [10], which was of great assistance to the authors. The authors express their sincere gratitude.