Special Issue: Data Mining and Knowledge Discovery in Industrial Engineering
Research Article | Open Access
Piecewise-Smooth Support Vector Machine for Classification
Support vector machines (SVMs) have been applied very successfully in a variety of classification systems. We solve the primal programming problem of the SVM by converting it into a smooth unconstrained minimization problem. In this paper, a new twice continuously differentiable piecewise-smooth function is proposed to approximate the plus function, which yields a piecewise-smooth support vector machine (PWSSVM). The novel method can efficiently handle large-scale and high-dimensional problems. The theoretical analysis demonstrates its advantages in efficiency and precision over other smooth functions. PWSSVM is solved using the fast Newton-Armijo algorithm. Experimental results are given to show the training speed and classification performance of our approach.
1. Introduction
In the last several years, support vector machine (SVM) has become one of the most promising learning machines because of its high generalization performance and wide applicability for classification, forecasting, and estimation in small-sample cases [1–6]. In addition, SVMs have surpassed the performance of artificial neural networks in many areas such as text categorization, speech recognition, and bioinformatics [7–11].
The main idea behind SVM is the construction of an optimal hyperplane, which has been used widely in classification [5, 8–16]. The SVM can be formulated as an unconstrained optimization problem [17–21], but the objective function is nonsmooth. To overcome this disadvantage, Lee and Mangasarian used the integral of the sigmoid function to obtain a smooth SVM (SSVM) model in 2001. This is an important result for SVM, since many well-known algorithms can then be used to solve it. In 2005, Yuan et al. proposed two polynomial functions, namely, the smooth quadratic polynomial function and the smooth fourth-order polynomial function, and obtained the QPSSVM and FPSSVM models [20, 21]. Xiong et al. derived an important recursive formula and a new class of smoothing functions using the technique of interpolation functions. In 2007, Yuan et al. used a third-order spline function to smooth the objective function of the unconstrained optimization problem of SVM and obtained the TSSVM model. However, either the efficiency or the precision of these algorithms was limited.
A natural question is whether another smooth function can yield a more efficient smooth SVM than the existing ones. In this paper, we introduce a piecewise function to smooth SVM and obtain a novel piecewise-smooth support vector machine (PWSSVM). Theoretical analyses show that the approximation accuracy of the piecewise smooth function to the plus function is higher than that of the available smooth functions. Rough set theory is used to prove the global convergence of PWSSVM, and an upper bound of convergence is derived. The fast Newton-Armijo algorithm [22, 23] is employed to train the PWSSVM. Our new method is implemented in batches and can efficiently handle large-scale and high-dimensional problems. Numerical experiments confirm the theoretical results and demonstrate that PWSSVM is more effective than the previous smooth support vector machine models.
The paper is organized as follows. In Section 2, we state the pattern classification problem and describe the PWSSVM. The approximation performance of smooth functions to the plus function is compared in Section 3. The convergence performance of PWSSVM is given in Section 4. The Newton-Armijo algorithm is applied to train PWSSVM in Section 5. Section 6 shows numerical comparisons. Finally, a brief conclusion is given.
In this paper, unless otherwise stated, all vectors are column vectors. For a vector $x$ in the $n$-dimensional real space $R^n$, the plus function $x_+$ is defined as $(x_+)_i = \max\{0, x_i\}$, $i = 1, \ldots, n$. The scalar (inner) product of two vectors $x, y$ in the $n$-dimensional real space $R^n$ will be denoted by $x^T y$ and the $p$-norm of $x$ will be denoted by $\|x\|_p$. For a matrix $A \in R^{m \times n}$, $A_i$ is the $i$th row of $A$, which is a row vector in $R^n$. A column vector of ones of appropriate dimension will be denoted by $e$. If $f$ is a real valued function defined in the $n$-dimensional real space $R^n$, the gradient of $f$ is denoted by $\nabla f(x)$, which is a row vector in $R^n$, and the Hessian matrix of $f$ at $x$ is denoted by $\nabla^2 f(x)$.
2. Piecewise-Smooth Support Vector Machine
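In this paper, we consider a binary classification problem with $m$ training samples in the $n$-dimensional real space $R^n$. The samples are represented by the $m \times n$ matrix $A$, and the membership of each point $A_i$ in the class 1 or −1 is specified by a given $m \times m$ diagonal matrix $D$ with 1 or −1 along its diagonal. For this problem, the standard SVM with a linear kernel is given by the following quadratic program with parameter $\nu > 0$:
$$\min_{(w,\gamma,y) \in R^{n+1+m}} \; \nu e^T y + \frac{1}{2} w^T w \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0, \tag{1}$$
where $e$ is a vector of ones, $w$ is the normal to the bounding plane, and $\gamma$ is the distance of the bounding plane to the origin. The linear separating plane is defined as follows:
$$P = \{x \mid x \in R^n, \; x^T w = \gamma\}. \tag{2}$$
The first term in the objective function of (1) is the 1-norm of the slack variable $y$ with weight $\nu$. Replace the first term with the squared 2-norm of $y$, weighted by $\nu/2$, and add $\gamma^2/2$ to the objective function, which induces strong convexity but has little or no effect on the problem. The SVM model is then replaced by the following problem:
$$\min_{(w,\gamma,y) \in R^{n+1+m}} \; \frac{\nu}{2} y^T y + \frac{1}{2}\left(w^T w + \gamma^2\right) \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0. \tag{3}$$
Let $y = (e - D(Aw - e\gamma))_+$, where $(\cdot)_+$ replaces negative components of a vector by zeros. Then the SVM problem (3) can be converted into the following unconstrained optimization problem:
$$\min_{(w,\gamma) \in R^{n+1}} \; \frac{\nu}{2} \left\| (e - D(Aw - e\gamma))_+ \right\|_2^2 + \frac{1}{2}\left(w^T w + \gamma^2\right). \tag{4}$$
This is a strongly convex minimization problem and it has a unique solution. The plus function $x_+ = \max\{0, x\}$ is continuous but nonsmooth. Therefore, many optimization algorithms that rely on derivatives and gradients cannot solve problem (4) directly.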
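As a rough illustration, the objective of the unconstrained problem (4), that is, the weighted squared 2-norm of the plus-function residual plus the regularization term, can be evaluated in a few lines of NumPy. This is a sketch and not the authors' code (their programs are written in MATLAB); the function name `svm_objective` and the convention of passing the diagonal of the label matrix as a vector `d` are assumptions made here.

```python
import numpy as np


def svm_objective(w, gamma, A, d, nu):
    """Evaluate the nonsmooth unconstrained SVM objective.

    A  : (m, n) matrix of training points, one row per sample
    d  : (m,) vector of labels +1 / -1 (the diagonal of the label matrix)
    nu : positive weight on the squared slack term
    """
    # Residual e - D(Aw - e*gamma); its positive part plays the role of the slack.
    residual = 1.0 - d * (A @ w - gamma)
    slack = np.maximum(residual, 0.0)  # plus function, applied componentwise
    return 0.5 * nu * (slack @ slack) + 0.5 * (w @ w + gamma**2)


if __name__ == "__main__":
    A = np.array([[1.0, 0.0], [-1.0, 0.0]])
    d = np.array([1.0, -1.0])
    print(svm_objective(np.array([2.0, 0.0]), 0.0, A, d, nu=3.0))
```

The `np.maximum` call is the source of the nonsmoothness: its derivative jumps wherever a residual component crosses zero, which is what the smoothing functions discussed next are meant to remove.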
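In 2001, Lee and Mangasarian employed the integral of the sigmoid function to approximate the nondifferentiable function $x_+$ as follows:
$$p(x, k) = x + \frac{1}{k} \ln\left(1 + \varepsilon^{-kx}\right), \quad k > 0, \tag{5}$$
where $\varepsilon$ is the base of the natural logarithm and $k$ is a smoothing parameter. They obtained the SSVM model.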
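In 2005, Yuan et al. presented two polynomial functions as follows:
$$q(x, k) = \begin{cases} x, & x \ge \frac{1}{k}, \\ \frac{k}{4} x^2 + \frac{1}{2} x + \frac{1}{4k}, & -\frac{1}{k} < x < \frac{1}{k}, \\ 0, & x \le -\frac{1}{k}, \end{cases} \qquad h(x, k) = \begin{cases} x, & x \ge \frac{1}{k}, \\ -\frac{1}{16k} (kx + 1)^3 (kx - 3), & -\frac{1}{k} < x < \frac{1}{k}, \\ 0, & x \le -\frac{1}{k}. \end{cases} \tag{6}$$
Using the above smooth functions to approximate the plus function $x_+$, they obtained two smooth polynomial support vector machine models (FPSSVM and QPSSVM). The authors also showed that FPSSVM and QPSSVM were more effective than SSVM.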
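In 2007, Yuan et al. presented a third-order spline function as follows:
$$T(x, k) = \begin{cases} 0, & x < -\frac{1}{k}, \\ \frac{k^2}{6} x^3 + \frac{k}{2} x^2 + \frac{1}{2} x + \frac{1}{6k}, & -\frac{1}{k} \le x < 0, \\ -\frac{k^2}{6} x^3 + \frac{k}{2} x^2 + \frac{1}{2} x + \frac{1}{6k}, & 0 \le x \le \frac{1}{k}, \\ x, & x > \frac{1}{k}. \end{cases} \tag{7}$$
They used this smooth function to approximate the plus function $x_+$ and obtained a new smooth SVM model, TSSVM. However, either the efficiency or the precision of the algorithms above was limited.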
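The earlier smoothing functions reviewed above can be compared numerically. The Python sketch below uses the forms in which these functions are usually stated in the cited papers (sigmoid integral, quadratic polynomial, fourth-order polynomial, and third-order spline). All function names are hypothetical, the smoothing parameter is written `k` by assumption, and the piecewise function proposed in this paper is deliberately not included. The helper `scaled_gap` evaluates $k^2 \max_x \left(f(x,k)^2 - x_+^2\right)$ on a finite grid, so it gives an estimate rather than a proven bound.

```python
import numpy as np


def plus(x):
    """Plus function max(0, x), applied componentwise."""
    return np.maximum(x, 0.0)


def sigmoid_integral(x, k):
    """x + ln(1 + exp(-k x)) / k, evaluated stably with logaddexp."""
    return x + np.logaddexp(0.0, -k * x) / k


def _piecewise(x, k, middle):
    """Use x above 1/k, 0 below -1/k, and `middle` in between."""
    return np.where(x >= 1.0 / k, x, np.where(x <= -1.0 / k, 0.0, middle))


def quadratic_poly(x, k):
    return _piecewise(x, k, k * x**2 / 4 + x / 2 + 1 / (4 * k))


def fourth_order_poly(x, k):
    return _piecewise(x, k, -((k * x + 1) ** 3) * (k * x - 3) / (16 * k))


def cubic_spline(x, k):
    # The cubic term changes sign at x = 0, hence |x| * x**2 instead of x**3.
    middle = -(k**2) * np.abs(x) * x**2 / 6 + k * x**2 / 2 + x / 2 + 1 / (6 * k)
    return _piecewise(x, k, middle)


def scaled_gap(f, k, num=200001):
    """Grid estimate of k^2 * max(f(x, k)^2 - plus(x)^2) over [-3/k, 3/k]."""
    x = np.linspace(-3.0 / k, 3.0 / k, num)
    return k**2 * np.max(f(x, k) ** 2 - plus(x) ** 2)


if __name__ == "__main__":
    for f in (sigmoid_integral, quadratic_poly, fourth_order_poly, cubic_spline):
        print(f.__name__, scaled_gap(f, k=10.0))
```

On such a grid the scaled gap is largest for the sigmoid integral and smallest for the spline, and it does not depend on `k` because each function is a rescaling of a fixed profile in the variable `k * x`.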
In this paper, we propose a novel smooth function with smoothing parameter to approximate to the function as follows: The first- and second-order derivatives of are The solution of the problem (3) can be obtained by solving the following smooth unconstrained optimization problem with the smoothing parameter approaching infinity as Thus, we develop a new smooth approximation for problem (3).
3. Approximation Performance Analysis of Smooth Functions
In this section, we compare the approximation performance of the smooth functions to the plus function.
Theorem 4. The piecewise approximation function defined in (8) has the following properties:(1) is twice rank smooth about ;(2)for any , ,(3)for any , then .
Proof. (1) According to the formulas (8) and (9), one can easily obtain the results in (1).
(2) In the following, we verify the fact . (i) The equation holds while . (ii) Since is a monotone increasing function, we have the following result while . (iii) For , we have . Hence, we have the conclusion .
(3) For , , the inequality in conclusion (3) is satisfied naturally.
For , since , . Because is positive value, continuous, and increasing function for , we have .
For , let In order to obtain the result, making the variable substitution (obviously ), then we have . For , the maximum point of is and .
In conclusion, we have .
Theorem 5. Let , and . Consider the following.(1)If the smooth function is defined as (5), then by Lemma 1, we have
(2)If the smooth functions are defined as (6), by Lemma 2,
(3)If the smooth function is defined as (7), by Lemma 3,
(4)If the smooth function is defined as (8), by Theorem 4,
Theorem 5 shows that the proposed piecewise smooth function achieves the best degree of approximation to the plus function . When is fixed, it is easy to obtain the different smooth capability of the above smooth functions. The smooth performance comparison is given in Figure 1, where we set the smooth parameter and .
4. Convergence Performance of PWSSVM
In this section, the convergence of PWSSVM will be presented. By using rough set theory, we prove that the solution of PWSSVM can closely approximate the optimal solution of the original model (4) when goes to infinity. Furthermore, a formula for computing the upper bound of convergence is deduced.
Theorem 6. Let and . Define the real-valued functions in the -dimensional real space as follows: where is defined in (8), . Then we have the following results:(1) and are strongly convex functions; (2)there exists a unique solution to , and a unique solution to ;(3)for any , and satisfy the following condition: (4).
Proof. (1) For any , and are strongly convex functions because is strong convex function.
(2) Let be the level set of and let be the level set of . Since , it is easy to obtain . Therefore, and are compact subsets in . Using the strong convexity property of and for , there is a unique solution to and , respectively.
(3) By using the first order optimization condition and considering convex property of and , we have Add the two formulas above and notice that , and then we have According to the third result of Theorem 4, we obtain the conclusion .
(4) According to , we have .
5. The Newton-Armijo Algorithm for PWSSVM
Following the results of the previous section, one can obtain the twice continuous differentiability of the objective function of problem (10). In order to take advantage of this feature, we use the Newton-Armijo method to train PWSSVM since it is a faster method than the BFGS algorithm [18, 19, 21]. The Newton-Armijo algorithm for problem (10) works as follows.
5.1. Newton-Armijo Algorithm
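Step 1 (initialization). Start with any $(w^0, \gamma^0) \in R^{n+1}$, and set $i = 0$.
Step 2. Compute $\Phi(w^i, \gamma^i)$ and $\nabla \Phi(w^i, \gamma^i)$, where $\Phi$ denotes the objective function of problem (10).
Step 3. If $\|\nabla \Phi(w^i, \gamma^i)\|_2 \le \mathrm{tol}$, then stop and accept $(w^i, \gamma^i)$. Otherwise, compute the Newton direction $d^i \in R^{n+1}$ from the following linear system:
$$\nabla^2 \Phi(w^i, \gamma^i)\, d^i = -\nabla \Phi(w^i, \gamma^i)^T, \tag{20}$$
where “$T$” denotes the transpose.
Step 4 (Armijo stepsize). Choose a stepsize $\lambda_i = \max\{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ such that
$$\Phi(w^i, \gamma^i) - \Phi\left((w^i, \gamma^i) + \lambda_i d^i\right) \ge -\delta \lambda_i \nabla \Phi(w^i, \gamma^i)\, d^i,$$
where $\delta \in (0, \frac{1}{2})$, and set
$$(w^{i+1}, \gamma^{i+1}) = (w^i, \gamma^i) + \lambda_i d^i.$$
Step 5. Replace $i$ by $i + 1$ and go to Step 2.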
In our smooth approach, we only need to solve the linear system (20) instead of a quadratic program. Because the objective function is strongly convex, it is not difficult to show that our Newton-Armijo algorithm for training PWSSVM converges globally to the unique solution [17, 23]. Hence, the choice of start point is not important. In this paper, we always set $(w^0, \gamma^0) = e$, where $e$ denotes a column vector of ones of dimension $n + 1$.
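The Newton-Armijo iteration described above can be sketched in Python as follows. Because the sketch has to be self-contained, it smooths the plus function with the sigmoid-integral function of SSVM as a stand-in, not with the piecewise function proposed in this paper. The names `smooth_plus`, `evaluate`, and `newton_armijo`, the default parameter values, and the lower limit on the stepsize are assumptions made for illustration, and this is not the authors' MATLAB code.

```python
import numpy as np


def smooth_plus(x, k):
    """Stand-in smoothing: x + ln(1 + exp(-k x)) / k (sigmoid integral)."""
    return x + np.logaddexp(0.0, -k * x) / k


def evaluate(z, B, nu, k):
    """Value, gradient, and Hessian of the smoothed objective at z = (w, gamma)."""
    r = 1.0 - B @ z                          # e - D(Aw - e*gamma)
    p = smooth_plus(r, k)
    s = np.exp(-np.logaddexp(0.0, -k * r))   # first derivative of smooth_plus
    c = k * s * (1.0 - s)                    # second derivative of smooth_plus
    value = 0.5 * nu * (p @ p) + 0.5 * (z @ z)
    grad = z - nu * (B.T @ (p * s))
    hess = np.eye(z.size) + nu * (B.T * (s * s + p * c)) @ B
    return value, grad, hess


def newton_armijo(A, d, nu=1.0, k=10.0, tol=1e-8, delta=0.25, max_iter=100):
    """Train a linear smooth SVM; d holds the labels +1 / -1."""
    m, n = A.shape
    B = d[:, None] * np.hstack([A, -np.ones((m, 1))])   # D [A, -e]
    z = np.ones(n + 1)                                   # start from a vector of ones
    for _ in range(max_iter):
        value, grad, hess = evaluate(z, B, nu, k)
        if np.linalg.norm(grad) <= tol:                  # stopping test
            break
        direction = np.linalg.solve(hess, -grad)         # Newton direction
        step = 1.0
        while step > 1e-12:                              # Armijo: try 1, 1/2, 1/4, ...
            trial_value = evaluate(z + step * direction, B, nu, k)[0]
            if value - trial_value >= -delta * step * (grad @ direction):
                break
            step *= 0.5
        z = z + step * direction
    return z[:n], z[n]


if __name__ == "__main__":
    A = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
    d = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
    w, gamma = newton_armijo(A, d)
    print(w, gamma, np.sign(A @ w - gamma))
```

Each iteration solves one linear system of size $n + 1$, so the cost per step depends on the number of features rather than on a quadratic program over all samples. Swapping in another twice continuously differentiable smoothing function only requires changing `smooth_plus` and the two derivative lines in `evaluate`.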
The PWSSVM described above solves linear classification problems. In fact, some of the results in Section 2 can be extended to a nonlinear PWSSVM with the kernel technique. Furthermore, the Newton-Armijo algorithm can also solve the nonlinear PWSSVM successfully.
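The experiments reported below use a Gaussian kernel. As a minimal sketch of the kernel matrix that a nonlinear model of this kind would work with, the function below computes $K_{ij} = \exp(-\mu \|A_i - B_j\|_2^2)$. The name `gaussian_kernel` and the width parameter `mu` are assumptions, since the kernel parameterization is not spelled out in the text.

```python
import numpy as np


def gaussian_kernel(A, B, mu):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at zero
    # to guard against small negative values caused by rounding.
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq_dist, 0.0))


if __name__ == "__main__":
    points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    print(gaussian_kernel(points, points, mu=0.5))
```

The resulting matrix is symmetric with ones on the diagonal when both arguments are the same set of points.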
6. Numerical Experiments
The Newton-Armijo algorithm cannot be applied to the QPSSVM model because it lacks a second-order derivative. Moreover, the classification capacity of FPSSVM is slightly better than that of QPSSVM [18–21]. Therefore, QPSSVM is excluded from our comparison. To demonstrate the effectiveness and speed of PWSSVM, we compare the performance of SSVM, FPSSVM, TSSVM, and PWSSVM numerically. The four smooth SVMs are all trained by the fast Newton-Armijo algorithm. All experiments are run on a personal computer with a 3.0 GHz processor and a maximum of 1.99 GB of memory available. The computer runs Windows 7 with MATLAB 7.0.1. The programs of PWSSVM, FPSSVM, and TSSVM are written in the MATLAB language. The source code of SSVM, “ssvm.m,” is obtained from the author’s website for the linear problem, and “lsvmk.m” is used for the nonlinear problem. In our experiments, all of the input data and the variables needed by the programs are kept in memory. For SSVM, TSSVM, FPSSVM, and PWSSVM, the same optimality tolerance is used to determine when to terminate. A Gaussian kernel is used in all our experiments.
The first experiment demonstrates the capability of PWSSVM in solving larger problems. Table 1 compares the training correctness, the testing correctness, and the training time of the four smooth SVMs on massive datasets. The datasets are created with different sizes using Musicant's NDC Data Generator. The number of test samples is 5% of the number of training samples. The experimental results show that PWSSVM has the highest training accuracy and testing accuracy. Furthermore, PWSSVM solves the problems more quickly than the other three smooth SVMs when the number of samples is relatively small.
The second experiment is designed to demonstrate the effectiveness of PWSSVM on the “tried and true” checkerboard dataset. The checkerboard dataset is a simple but highly nonlinearly separable example that has often been used to show the effectiveness of nonlinear kernel methods. It is generated by uniformly discretizing the region into grid points and labeling them with two classes, “White” and “Black,” arranged in alternating cells, as Figure 2 shows.
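A checkerboard dataset of this kind can be generated with a short Python sketch. The grid size and the number of cells are assumptions made here: a 200 × 200 integer grid is chosen because its 40,000 points match the 1000 training points plus 39,000 test points reported for the first trial, and a 4 × 4 arrangement of cells is assumed for illustration. The function name `checkerboard` and the encoding of “White” as +1 and “Black” as −1 are likewise hypothetical.

```python
import numpy as np


def checkerboard(points_per_side=200, cells_per_side=4):
    """Return grid points and +1 / -1 labels in an alternating cell pattern."""
    coords = np.arange(points_per_side)
    xx, yy = np.meshgrid(coords, coords)
    cell = points_per_side // cells_per_side          # points per cell edge
    # Cells whose column and row indices have an even sum get label +1.
    labels = np.where(((xx // cell) + (yy // cell)) % 2 == 0, 1, -1)
    X = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
    return X, labels.ravel()


if __name__ == "__main__":
    X, y = checkerboard()
    rng = np.random.default_rng(0)                    # fixed seed for repeatability
    train_idx = rng.choice(len(X), size=1000, replace=False)
    test_mask = np.ones(len(X), dtype=bool)
    test_mask[train_idx] = False
    print(X[train_idx].shape, X[test_mask].shape)
```

A random draw like the one in the usage example will not reproduce the exact training set used in the paper, which was taken from an existing source for comparison.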
In the first trial of this experiment, the training set contains 1000 points randomly sampled from the checkerboard (for comparison, they are taken from an existing source), of which 514 are “white” samples and 486 are “black” samples; the remaining 39,000 points form the testing set. A Gaussian kernel function is used. The total training time for the 1000-point training set using PWSSVM with a Gaussian kernel is 4.01 s. The training accuracy of PWSSVM is 99.80%, and its test accuracy on the 39,000-point test set is 98.76%. TSSVM solves the same problem within 4.23 s, with a training accuracy of 99.62% and a test accuracy of 98.28%. FPSSVM and SSVM obtain a training accuracy of 99.60% within 4.35 s and 4.61 s, respectively. Their test accuracy is 98.28%.
The remaining results are presented in Table 2. The training sets are randomly selected from the checkerboard with different sizes, and the remaining samples are used as test samples. We compare the classification results of PWSSVM, TSSVM, FPSSVM, and SSVM with the same Gaussian kernel function. The results in Table 2 demonstrate that PWSSVM solves massive problems most quickly, followed by TSSVM, FPSSVM, and SSVM in turn. The experimental results also show that PWSSVM obtains the highest training precision and test precision.
7. Conclusion
A novel PWSSVM is proposed in this paper. It only needs to find the unique minimizer of an unconstrained differentiable convex function. The proposed method has advantages over the available smooth SVMs, such as better classification performance and lower training time. The numerical results show that PWSSVM has excellent generalization ability.
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments. This work was supported in part by the National Natural Science Foundation of China under Grants 61100165, 61100231, and 51205309 and the Natural Science Foundation of Shaanxi Province (2010JQ8004, 2012JQ8044).
References
- V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
- V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
- K. R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, “Using support vector machines for time series prediction,” in Advances in Kernel Methods: Support Vector Machine, B. Schölkopf, J. Burges, and A. Smola, Eds., MIT Press, Cambridge, Mass, USA, 1999.
- T. Farooq, A. Guergachi, and S. Krishnan, “Knowledge-based Green's Kernel for support vector regression,” Mathematical Problems in Engineering, vol. 2010, Article ID 378652, 16 pages, 2010.
- J. Zheng and B.-L. Lu, “A support vector machine classifier with automatic confidence and its application to gender classification,” Neurocomputing, vol. 74, no. 11, pp. 1926–1935, 2011.
- J. Y. Zhu, B. Ren, H. X. Zhang, and Z. T. Deng, “Time series prediction via new support vector machines,” in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '02), pp. 364–366, November 2002.
- J. Ramana and D. Gupta, “LipocalinPred: a SVM-based method for prediction of lipocalins,” BMC Bioinformatics, vol. 10, p. 445, 2009.
- T. Joachims, “Text categorization with support vector machines: learning with many relevant features,” in Proceedings of the 10th European Conference on Machine Learning (ECML '98), pp. 137–142, Springer, Heidelberg, Germany, 1998.
- P. S. Jaume, M. I. Darío, and D. M. Fernando, “Support vector machines for continuous speech recognition,” in Proceedings of the 14th European Signal Processing Conference (EUSIPCO '06), Florence, Italy, September 2006.
- V. Bevilacqua, P. Pannarale, M. Abbrescia, and C. Cava, “Comparison of data-merging methods with SVM attribute selection and classification in breast cancer gene expression,” BMC Bioinformatics, vol. 13, supplement 7, p. S9, 2012.
- E. J. Spinosa and A. C. Carvalho, “Support vector machines for novel class detection in bioinformatics,” Genetics and Molecular Research, vol. 4, no. 3, pp. 608–615, 2005.
- A. Nurettin and G. Cüneyt, An Application of Support Vector Machine in Bioinformatics: Automated Recognition of Epileptiform Patterns in EEG Using SVM Classifier Designed by a Perturbation Method, vol. 3261 of Advances in Information Systems Lecture Notes in Computer Science, 2005.
- Y. F. Sun, X. D. Fan, and Y. D. Li, “Identifying splicing sites in eukaryotic RNA: support vector machine approach,” Computers in Biology and Medicine, vol. 33, no. 1, pp. 17–29, 2003.
- H. J. Lin and J. P. Yeh, “Optimal reduction of solutions for support vector machines,” Applied Mathematics and Computation, vol. 214, no. 2, pp. 329–335, 2009.
- A. Christmann and R. Hable, “Consistency of support vector machines using additive kernels for additive models,” Computational Statistics and Data Analysis, vol. 56, no. 4, pp. 854–873, 2012.
- Y. H. Shao and N. Y. Deng, “A coordinate descent margin based-twin support vector machine for classification,” Neural Networks, vol. 25, pp. 114–121, 2012.
- Y.-J. Lee and O. L. Mangasarian, “SSVM: a smooth support vector machine for classification,” Computational Optimization and Applications, vol. 20, no. 1, pp. 5–22, 2001.
- Y. B. Yuan, J. Yan, and C. X. Xu, “Polynomial smooth support vector machine (PSSVM),” Chinese Journal of Computers, vol. 28, no. 1, pp. 9–17, 2005.
- Y. B. Yuan and T. Z. Huang, A Polynomial Smooth Support Vector Machine for Classification, vol. 3584 of Lecture Note on Artificial Intelligence, 2005.
- J. Z. Xiong, J. L. Hu, H. Q. Yuan, T. M. Hu, and G. M. Li, “Research on a new class of functions for smoothing support vector machines,” Acta Electronica Sinica, vol. 35, no. 2, pp. 366–370, 2007.
- Y. Yuan, W. Fan, and D. Pu, “Spline function smooth support vector machine for classification,” Journal of Industrial and Management Optimization, vol. 3, no. 3, pp. 529–542, 2007.
- D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, Mass, USA, 2nd edition, 1999.
- C. Xu and J. Zhang, “A survey of Quasi-Newton equations and Quasi-Newton methods for optimization,” Annals of Operations Research, vol. 103, no. 1–4, pp. 213–234, 2001.
- D. R. Musicant and O. L. Mangasarian, “LSVM: Lagrangian support vector machine,” 2000, http://www.cs.wisc.edu/dmi/svm/.
- D. R. Musicant, “NDC: normally distributed clustered datasets,” 1998, http://www.cs.wisc.edu/~musicant/data/ndc/.
- T. K. Ho and E. M. Kleinberg, “Checkerboard dataset,” 1996, http://www.cs.wisc.edu/~musicant/data/ndc/.
Copyright © 2013 Qing Wu and Wenqing Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.