Information and Modeling in ComplexityView this Special Issue
An Efficient Approach to Solve the Large-Scale Semidefinite Programming Problems
Solving the large-scale problems with semidefinite programming (SDP) constraints is of great importance in modeling and model reduction of complex system, dynamical system, optimal control, computer vision, and machine learning. However, existing SDP solvers are of large complexities and thus unavailable to deal with large-scale problems. In this paper, we solve SDP using matrix generation, which is an extension of the classical column generation. The exponentiated gradient algorithm is also used to solve the special structure subproblem of matrix generation. The numerical experiments show that our approach is efficient and scales very well with the problem dimension. Furthermore, the proposed algorithm is applied for a clustering problem. The experimental results on real datasets imply that the proposed approach outperforms the traditional interior-point SDP solvers in terms of efficiency and scalability.
Semidefinite programming (SDP) is a technique widely used in modeling of complex systems and some important issues in computer vision and machine learning. Examples of application include model reduction , modeling of nonlinear systems [2, 3], optimal control , clustering [5–7], robust Euclidean embedding , kernel matrix learning , and metric learning . It can be seen in [5, 7] that SDP relaxation can produce more accurate estimates than spectral methods. However, the SDP optimization suffers extremely high time complexity. Current SDP solvers utilizing interior-point methods scale as or worse [1, 6, 10], where is the number of rows (or columns) of the semidefinite matrix. As a result, SDP can only run on small datasets. It is meaningful to propose a scalable SDP solver.
Inspired by column generation , which is a classical technology in optimization literature for solving large scale problem, Shen et al. proposed matrix generation in  for solving a semidefinite metric learning problem. By using matrix generation, the particular semidefinite program in metric learning was converted into a sequence of linear programs, and it is possible to use the well-developed linear programming technology.
In this paper, we propose the matrix generation-based iteration approach to solve general SDP optimization problems. At each iteration, the exponentiated gradient (EG) algorithm  is applied to solve the special structure subproblem. Experiments are also carried out on some real datasets to evaluate the proposed method's efficacy and efficiency.
The method proposed here can be seen as an extension of column generation to solve SDP problems. The proposed matrix generation method also has the drawback of column generation. It converges slowly of one wants to achieve high accuracy. This is the so-called “tailing effect”. In practice, for many applications, we do not need a very accurate solution. Typically, a moderately accurate solution suffices. On the other hand, the proposed method can also be considered as a generalization of EG to the matrix case. EG is used to solve problems with a vector optimization variable, and the vector must be on a simplex. Here, we generalize EG in the sense that the proposed method solves optimization with a semidefinite matrix whose eigenvalues are on a simplex.
We present our main results in the next section.
2. The Algorithm
The following notations will be used throughout the paper:(i)a bold lower-case letter : a column vector, (ii)an upper-case letter : a matrix, (iii): a positive semidefinite (p.s.d.) matrix, (iv): the component-wise inequality between two vectors,(v): the vector space of real matrices of size ,(vi): the space of real matrices,(vii): the space of symmetric matrices of size ,(viii): the space of symmetric positive semidefinite matrices of size ,(ix): the inner product defined on the above spaces, (x): the trace of a matrix,(xi): the -dimensional simplex set for vectors, (xii): the density matrix set, (xiii): the dyad set of matrices.
2.2. Solving Optimization Problem with SDP Constraints Using Matrix Generation
We are interested in the following general convex problem: Here, is a convex function defined in .
We need to extend the Fenchel conjugate  to matrices. For a function , the Fenchel conjugate : is the lower semicontinuous and convex function Now, we impose the following two conditions on : (i) is differentiable everywhere,(ii)the gradient is continuous and monotonically increasing at each direction.
In other words, is differentiable and strictly convex. In this case, the Fenchel conjugate is also called Legendre transform and has the following important result: By analogy, we can define a Fenchel conjugate for a matrix as in the vector space . defined in (2.4) must be lower semicontinuous and convex, because is a supremum of linear continuous functions of .
We are ready to reformulate our problem (2.1). It is proved in  that a p.s.d. matrix can always be decomposed as a linear convex combination of a set of rank-one matrices, we decompose into a convex combination of a set of dyads with , , for all . So, we can write (2.1) into Note that here, and are the optimization variables. Although is redundant in (2.6), we need to keep it to arrive at the important Lagrange dual problem later.
This above problems is still very hard to solve, since it has nonconvex rank constraints, and the variable is indefinite, because there are an indefinite number of rank-one matrices. However, if we somehow know matrices () a priori, then all the constraints imposed on () can be dropped, and the problem becomes a linear program.
We apply matrix generation to solve the problem. Matrix generation is an extension of column generation to nonpolyhedral semidefinite constraints for solving difficult large-scale optimization problems . It is a method to avoid considering all variables of a problem explicitly, and only a small subset of the entire variable set is considered. Once the problem with the small subset variables is solved, the question of if there are any other variables that can be included to improve the solution should be answered. For convex programs, the primal variables correspond to the dual constraints. The dual can be used to solve the above subproblem.
The Lagrangian is with , and the dual is where we have defined with , . The dual problem of (2.6) is Essentially the dual is
We now only consider a small subset of the variables in the primal; that is, only a subset of (denoted by ) is used. The LP solved using is usually termed restricted master problem (RMP). Because the primal variables correspond to the dual constraints, solving RMP is equivalent to solve a relaxed version of the dual problem . If all the constraints that we have not added to the dual problem are kept, solving the restricted problem is equivalent to solve the original problem. Otherwise, if there exists at least one constraint that is violated, the violated constraints correspond to variables in primal that are not in RMP. Adding these variables to RMP leads to a new RMP problem that needs to be reoptimized. Here, we have a iterative algorithm that either finds a new such that where is the solution of the current restricted problem, or it guarantees that such a does not exist. To make convergence fast, we find the one by solving the following optimization problem: The optimal value is exactly , where is the eigenvector of that corresponds to the smallest eigenvalue. For the time being, is added to the dyads, and the only variable to estimate is . The primal can be writhen as We generalize the EG algorithm to the matrix case and use it to solve the above problem.
At optimality, because of (2.3), we have the connection between the primal and dual variables We also know that at least one of the dual constraints takes the equality at optimality, which enable us to obtain can be used as the iteration stopping criteria.
2.3. The Algorithm Implementation
2.3.1. The Matrix Generation
The detailed implementation of the algorithm is proposed in Algorithm 1. We can observe, at each iteration, that a new matrix is generated, hence the algorithm is named matrix generation. Besides, at each iteration, only the most significant eigenvalue and its corresponding eigenvector are needed. This makes the algorithm efficient.
2.3.2. The EG Algorithm
In , it is shown that EG style updates can converge very quickly compared to other methods. It has been successfully applied to structured prediction problems such as structured support vector machines and conditional random field. We generalize the EG algorithm to matrix to solve the special structure convex optimization problem in MG algorithm. We first give a brief introduction to the EG algorithm [12, 14, 15]. The EG algorithm efficiently solves the convex optimization problem  under the assumption that the object function is a convex Lipschitz continuous function with Lipschitz constant w.r.t. a fixed given norm . The mathematical definition of is that holds for any , in the domain of . The detail of the EG algorithm is shown in Algorithm 2.
2.3.3. Evaluation of the Algorithm
In this section, we evaluate the convergence, running speed, and memory consumption of the algorithm by solving where is the Frobenius norm, is a symmetric p.s.d. matrix, and is a positive integer. This problem is important in the issue of the affinity matrix normalization of spectral clustering .
The above problem is approximate to where is a scalar, for example, .
Before reporting the results, we compare the proposed algorithm with the convex optimization solver, namely, SDPT3  which is used as internal solvers in the disciplined convex programming software CVX [19, 20]. The CVX toolbox can solve standard problems such as linear programs (LPs), quadratic programs (QPs), second-order cone programs (SOCPs), and semidefinite programs (SDPs).
We randomly generate a p.s.d. matrix in (2.18); let be 5. In Figure 1, we plot the optimal value obtained by the proposed algorithm at each iteration (the blue curve). The red dash line shows the ground truth obtained by directly solving the original problem in (2.16) using the CVX toolbox. We can see that the algorithm converges to the near-optimal solutions quickly.
(2) Running Time
We solve the problem in (2.16) using the proposed algorithm and the CVX toolbox, respectively, and the running time is shown in Figure 2. The blue curve is the running time of the proposed algorithm versus the matrix dimension, and the red curve is the running time of CVX versus the matrix dimension. We can see that the running time of CVX increases quickly with the matrix dimension and the running time of the proposed algorithm is a fraction of the CVX and scales very well with the matrix dimension. The proposed algorithm has a clear advantage on computational efficiency. One reason is that at each iteration, only the most significant eigenvalue and its corresponding eigenvector are needed in the proposed algorithm.
(3) Memory Consumption
Figure 3 shows the memory consumption of the proposed algorithm and the CXV toolbox. The blue curve is the approximate memory consumption of the proposed algorithm versus the matrix dimension, and the red curve is the approximate memory consumption of CVX versus the matrix dimension. It can be seen that the memory consumption of CVX increases quickly with the matrix dimension and the memory consumption of the proposed algorithm is a fraction of the CVX and scales very well with the dimension.
In summary, the proposed algorithm can converge to the near-optimal solutions quickly and scale very well with the dimension, which are very important for solving large-scale SDP problems. As a result, the proposed method is potential to be applied to computer vision to deal with large matrix of 3D visual data  and clustering in data analysis. In next section, we will show how to apply the proposed method to clustering problems.
3. The Application
The problem of partitioning data points into a number of distinct sets, known as the clustering problem, is one of the most fundamental and significant topics in data analysis and machine learning. From an optimization viewpoint, clustering aims at the minimal sum-of-squares (MSSC) based on some definition of similarly or distance. But the variable is a 0-1 discrete indicator matrix ( is the number of samples, and is the number of clusters), thus the optimization becomes a mixed integer programming with nonlinear objective, which is NP-hard. The computation time of NP-hard problems is too heavy to use in practical engineering applications and should be accelerated by some approximation methods. K-means and spectral clustering are the two most well-known algorithms to approximately solve the clustering problem. However, the former is merely able to achieve local optimum through an indirect way, and the later, which can obtain a global optimum through eigenvalue decomposition, is unfortunately a weak relaxation of the N-cut problem . Recent research of semidefinite programming (SDP) relaxation on clustering proposed that the minimal sum-of-squares (MSSC)  is possible to be relaxed as SDP problems, which is more direct than K-means and tighter than spectral clustering .
In this section, we apply the proposed algorithm to solve the SDP relaxed clustering problem (0-1 SDP for clustering)  where is the so-called affinity matrix whose entries represent the similarities or closeness among the entities in the dataset, and is the class number of the dataset. According to , it is reasonable to relax the nonnegative constraint in (3.1)
We solve the problem in (3.2) using the algorithm described in Algorithm 1. To test the algorithm, we carry out some experiments on four real datesets collected from UCI machine learning repository (http://www.ics.uci.edu/~mlearn/MLRepository.html.) The datasets are listed in Table 1 together with some their characteristics. , Size, and Dim are the class number, the number of instances, and the number of attributes of each dataset, respectively. All the experiments are done by using Matlab on a personal computer with a 2.93 GHz processor and a 2 G memory. The accuracy of clustering each dataset of our algorithm is compared with K-means, spectral clustering, and CVX (here CVX means the SDP problem in (3.2) is solved using the CVX toolbox). The results are shown in Table 1. We can see that(i)the proposed algorithm performs better than K-means,(ii)the proposed algorithm achieves higher accuracy than the spectral clustering that is because the proposed algorithm is the SDP relaxation of the clustring problem and is tighter than the spectral relaxation,(iii)the proposed algorithm is as good as CVX. This means it can solve the SDP problem as accurately as the CVX toolbox.
We also compare the speed of the proposed algorithm with CVX, and the results are shown in Table 2. It shows that the proposed algorithm is much faster than CVX.
As observed from the experiments, it can be found that the proposed algorithm performs very well on these datasets and can efficiently solve the 0-1 SDP problem for clustering. Both speed and accuracy are improved as compared with traditional interior-point SDP solvers. We plan to further extend the approach to more general problems and look for other applications such as modeling [23, 24] and stabilizing some kinds of dynamical systems  model reduction.
This paper proposed an approach to solve the optimization problem with SDP constraints using matrix generation and exponentiated gradient. The process is faster and more scalable than the traditional SDP approach. The results of numerical experiments show that the method scales very well with the problem dimensions. We also applied the approach to solve the SDP relaxation clustering problem. Experimental results on real datasets show that the algorithm outperforms much in both speed and accuracy as compared with interior-point SDP solvers.
A. Sootla, Model reduction using semidefinite programming, Licentiate Thesis, Department of Automatic Control, Lund University, Lund, Sweden, 2009.
Z. Mahmood, B. Bond, T. Moselhy, A. Megretski, and L. Daniel, “Passive reduced order modeling of multiport interconnects via semidefinite programming,” in Proceedings of the International Conference on Design, Automation and Test in Europe (DATE '10), pp. 622–625, 2010.View at: Google Scholar
P. S. Ansell, K. D. Glazebrook, I. Mitrani, and J. Niño-Mora, “Semidefinite programming approach to the optimal control of a single server queueing system with imposed second moment constraints,” Journal of the Operational Research Society, vol. 50, no. 7, pp. 765–773, 1999.View at: Google Scholar | Zentralblatt MATH
E. P. Xing and M. I. Jordan, “On semidefinite relaxations for normalized k-cut and connections to spectral clustering,” Tech. Rep., University of California at Berkeley, Berkeley, Calif, USA, 2003.View at: Google Scholar
B. Kulis, A. C. Surendran, and J. C. Platt, “Fast low-rank semidefinite programming for embedding and clustering,” in Proceedings of the International Workshop on Artificial Intelligence and Statistics, San Juan, Puerto Rico, 2007.View at: Google Scholar
T. Zhou and D. Tao, “Fast gradient clustering,” in Proceedings of the Advances in Neural Information Processing Systems, Workshop on Discrete Optimization in Machine Learning, pp. 1–6, 2009.View at: Google Scholar
L. Cayton and S. Dasgupta, “Robust euclidean embedding,” in Proceedings of the International Conference on Machine Learning, 2006.View at: Google Scholar
G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan, “Learning the kernel matrix with semidefinite programming,” Journal of Machine Learning Research (JMLR), vol. 5, pp. 27–72, 2003/04.View at: Google Scholar
C. Shen, A. Welsh, and L. Wang, “PSDBoost: Matrix-generation linear programming for positive semidefinite matrices learning,” in Proceedings of the Advances in Neural Information Processing Systems, pp. 1473–1480, Vancouver, Canada, 2008.View at: Google Scholar
S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
J. Kivinen and M. K. Warmuth, “Additive versus exponentiated gradient updates for linear prediction,” Information and Computation, vol. 132, no. 1, pp. 1–64, 1997.View at: Google Scholar
C. Shen, P. Wang, and H. Li, “LACBoost and FisherBoost: optimally building cascade classifiers,” in Proceedings of the 11th European Conference on Computer Vision (ECCV '10), vol. 2 of Lecture Notes in Computer Science (LNCS) 6312, pp. 608–621, Springer, Crete Island, Greece, September 2010.View at: Publisher Site | Google Scholar
R. Zass and A. Shashua, “Doubly stochastic normalization for spectral clustering,” in Proceedings of theAdvances in Neural Information Processing Systems, 2006.View at: Google Scholar
M. C. Grant and S. P. Boyd, “Graph implementations for nonsmooth convex programs,” in Recent Advances in Learning and Control, V. Blondel, S. Boyd, and H. Kimura, Eds., vol. 371 of Lecture Notes in Control and Informmation Sciences, pp. 95–110, Springer, London, UK, 2008.View at: Publisher Site | Google Scholar | Zentralblatt MATH
B. N. Bond, Z. Mahmood, Y. Li et al., “Compact modeling of nonlinear analog circuits using system identification via semidefinite programming and incremental stability certification,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 8, pp. 1149–1162, 2010.View at: Publisher Site | Google Scholar