Abstract

We investigate solving semidefinite programs (SDPs) with an interior point method called SDP-CUT, which utilizes weighted analytic centers and cutting plane constraints. SDP-CUT iteratively refines the feasible region to reach the optimal solution, using Newton's method to compute the weighted analytic center. We investigate different techniques for determining the stepsize and find that Newton's method with exact line search is generally the best implementation of the algorithm. We have also compared our algorithm to the SDPT3 method and found that SDP-CUT initially gets into the neighborhood of the optimal solution in fewer iterations on all our test problems. SDP-CUT also took fewer iterations to reach optimality on many of the problems. However, SDPT3 required fewer iterations on most of the test problems and less time on all of them. Some theoretical properties of the convergence of SDP-CUT are also discussed.

1. Introduction

We consider the semidefinite programming problem (SDP) as in [1]:

\[
\begin{aligned}
\text{minimize}\quad & c^Tx, &&(1.1)\\
\text{subject to}\quad & A^{(j)}(x)\succeq 0\quad\text{for } j=1,\dots,q, &&(1.2)
\end{aligned}
\]

where \(A^{(j)}(x) := A_0^{(j)} + \sum_{i=1}^{n} x_i A_i^{(j)}\). Also, \(x\in\mathbb{R}^n\), \(c\in\mathbb{R}^n\), and each \(A_i^{(j)}\) is an \(m_j\times m_j\) symmetric matrix. The constraint \(A^{(j)}(x)\succeq 0\) is known as a linear matrix inequality or LMI. For a given matrix \(A\), \(A\succeq 0\) denotes that \(A\) is positive semidefinite; the notation \(A\succ 0\) is used when a matrix is positive definite. Semidefinite programs (SDPs) are convex optimization problems [2], and they generalize many other convex optimization problems including linear programming. SDPs have applications in engineering, combinatorial optimization, and other fields (see [2, 3]).

The dual of problem (1.1) is the optimization problem [1]:

\[
\begin{aligned}
\text{maximize}\quad & -\sum_{j=1}^{q} A_0^{(j)}\bullet Z_j &&(1.3)\\
\text{subject to}\quad & \sum_{j=1}^{q} A_i^{(j)}\bullet Z_j = c_i \quad\text{for } i=1,\dots,n, &&(1.4)\\
& Z_j \succeq 0 \quad\text{for } j=1,\dots,q. &&(1.5)
\end{aligned}
\]

In the above, \(A\bullet B\) denotes the matrix dot product. If \(A=[a_{ij}]\) and \(B=[b_{ij}]\) are matrices of size \(m\), then \(A\bullet B=\sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij}b_{ij}=\mathbf{Tr}[AB^T]\), where Tr denotes the trace. We let \(\mathcal{R}=\{x \mid A^{(j)}(x)\succeq 0 \text{ for all } j=1,\dots,q\}\) denote the feasible region and \(\operatorname{int}(\mathcal{R})=\{x \mid A^{(j)}(x)\succ 0 \text{ for all } j=1,\dots,q\}\) the interior of the feasible region. The set \(\operatorname{int}(\mathcal{R})\) is precisely the set of all strictly feasible points. Let \(p^*\) be the optimal value of the primal problem (1.1), and let \(d^*\) be the optimal value of the dual problem (1.3). The duality gap \(p^*-d^*\) is zero if either SDP (1.1)-(1.2) or its dual (1.3)–(1.5) is strictly feasible [2].
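This identity is easy to check numerically; the following small NumPy snippet (ours, purely illustrative; the paper's own code was written in MATLAB) verifies it on random matrices:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# A . B = sum_ij a_ij * b_ij = Tr[A B^T]
assert np.isclose(np.sum(A * B), np.trace(A @ B.T))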

Several interior point methods have been developed for solving SDPs (see [2, 4, 5]). When developing our interior point method in this paper, we assume a strictly feasible interior point \(x_0\) is known. There exist methods for finding feasible or near-feasible points (see [6–8]). The algorithm we develop uses cutting planes and weighted analytic centers, iteratively refining the feasible region until the weighted analytic center approaches the optimal solution. We call our algorithm SDP-CUT. The cutting plane technique was pioneered by Gomory [9], Kelley [10], and Cheney and Goldstein [11] for solving integer programming problems. An algorithm similar to our SDP-CUT for linear programs (LPs) was proposed by Renegar in his 1988 paper [12].

SDP-CUT is implemented using Newton's method and different line search techniques. We found that Newton's method with exact line search is the best implementation of our algorithm. The effects of the weight \(w\in\mathbb{R}_+\) on the algorithm are also studied. In theory, a larger weight yields a faster and more accurate solution, but in practice, too large a weight may cause numerical errors. We also encountered numerical errors in computing the Hessian matrices in SDP-CUT when solving very large problems.

Since finding an interior point for an SDP problem is equivalent to solving another SDP problem, we decided to consider test problems with a known interior point and used that point as the starting point for SDP-CUT. We find SDPT3 to be an ideal method for comparison with SDP-CUT because it is known to be efficient and it allows the user to input a starting point. We found that SDP-CUT was closer to the actual solution than SDPT3 during the initial iterations on all our test problems, but it seems to slow down in later iterations as it approaches optimality. On the other hand, SDPT3 took less time on all the problems and fewer iterations on most of them. All codes were written in MATLAB version 7.9.0.

2. Weighted Analytic Center

This section discusses the weighted analytic center as given in [13] and how to compute it. Other notions of weighted center for semidefinite programming are described in [14].

Given a weight vector \(\omega\in\mathbb{R}^q_+\), the weighted barrier function for our system of LMIs (1.2) is defined as follows:

\[
\phi_\omega(x)=\begin{cases}-\displaystyle\sum_{j=1}^{q}\omega_j\log\det\left[A^{(j)}(x)\right] & \text{if } A^{(j)}(x)\succ 0\ \forall j=1,\dots,q,\\[1ex] +\infty & \text{otherwise.}\end{cases}\tag{2.1}
\]

Note that as \(x\) approaches the boundary of the feasible region, \(\phi_\omega(x)\) approaches \(\infty\). We assume the set \(\{\operatorname{diag}(A_1^{(1)},\dots,A_1^{(q)}),\ \operatorname{diag}(A_2^{(1)},\dots,A_2^{(q)}),\ \dots,\ \operatorname{diag}(A_n^{(1)},\dots,A_n^{(q)})\}\) is linearly independent. The function \(\phi_\omega\) is analytic and strictly convex over \(\mathcal{R}\) (see [13, 15]). The weighted analytic center is defined to be \(x_{\mathrm{ac}}(\omega)=\operatorname{argmin}\{\phi_\omega(x)\mid x\in\mathbb{R}^n\}\). We call \(x_{\mathrm{ac}}(\mathbf{1})\) the analytic center, where \(\mathbf{1}=[1,1,\dots,1]\). In the case of linear constraints and some other more general LMIs, each weight pushes the analytic center away from the boundary of the corresponding constraint. The gradient \(\nabla\phi_\omega(x)\) and the Hessian \(\nabla^2\phi_\omega(x)\) are given by the following: for \(i=1,\dots,n\) and \(k=1,\dots,n\),

\[
\begin{aligned}
\left(\nabla\phi_\omega(x)\right)_i &= -\sum_{j=1}^{q}\omega_j\left(A^{(j)}(x)\right)^{-1}\bullet A_i^{(j)},\\
\left(\nabla^2\phi_\omega(x)\right)_{ik} &= \sum_{j=1}^{q}\omega_j\left[\left(A^{(j)}(x)\right)^{-1}A_i^{(j)}\right]^T\bullet\left[\left(A^{(j)}(x)\right)^{-1}A_k^{(j)}\right].
\end{aligned}\tag{2.2}
\]

We will use the following specific example, SDP (2.3), to describe our method:

\[
\begin{aligned}
\text{SDP:}\quad\text{minimize}\quad & z = x_1+2x_2\\
\text{subject to}\quad & A^{(1)}(x)=\begin{bmatrix}2&0\\0&1\end{bmatrix}+x_1\begin{bmatrix}-1&0\\0&0\end{bmatrix}+x_2\begin{bmatrix}0&1\\1&0\end{bmatrix}=\begin{bmatrix}2-x_1&x_2\\x_2&1\end{bmatrix}\succeq 0,&&(1)\\
& A^{(2)}(x)=\begin{bmatrix}0&0\\0&0\end{bmatrix}+x_1\begin{bmatrix}1&0\\0&1\end{bmatrix}+x_2\begin{bmatrix}-1&0\\0&0\end{bmatrix}=\begin{bmatrix}x_1-x_2&0\\0&x_1\end{bmatrix}\succeq 0.&&(2)
\end{aligned}\tag{2.3}
\]
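For illustration, the following Python/NumPy sketch (ours; the paper's own code was written in MATLAB, and all names below are our choices) evaluates \(\phi_\omega\), \(\nabla\phi_\omega\), and \(\nabla^2\phi_\omega\) from (2.1)-(2.2) on the data of example SDP (2.3):

import numpy as np

# LMI data for example SDP (2.3): A[j] = [A_0^{(j)}, A_1^{(j)}, ..., A_n^{(j)}]; n = 2, q = 2.
A = [
    [np.array([[2., 0.], [0., 1.]]),    # A_0^{(1)}
     np.array([[-1., 0.], [0., 0.]]),   # A_1^{(1)}
     np.array([[0., 1.], [1., 0.]])],   # A_2^{(1)}
    [np.zeros((2, 2)),                  # A_0^{(2)}
     np.eye(2),                         # A_1^{(2)}
     np.array([[-1., 0.], [0., 0.]])],  # A_2^{(2)}
]

def lmi(Aj, x):
    """A^{(j)}(x) = A_0^{(j)} + sum_i x_i A_i^{(j)}."""
    return Aj[0] + sum(xi * Ai for xi, Ai in zip(x, Aj[1:]))

def barrier_grad_hess(A, x, w):
    """phi_omega(x), its gradient, and its Hessian, per (2.1) and (2.2)."""
    n = len(x)
    phi, grad, hess = 0.0, np.zeros(n), np.zeros((n, n))
    for wj, Aj in zip(w, A):
        M = lmi(Aj, x)
        try:
            L = np.linalg.cholesky(M)    # also certifies A^{(j)}(x) is positive definite
        except np.linalg.LinAlgError:
            return np.inf, None, None    # outside int(R): barrier is +infinity
        phi -= wj * 2.0 * np.sum(np.log(np.diag(L)))   # -w_j log det A^{(j)}(x)
        Minv = np.linalg.inv(M)
        S = [Minv @ Ai for Ai in Aj[1:]]               # (A^{(j)}(x))^{-1} A_i^{(j)}
        for i in range(n):
            grad[i] -= wj * np.trace(S[i])             # Minv . A_i (A_i symmetric)
            for k in range(n):
                hess[i, k] += wj * np.trace(S[i] @ S[k])
    return phi, grad, hess

x0 = np.array([1.2633, -0.2292])   # interior point used in Section 3 (Figure 2)
w = np.array([3.0, 1.0])           # weights as in Figure 1
phi, grad, hess = barrier_grad_hess(A, x0, w)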

Weighted analytic centers for the LMI system (1.2) can be computed with the WAC-NEWTON algorithm described below (Algorithm 1).

INPUT: point \(y_0\in\operatorname{int}(\mathcal{R})\), weight vector \(\omega\in\mathbb{R}^q_+\), tolerance WTOL \(>0\), and maximum number of iterations MAX
Set \(l=0\)
while \(l<\) MAX do
 1. Compute the direction vector \(s_l=-\left(\nabla^2\phi_\omega(y_l)\right)^{-1}\left(\nabla\phi_\omega(y_l)\right)\)
 2. Compute the Newton decrement \(d=s_l^T\left(\nabla^2\phi_\omega(y_l)\right)s_l\)
 3. Compute stepsize \(\alpha_l\)
 4. \(y_{l+1}=y_l+\alpha_l s_l\)
 if \(d<\) WTOL then
  break while
 5. \(l\leftarrow l+1\)
OUTPUT: \(x_{\mathrm{ac}}(\omega)=y_l\)

An efficient way to calculate the search vector 𝑠𝑙 in WAC-NEWTON is by finding the Cholesky factorization of βˆ‡2πœ™πœ”(𝑦𝑙) and using it to solve the linear system βˆ‡2πœ™πœ”(𝑦𝑙)𝑠𝑙=βˆ’βˆ‡πœ™πœ”(𝑦𝑙). This is how 𝑠𝑙 is calculated in the SDP-CUT algorithm to be discussed in the next section and throughout this paper. Figure 1 shows the contours of πœ™πœ”(π‘₯), the effect of the weight vector πœ”=[3,1] on the barrier function, and the weighted analytic center. The figure uses the feasible region defined by SDP (2.3) and the associated barrier function. The weight of 3 on constraint (1) pushes the analytic center away from the boundary of constraint (1).
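As an illustrative Python sketch of this computation (ours, not the paper's MATLAB implementation; SciPy's cho_factor/cho_solve are used for the Cholesky solve, and barrier_grad_hess refers to the sketch in Section 2):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def wac_newton(grad_hess, y0, wtol=1e-10, max_iter=100):
    """WAC-NEWTON (Algorithm 1); grad_hess(y) returns (gradient, Hessian) at y."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        g, H = grad_hess(y)
        fac = cho_factor(H)          # fails if H is not positive definite
        s = cho_solve(fac, -g)       # Newton direction s_l
        d = s @ H @ s                # Newton decrement
        y = y + s                    # stepsize alpha_l = 1 (CONSTANT); see Section 3.1
        if d < wtol:
            break
    return y

# Example wiring, reusing the Section 2 sketch:
# x_ac = wac_newton(lambda y: barrier_grad_hess(A, y, w)[1:], x0)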

3. The SDP-CUT Algorithm

This section describes the development of SDP-CUT. We also discuss WAC-NEWTON*, which implements Newton’s method for finding the weighted analytic center for the new system defined in SDP-CUT. The section finishes with algorithms for computing the Newton stepsize (CONSTANT, ELS, and BACKTRACKING).

Refer again to the example SDP (2.3). Here, we illustrate one iteration of SDP-CUT. Figure 2 shows the setup: the feasible region, an initial point \(x^*_0=[1.2633,\,-0.2292]^T\), and a cutting constraint of the form \(c^Tx^*_k-c^Tx+\epsilon\geq 0\) that accompanies the point. Figure 3 shows the movement of the point \(x^*_0\) to \(x^*_1=x_{\mathrm{ac}}(\mathbf{1})\), the (weighted) analytic center of our new feasible region made up of the cutting constraint and the original LMI constraints. Figure 3 also shows the contour lines of the barrier function. The weight on the cutting constraint can be changed from 1 to other, larger values.

Given a system of LMIs (1.2) and an objective function as in the primal SDP problem (1.1), we can numerically solve the problem by iteratively reducing the feasible set. Denote the current feasible region determined by our system of LMIs by \(\mathcal{R}_k\), and suppose we know a strictly feasible point \(x^*_k\) in \(\mathcal{R}_k\). Initially, we set \(\mathcal{R}_0=\mathcal{R}\). We can find a new feasible region \(\mathcal{R}_{k+1}\subset\mathcal{R}_k\) such that \(c^Tx\leq c^Tx^*_k\) for all \(x\in\mathcal{R}_{k+1}\). We do this by adding a new constraint cut \(c^Tx^*_k-c^Tx+\epsilon\geq 0\). The \(\epsilon\) is added to \(c^Tx^*_k\) to ensure that \(x^*_k\) is β€œstrictly feasible” in the new system. Given weight \(w\in\mathbb{R}_+\), we define a new barrier function to account for this new constraint:

\[
\phi^*_w(x) = -w\log\left[c^Tx^*_k-c^Tx+\epsilon\right]-\sum_{j=1}^{q}\log\det\left[A^{(j)}(x)\right].\tag{3.1}
\]

The gradient \(\nabla\phi^*_w(x)\) and the Hessian \(\nabla^2\phi^*_w(x)\) for this new barrier function are given by the following: for \(i=1,\dots,n\) and \(k=1,\dots,n\),

\[
\left(\nabla\phi^*_w(x)\right)_i = \frac{wc_i}{c^Tx^*_k-c^Tx+\epsilon}-\sum_{j=1}^{q}\left(A^{(j)}(x)\right)^{-1}\bullet A_i^{(j)},\tag{3.2}
\]

\[
\left(\nabla^2\phi^*_w(x)\right)_{ik} = \frac{wc_ic_k}{\left(c^Tx^*_k-c^Tx+\epsilon\right)^2}+\sum_{j=1}^{q}\left[\left(A^{(j)}(x)\right)^{-1}A_i^{(j)}\right]^T\bullet\left[\left(A^{(j)}(x)\right)^{-1}A_k^{(j)}\right].\tag{3.3}
\]

Remark 3.1. Note that in the definition of \(\phi^*_w(x)\) in (3.1), the weight \([1,1,\dots,1]\) is used on the original LMI constraints. The weight \(w\) is used only on the new cutting constraint, in order to push our point toward the optimal solution (see Algorithm 2).

INPUT: π‘₯ βˆ— 0 ∈ i n t ( β„› ) , weight 𝑀 ∈ ℝ + , πœ– > 0 , S T O L > 0 , and M A X
Set π‘˜ = 0
While   π‘˜ < M A X   do
 1. Compute cutting plain constraint 𝑐 𝑇 π‘₯ βˆ’ 𝑐 𝑇 π‘₯ βˆ— π‘˜ + πœ– β‰₯ 0
 2. Let π‘₯ βˆ— π‘˜ + 1 be the 𝑀 𝑒 𝑖 𝑔 β„Ž 𝑑 𝑒 𝑑 π‘Ž 𝑛 π‘Ž 𝑙 𝑦 𝑑 𝑖 𝑐 𝑐 𝑒 𝑛 𝑑 𝑒 π‘Ÿ of new system with barrier function πœ™ βˆ— 𝑀 ( π‘₯ )
 (3.1) to be computed by WAC-NEWTON* starting from the point π‘₯ βˆ— π‘˜
 Compute πœ• π‘˜ + 1 = 𝑐 𝑇 π‘₯ βˆ— π‘˜ βˆ’ 𝑐 𝑇 π‘₯ βˆ— π‘˜ + 1
 if   πœ• π‘˜ < S T O L   then
  break while
 3. π‘˜ ← π‘˜ + 1
OUTPUT: π‘₯ βˆ— c u t = π‘₯ βˆ— π‘˜ , 𝑝 βˆ— c u t = 𝑐 𝑇 π‘₯ βˆ— π‘˜
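For illustration, a minimal Python sketch of the SDP-CUT outer loop (ours; wac_newton_star stands for any routine that returns the weighted analytic center of the cut region, e.g., WAC-NEWTON* assembled from the earlier sketches, and all names are our choices):

import numpy as np

def cut_grad_hess(c, x, cut_rhs, w):
    """Contribution of the cutting term to (3.2)-(3.3); the cut slack is
    t = cut_rhs - c^T x = c^T x*_k - c^T x + eps."""
    t = cut_rhs - c @ x
    return (w / t) * c, (w / t ** 2) * np.outer(c, c)

def sdp_cut(c, x0, wac_newton_star, w=7.0, eps=1e-6, stol=1e-12, max_iter=100):
    """SDP-CUT (Algorithm 2): cut, then recenter at the weighted analytic center."""
    x = np.asarray(x0, dtype=float)
    val = c @ x
    for _ in range(max_iter):
        cut_rhs = val + eps                  # cut: c^T x*_k - c^T x + eps >= 0
        x = wac_newton_star(x, cut_rhs, w)   # weighted analytic center of new region
        progress = val - c @ x               # the quantity partial_{k+1} above
        val = c @ x
        if progress < stol:
            break
    return x, val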

If successful, SDP-CUT terminates with an optimal solution \(x^*_{\mathrm{cut}}\) and optimal objective function value \(p^*_{\mathrm{cut}}\). Rather than moving the plane back by \(\epsilon\) in Step 1 of SDP-CUT, the point \(x^*_k\) could instead be moved in the direction of \(-c\), giving the starting point \(x^*_k-\gamma c\) for some small \(\gamma>0\). However, this can be problematic as we approach the optimal solution and the feasible region gets small: the point \(x^*_k-\gamma c\) may pass over the entire remaining feasible region and thus fail to be feasible. If instead we move the plane as originally suggested in SDP-CUT, \(x^*_k\) will always be strictly feasible. Note that since the objective is to minimize \(c^Tx\) and \(x^*_k\) is an interior point, \(-c\) is a feasible direction from \(x^*_k\).

We denote by WAC-NEWTON* the WAC-NEWTON algorithm applied to \(\phi^*_w(x)\) (3.1) for determining \(x^*_{k+1}\) in the SDP-CUT algorithm. WAC-NEWTON* returns the weighted analytic center of the current feasible region, which becomes our next SDP-CUT iterate. In WAC-NEWTON*, the stepsize \(\alpha_l\) can be computed in a variety of ways, as discussed in the next section.

3.1. Line Searches: Computing the Newton Stepsize in WAC-NEWTON* Algorithm

We describe different options for computing the Newton stepsizes in the WAC-NEWTON* algorithm. The algorithm first computes the direction vector 𝑠𝑙. The Newton stepsize 𝛼𝑙 determines how far we should move in the direction of 𝑠𝑙 from the point 𝑦𝑙.

The pure Newton's method uses a constant stepsize \(\alpha_l=1\). We will refer to this method of choosing the stepsize as β€œCONSTANT.” The CONSTANT algorithm has the advantage of spending no computational time on deciding what stepsize to use. The disadvantage of Newton's method with CONSTANT is that it usually requires more iterations of WAC-NEWTON*, and it is possible to move out of the feasible region. To get \(\alpha_l\) with exact line search, we solve the one-dimensional optimization problem

\[
\text{minimize}\ \{g(\alpha)\mid \alpha>0\},\tag{3.4}
\]

where \(g(\alpha)=\phi^*_w(y_l+\alpha s_l)\). Solving (3.4) is relatively easy and may cut the number of WAC-NEWTON* iterations in SDP-CUT, which are computationally harder in comparison. Writing \(g\) out,

\[
g(\alpha)=\phi^*_w(y_l+\alpha s_l)=-w\log\left[c^Tx^*_k-c^T(y_l+\alpha s_l)+\epsilon\right]-\sum_{j=1}^{q}\log\det\left[A^{(j)}(y_l+\alpha s_l)\right].\tag{3.5}
\]

Let \(a_l=w\log\left[c^Tx^*_k-c^T(y_l+\alpha s_l)+\epsilon\right]\). Then

\[
g(\alpha)=-a_l-\sum_{j=1}^{q}\log\det\left[A^{(j)}(y_l+\alpha s_l)\right]=-a_l-\sum_{j=1}^{q}\log\det\left[A^{(j)}(y_l)+\sum_{i=1}^{n}\alpha(s_l)_iA_i^{(j)}\right].\tag{3.6}
\]

Let \(M_0^{(j)}=A^{(j)}(y_l)\) and \(M_s^{(j)}=\sum_{i=1}^{n}(s_l)_iA_i^{(j)}\). Then

\[
g(\alpha)=-a_l-\sum_{j=1}^{q}\log\det\left[M_0^{(j)}+\alpha M_s^{(j)}\right].\tag{3.7}
\]

Also,

\[
\begin{aligned}
g(\alpha)&=-a_l-\sum_{j=1}^{q}\log\det\left[M_0^{(j)}+\alpha M_s^{(j)}\right]\\
&=-a_l-\sum_{j=1}^{q}\log\det\left[\left(M_0^{(j)}\right)^{1/2}\left(I+\alpha\left(M_0^{(j)}\right)^{-1/2}M_s^{(j)}\left(M_0^{(j)}\right)^{-1/2}\right)\left(M_0^{(j)}\right)^{1/2}\right]\\
&=-a_l-\sum_{j=1}^{q}\log\left(\det\left[\left(M_0^{(j)}\right)^{1/2}\right]\det\left[I+\alpha\left(M_0^{(j)}\right)^{-1/2}M_s^{(j)}\left(M_0^{(j)}\right)^{-1/2}\right]\det\left[\left(M_0^{(j)}\right)^{1/2}\right]\right)\\
&=-a_l-\sum_{j=1}^{q}\log\left(\det\left[\left(M_0^{(j)}\right)^{1/2}\right]^2\det\left[I+\alpha\left(M_0^{(j)}\right)^{-1/2}M_s^{(j)}\left(M_0^{(j)}\right)^{-1/2}\right]\right)\\
&=-a_l-\sum_{j=1}^{q}\log\left(\det\left[M_0^{(j)}\right]\det\left[I+\alpha\left(M_0^{(j)}\right)^{-1/2}M_s^{(j)}\left(M_0^{(j)}\right)^{-1/2}\right]\right)\\
&=-a_l-\sum_{j=1}^{q}\log\det\left[M_0^{(j)}\right]-\sum_{j=1}^{q}\log\det\left[I+\alpha\left(M_0^{(j)}\right)^{-1/2}M_s^{(j)}\left(M_0^{(j)}\right)^{-1/2}\right].
\end{aligned}\tag{3.8}
\]

Let \(\lambda_i^{(j)}\), \(i=1,\dots,m_j\), be the eigenvalues of \(\left(M_0^{(j)}\right)^{-1/2}M_s^{(j)}\left(M_0^{(j)}\right)^{-1/2}\). Then

\[
g(\alpha)=-a_l-\sum_{j=1}^{q}\log\det\left[M_0^{(j)}\right]-\sum_{j=1}^{q}\sum_{i=1}^{m_j}\log\left[1+\alpha\lambda_i^{(j)}\right].\tag{3.9}
\]

The function \(g(\alpha)\) is convex, and the exact stepsize \(\alpha_l\) can be found using Newton's method. Equations (3.7) and (3.9) give two different ways of computing \(g(\alpha)\) when finding the exact stepsize. If one decides to use (3.7), the first and second derivatives of \(g(\alpha)\) are required and are given by

\[
g'(\alpha)=\frac{wc^Ts_l}{c^Tx^*_k-c^T(y_l+\alpha s_l)+\epsilon}-\sum_{j=1}^{q}\left(M_0^{(j)}+\alpha M_s^{(j)}\right)^{-1}\bullet M_s^{(j)},\tag{3.10}
\]

\[
g''(\alpha)=\frac{w\left(c^Ts_l\right)^2}{\left(c^Tx^*_k-c^T(y_l+\alpha s_l)+\epsilon\right)^2}+\sum_{j=1}^{q}\left[\left(M_0^{(j)}+\alpha M_s^{(j)}\right)^{-1}M_s^{(j)}\right]^T\bullet\left[\left(M_0^{(j)}+\alpha M_s^{(j)}\right)^{-1}M_s^{(j)}\right].\tag{3.11}
\]

If (3.9) is used, the derivatives are simply given by

\[
g'(\alpha)=\frac{wc^Ts_l}{c^Tx^*_k-c^T(y_l+\alpha s_l)+\epsilon}-\sum_{j=1}^{q}\sum_{i=1}^{m_j}\frac{\lambda_i^{(j)}}{1+\alpha\lambda_i^{(j)}},\tag{3.12}
\]

\[
g''(\alpha)=\frac{w\left(c^Ts_l\right)^2}{\left(c^Tx^*_k-c^T(y_l+\alpha s_l)+\epsilon\right)^2}+\sum_{j=1}^{q}\sum_{i=1}^{m_j}\frac{\left(\lambda_i^{(j)}\right)^2}{\left(1+\alpha\lambda_i^{(j)}\right)^2}.\tag{3.13}
\]

We will use ELS-MAT to denote exact line search computations done using (3.7), and ELS-EIG when the computations are done from (3.9).
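As an illustrative Python sketch of ELS-EIG (ours; it assumes the scalar \(c^Ts_l\), the cut slack at \(\alpha=0\), and the eigenvalues \(\lambda_i^{(j)}\) have already been formed, and a practical version would also safeguard \(\alpha\) against leaving the domain of \(g\)):

import numpy as np
from scipy.linalg import eigh

def els_eig(cTs, gap0, lam, w, iters=50, tol=1e-12):
    """One-dimensional Newton iteration on g'(alpha) = 0 using (3.12)-(3.13).

    cTs  : c^T s_l
    gap0 : c^T x*_k - c^T y_l + eps  (cut slack at alpha = 0)
    lam  : flat array of all eigenvalues lambda_i^{(j)}
    """
    alpha = 0.0
    for _ in range(iters):
        gap = gap0 - alpha * cTs
        g1 = w * cTs / gap - np.sum(lam / (1 + alpha * lam))                      # (3.12)
        g2 = w * cTs ** 2 / gap ** 2 + np.sum(lam ** 2 / (1 + alpha * lam) ** 2)  # (3.13)
        step = g1 / g2
        alpha -= step
        if abs(step) < tol:
            break
    return alpha

# The eigenvalues of (M_0^{(j)})^{-1/2} M_s^{(j)} (M_0^{(j)})^{-1/2} are the
# generalized eigenvalues of the pencil (M_s^{(j)}, M_0^{(j)}), so they can be
# obtained without forming matrix square roots:
# lam_j = eigh(Ms_j, M0_j, eigvals_only=True)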

In addition to the constant stepsize of 1 and exact line search, another way we considered for computing the Newton stepsize was backtracking. This method starts with a stepsize \(\alpha\geq 1\) and then decreases it until a stopping condition is met (see [5, 16]). In practice, one often starts from \(\alpha=1\); the stopping condition guarantees a sufficient decrease in \(g(\alpha)\) (see Algorithm 3).

INPUT: \(0<\beta<1\) and \(0<\gamma<0.5\)
Set \(\alpha=1\)
while \(g(\alpha)>g(0)+\alpha\gamma\left(\nabla\phi^*_w(y_l)\right)^Ts_l\) do
 \(\alpha\leftarrow\beta\alpha\)
OUTPUT: \(\alpha\)
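A minimal Python sketch of Algorithm 3 (ours; g is any callable evaluating \(g(\alpha)\), returning +inf for infeasible \(\alpha\), and slope0 is the directional derivative \((\nabla\phi^*_w(y_l))^Ts_l\)):

def backtracking(g, slope0, beta=0.5, gamma=0.25):
    """Algorithm 3: shrink alpha until the sufficient-decrease test holds."""
    alpha, g0 = 1.0, g(0.0)
    while g(alpha) > g0 + alpha * gamma * slope0:
        alpha *= beta
    return alpha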

It is known that Newton's method converges quadratically close to the solution. In our case, WAC-NEWTON* converges rapidly for starting points that are not close to the boundary of the feasible region \(\mathcal{R}_k\), but we sometimes encounter difficulty when the starting point is too close to the boundary. Note that each time we make a new cut, our starting point is near the boundary; this is also the case when the SDP-CUT iterates approach optimality. When close to the boundary, the direction vector \(s_l\) is often very small, so a stepsize of \(\alpha=1\) may not make much progress. In these cases, the CONSTANT stepsize may spend many iterations making little progress, wasting gradient and Hessian computations at each one. Using the ELS algorithm, we find the proper stepsize and move out of this problem area in just one iteration. Consider the plots in Figure 4, which show the situation described above, again using example SDP (2.3) and the point shown in Figure 2.

As we can see from Figure 4 on the left, the best choice of stepsize is much greater than 1. In Figure 4 on the right, we see that we have now moved into an area where a stepsize of 1 is reasonable. BACKTRACKING will not help with this problem; it only helps when the optimal stepsize is less than 1. If we try to adapt BACKTRACKING to the circumstance described above, we must initialize the BACKTRACKING stepsize to a large number, which creates two problems. First, when the optimal stepsize is close to 1 or smaller, we will waste time backtracking. Second, a large initial stepsize may cause BACKTRACKING to send our point outside the feasible region, causing the algorithm to diverge. As will be further shown in the next section, ELS appears to have a positive effect on SDP-CUT, while BACKTRACKING does not. Figure 5 has plots comparing how far CONSTANT and ELS move our point toward the optimal solution at each iteration. The plots highlight the β€œwasted iteration” problem with CONSTANT, which occurs when the optimal stepsize is much greater than 1.

4. Experiment I: SDP-CUT Implementations

We have implemented SDP-CUT in four ways, varying in how the Newton stepsize is computed: CONSTANT, ELS-MAT, ELS-EIG, and BACKTRACKING. We compare the performance of these four implementations on various test problems. For each problem, the SDP-CUT parameters were set at \(\epsilon=10^{-6}\), STOL \(=10^{-12}\), WTOL \(=10^{-10}\), \(w=7\), and MAX \(=100\).

4.1. Performance as π‘ž Varies

Here, we left the number of variables constant at 𝑛=2 and varied the number of constraints π‘ž from 2 to 10. The matrices have their sizes fixed at π‘šπ‘—=5 for all 𝑗. For each value of π‘ž, 10 SDP problems were randomly generated, and SDP-CUT was used to solve each problem. The number of SDP-CUT iterations (which we also call WAC-NEWTON* iterations) and run time for SDP-CUT were recorded for each problem. Figures 6 and 7 give plots of the averages with π‘ž on the horizontal axis. The vertical axis in Figure 6 contains the total SDP-CUT iterations. In Figure 7, the vertical axis gives the total run time of SDP-CUT in seconds.

Figure 6 shows that there is no correlation between the number of LMI constraints and the number of SDP-CUT iterations (WAC-NEWTON* iterations) needed to find the optimal solution. It also shows that ELS-EIG and ELS-MAT both effectively reduce the number of SDP-CUT iterations needed, while BACKTRACKING uses about the same number of iterations as CONSTANT. Figure 7 shows that in the case of \(n=2\), ELS-MAT and ELS-EIG run the fastest, while BACKTRACKING runs far slower than everything else. Time increases as \(q\) increases because of the gradient and Hessian computations: as formulas (3.2) and (3.3) show, for each constraint we must perform matrix inversions, dot products, and multiplications. This makes each iteration computationally harder, but does not affect the number of iterations needed, as seen in Figure 6. BACKTRACKING's time increases the most with \(q\) due to the eigenvalue computation needed for its stopping condition; SDP-CUT does not benefit from these extra computations, so BACKTRACKING has the highest run time.

4.2. Performance as 𝑛 Varies

Here, we kept the number of constraints constant at \(q=3\) and varied the number of variables \(n\) from 2 to 10. The matrix sizes were set at \(m_j=5\) for all \(j\). For each value of \(n\), 10 SDP problems were randomly generated, and SDP-CUT was used to solve each problem. The number of SDP-CUT iterations (WAC-NEWTON* iterations) and the run time of SDP-CUT were recorded for each problem. Figures 8, 9, and 10 are plots of the averages with \(n\) on the horizontal axis. The vertical axis gives the SDP-CUT iterations (WAC-NEWTON* iterations) needed in Figure 8 and the run time of SDP-CUT in seconds in Figure 9. A third plot, Figure 10, shows \(n\) on the horizontal axis and the average time required to perform an iteration of SDP-CUT on the vertical axis.

Figure 8, again, shows that ELS-EIG and ELS-MAT are effective in reducing the number of SDP-CUT iterations needed, while BACKTRACKING fails to reduce it. Figure 8 also shows a positive correlation between the number of SDP-CUT iterations (WAC-NEWTON* iterations) and \(n\). From Figure 9, we see that exact line search reduces the total time needed for our algorithm, especially as \(n\) gets larger. This is important because, as \(n\) grows, the number of SDP-CUT iterations and the time required per iteration both increase (see Figure 10). For small \(n\), CONSTANT is quicker per SDP-CUT iteration, but as \(n\) grows there is no noticeable difference in the time per iteration between CONSTANT and either ELS implementation. This is because the computations needed for the one-dimensional optimization problem (3.4) become small compared to those needed to compute \(\nabla\phi^*_w(x)\) and \(\nabla^2\phi^*_w(x)\).

4.3. Performance as π‘š Varies

Here, we left the number of constraints constant at π‘ž=2 and the number of variables fixed at 𝑛=2. We varied the matrix sizes π‘šπ‘—=π‘š from 5 to 60. The number of SDP-CUT iterations (WAC-NEWTON* iterations) and run time were recorded for each problem. Below are plots of the averages with π‘š on the horizontal axis. The vertical axis in Figure 11 contains the total SDP-CUT iterations (WAC-NEWTON* iterations) used to solve the problem. In Figure 12, the vertical axis gives the total run time of SDP-CUT in seconds. There is also a third plot, Figure 13, which shows π‘š on the horizontal axis and the time required to perform an iteration of SDP-CUT on the vertical axis.

Figure 11, again, shows that ELS reduces the required number of SDP-CUT iterations (WAC-NEWTON* iterations). However, as the constraint matrices become large, the computations needed for the exact line search grow. Figure 12 shows that ELS-MAT is the fastest over the range of matrix sizes we tested. Unlike in our other experiments, CONSTANT beats ELS-EIG. The reason is that the time per iteration of ELS-EIG grows very rapidly, as seen in Figure 13, because computing eigenvalues becomes increasingly expensive as the matrices grow in size.

ELS appears to be the best implementation of SDP-CUT: it is very effective in reducing the number of WAC-NEWTON* iterations needed. ELS-MAT outperformed ELS-EIG in general. Once the eigenvalues are obtained, they allow fast computation of the first and second derivatives of \(g(\alpha)\) (see (3.12) and (3.13)); however, the large cost of computing the eigenvalues outweighs this benefit, especially as the matrices become large. ELS-MAT uses a slower means of computing the derivatives of \(g(\alpha)\), but has no eigenvalue-calculation overhead (see (3.10) and (3.11)). Its slowest operation is matrix inversion, which becomes harder as the matrices get large but scales much better than finding eigenvalues. It is conceivable that very large matrices could arise for which both ELS-EIG and ELS-MAT are too expensive; in that case it may be best to use CONSTANT. We were unable to find any cases among our test problems in which BACKTRACKING was beneficial. From now on, SDP-CUT will always refer to SDP-CUT with the ELS-MAT implementation.

4.4. Convergence and the Weight 𝑀, Leveling Off Effect

Increasing the weight \(w\) decreases the number of SDP-CUT iterations needed to find the optimal solution, and a larger weight also allows us to get closer to the optimal solution. However, if \(w\) becomes too large, numerical errors prevent convergence. Here, we discuss the role \(w\) plays in the rate of convergence, as well as how large we can make \(w\) before SDP-CUT fails. Figure 14 graphically demonstrates the convergence of SDP-CUT on our example SDP (2.3) as the weight \(w\) varies.

From the plots in Figure 14, we can see that increasing the weight moves SDP-CUT toward the optimal solution faster. To see how fast SDP-CUT converges and what effect the weight has, we performed the following experiment: we ran SDP-CUT on the example and kept track of the iterates \(x^*_k\) at each iteration. Figure 15 plots \(\|x^*-x^*_k\|\) on a log scale against the iteration count for weights \(w=1,3,5,7,9\), where \(x^*\) is the optimal solution. We see in Figure 15 that the distance from the estimate to the actual solution decreases exponentially (linearly on the log scale). We found that in most cases our implementation of SDP-CUT breaks down around \(w=10\).

As we can see, increasing the weight means quicker convergence. However, Figure 15 only shows the first 10 iterations. As we approach the optimal solution, the rate of convergence slows and β€œlevels off.” This is caused by the feasible region becoming very small while the cutting constraint is moved back by \(\epsilon\), so the SDP-CUT iterates stay in approximately the same place. This problem is demonstrated in Figure 16. We see that larger weights allow us to get closer to the optimal solution, but even with a larger weight our convergence β€œlevels off.”

One way to get closer to the optimal solution before the weighted analytic centers start to β€œlevel off” is to use a smaller \(\epsilon\). The difficulty with decreasing \(\epsilon\) is that if it becomes too small, the SDP-CUT iterates come too close to the boundary, giving rise to numerical problems, especially in computing the Hessian matrices. We found that SDP-CUT worked with \(\epsilon=10^{-6}\) but failed with \(\epsilon=10^{-7}\) on example SDP (2.3). The numerical problems arise because near the boundary one of the LMI constraint matrices is near-singular, and we need to compute matrix inverses. Figure 17 shows what the β€œleveling off” effect looks like over the feasible region of SDP problem (2.3). Notice that the sequence of iterates \(x^*_k\) from SDP-CUT reaches a limit before it reaches the boundary of the region.

5. Experiment II: SDP-CUT versus SDPT3

In this section, SDP-CUT and the well-known SDPT3 method [17] are compared on a variety of SDP test problems. Since SDPT3 is known to be efficient, we also used it to find the best possible estimate of the actual optimal objective function values. Another reason for using SDPT3 is its flexibility in allowing the user to input a starting point.

For the experiment, we generated 20 random SDP problems and solved each with SDP-CUT and SDPT3. For each problem, the dimension \(n\) and the number of constraints \(q\) are random integers on the intervals \([2,20]\) and \([1,20]\), respectively, and the size \(m_j\) of each LMI is a random integer on \([1,10]\). The values of \(n\) and \(q\) are listed in Table 1; the values of \(m_j\) are omitted because there are too many of them (for example, SDP 1 has \(m=[7,5,9,8,5,1]\)). The problems were generated in a way that makes the origin an interior point for all of them. For all problems, we used tolerances STOL \(=10^{-3}\), WTOL \(=10^{-6}\), \(\epsilon=10^{-7}\), and a weight of \(w=7\) with SDP-CUT. For the first 10 test problems, the iteration limit was MAX \(=5\); for the remaining 10 problems, it was raised to MAX \(=100\). We record the total iterations performed and the time taken by each algorithm. The last two columns of Table 1 show the error in the calculated optimal objective function value compared to the actual value \(p^*_{\mathrm{actual}}\) obtained from SDPT3 with no restriction on the number of iterations. In the table, the subscripts cut and SDPT3 distinguish values pertaining to SDP-CUT and SDPT3, respectively, and \(N\), \(T\), and \(E\) denote the number of iterations, the time, and the absolute error; for example, \(E_{\mathrm{cut}}=|p^*_{\mathrm{cut}}-p^*_{\mathrm{actual}}|\).

We see in Table 1 that after 5 iterations, SDP-CUT gives a more accurate answer on all of the first 10 test problems. When allowed to run for more iterations, both algorithms successfully found the optimal solutions, but SDPT3 took fewer iterations to reach optimality in 6 out of the 10 problems. Also, SDPT3 took less time on all 20 test problems. We believe the difference in times is partly because SDPT3 has been optimized over the years to run efficiently, while SDP-CUT is at the beginning of its development. It is interesting to note that SDP-CUT took fewer iterations in 4 out of the 10 problems.

We need to do more work to make SDP-CUT more efficient. We noticed that the Hessian matrix \(\nabla^2\phi^*_w(x)\) in SDP-CUT is prone to errors on very large problems: it loses its positive definiteness, and its Cholesky factorization then fails. That is the main reason why we could not run SDP-CUT on the SDP test problems in SDPLIB [18]. For example, SDP-CUT was not successful in solving truss2 of SDPLIB, which has \(n=58\) variables and \(q=133\) constraints, due to numerical problems in computing the Hessian matrices. However, SDP-CUT was successful in solving truss1, but with \(w=5\).

6. Convergence of SDP-CUT

Let {π‘₯βˆ—π‘˜} be the sequence of estimates for π‘₯βˆ— generated by SDP-CUT. Let β„›π‘˜ be the feasible region after π‘˜-cuts. Thus, π‘₯βˆ—π‘˜ is the weighted analytic center of β„›π‘˜ for all π‘˜>0. Let {πœ•π‘˜} be the sequence defined by πœ•π‘˜=𝑐𝑇(π‘₯βˆ—π‘˜βˆ’π‘₯βˆ—π‘˜+1).

The following theorem shows that when \(\|c\|\) and \(\|x^*_k-x^*_{k+1}\|\) are both small, \(\partial_k\) is also small. Note that when \(\partial_k\) becomes very small, SDP-CUT is no longer making progress on the objective function values; this is especially true in the case of the β€œleveling off” effect discussed in Section 4. It also indicates that our stopping criterion \(\|x^*_k-x^*_{k+1}\|<\) STOL does not always mean convergence to the optimal solution.

Theorem 6.1. If \(\|x^*_k-x^*_{k+1}\|<\mathrm{STOL}\), then \(\partial_k<\|c\|\,\mathrm{STOL}\).

Proof. Assume \(\|x^*_k-x^*_{k+1}\|<\mathrm{STOL}\) and use the Cauchy-Schwarz inequality:

\[
\partial_k=c^T\left(x^*_k-x^*_{k+1}\right)\ \Longrightarrow\ \partial_k\leq\|c\|\left\|x^*_k-x^*_{k+1}\right\|\ \Longrightarrow\ \partial_k<\|c\|\,\mathrm{STOL}.\tag{6.1}
\]

How close we can get to the optimal solution depends on the parameters \(w\) and \(\epsilon\). As \(\epsilon\) becomes smaller, we obtain smaller regions, which allows SDP-CUT to get closer to the optimal solution. For a given region, a larger \(w\) causes SDP-CUT to push the estimate \(x^*_k\) of the optimal solution further in the \(-c\) direction. For any iterate \(x^*_k\), in the limit as \(w\to\infty\), we should be able to push \(x^*_k\) as close to the optimal solution \(x^*\) as we like. In practice, both \(\epsilon\to 0\) and \(w\to\infty\) cause numerical problems. A question we propose is: in the limit as \(w\to\infty\), how many iterations of SDP-CUT are needed for convergence?

Since the cutting plane is a linear constraint, it is possible that only one iteration is needed as \(w\to\infty\) (at least in some cases). Consider the following SDP (LP) example:

\[
\begin{aligned}
\text{SDP:}\quad\text{minimize}\quad & x_2\\
\text{subject to}\quad & 1-x_1\geq 0,\quad 1+x_1\geq 0,\quad 1-x_2\geq 0,\quad 1+x_2\geq 0.
\end{aligned}\tag{6.2}
\]

The feasible region of this example is given in Figure 18. Consider the point \((0,0)\) in the feasible region and the cutting constraint through this point. Also, consider the corresponding barrier function \(\phi^*_w(x)\) (3.1) and its gradient \(\nabla\phi^*_w(x)\) (3.2). At the weighted analytic center, the gradient is zero. This gives

\[
\frac{1}{1-x_1}-\frac{1}{1+x_1}=0,\qquad \frac{w}{\epsilon-x_2}+\frac{1}{1-x_2}-\frac{1}{1+x_2}=0.\tag{6.3}
\]

It is clear from Figure 18 that the weighted analytic center has a negative \(x_2\) value. So, the above equations show that the weighted analytic center is

\[
x_1=0,\qquad x_2=\frac{\epsilon-\sqrt{\epsilon^2+2w+w^2}}{2+w}.\tag{6.4}
\]

We see that the weighted analytic center \((x_1,x_2)\) converges to the optimal solution \((0,-1)\) as \(w\to\infty\). This example suggests the following conjecture.
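As a quick numerical sanity check (ours, not part of the paper), the closed form (6.4) can be verified to satisfy the gradient condition (6.3) and to approach \(-1\) as \(w\) grows:

import numpy as np

eps = 1e-6
for w in [1.0, 10.0, 100.0, 1e4]:
    x2 = (eps - np.sqrt(eps ** 2 + 2 * w + w ** 2)) / (2 + w)  # closed form (6.4)
    resid = w / (eps - x2) + 1 / (1 - x2) - 1 / (1 + x2)       # left side of (6.3)
    assert abs(resid) < 1e-8
    print(f"w = {w:8.1f}:  x2 = {x2:.6f}")                     # tends to -1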

Conjecture 6.2. SDP-CUT converges to π‘₯βˆ— as π‘€β†’βˆž.

6.1. Scaling the Vector 𝑐

Theorem 6.1 shows the dependence of \(\partial_k\) on \(\|c\|\). Here, we consider the effect of scaling \(c\) on SDP-CUT. If we scale \(c\) by \(\nu>1\), so that we now minimize \(\nu c^Tx\), we do not change the optimal solution \(x^*\). We compare how close SDP-CUT gets when optimizing for various values of \(\nu\). Figure 19 plots \(\|x^*-x^*_{\mathrm{cut}}\|\) on the vertical axis against the scaling factor \(\nu\) on the horizontal axis, using example SDP (2.3) with STOL \(=10^{-12}\), MAX \(=100\), and \(w=10\). We can see in Figure 19 that scaling the objective vector \(c\) by a factor \(\nu>1\) does have a positive effect on the accuracy of SDP-CUT.

7. Conclusion

We have developed and studied a new interior point method, SDP-CUT, for solving SDPs. We found that implementing SDP-CUT using Newton's method with exact line search generally gives the best performance. SDP-CUT appears to be very effective at getting into the neighborhood of the optimal solution, as seen in problems 1–10 in Table 1. However, there is the β€œleveling off” effect shown in Figure 16. Future work on SDP-CUT can involve investigating Conjecture 6.2; implementations that can handle a larger weight \(w\) or a smaller \(\epsilon\) will result in better performance.

Another area for improvement is the stopping criterion. Currently, the stopping criterion is \(\|x^*_k-x^*_{k-1}\|<\) STOL, which tells us when we are no longer making progress but does not always guarantee optimality of the solution. We have also shown that scaling the objective vector \(c\) by some factor \(\nu>1\) could allow SDP-CUT to give a better solution. It is also important to investigate how to make all the other components of SDP-CUT work well and efficiently. For example, we would like to find a way to compute the Hessian \(\left(\nabla^2\phi^*_w(x)\right)_{ik}\) more efficiently and without errors. The way we compute the stepsizes in Newton's method also requires further attention: the exact line search we preferred over the others is probably too expensive, and a better approach may be possible. These improvements would allow us to test SDP-CUT on larger problems and compare it with other methods. It is also of interest to study on what kinds of problems SDP-CUT works well or fails.

Acknowledgment

The first author would like to thank the National Science Foundation (NSF) for the opportunity to start initial work on this paper through their funding of the Research Experiences for Undergraduates (REU) program at Northern Arizona University.