Preconditioned ADMM for a Class of Bilinear Programming Problems
We design a novel preconditioned alternating direction method for solving a class of bilinear programming problems, where each subproblem is solved by adding a positive-definite regularization term with a proximal parameter. By the aid of the variational inequality, the global convergence of the proposed method is analyzed and a worst-case convergence rate in an ergodic sense is established. Several preliminary numerical examples, including the Markowitz portfolio optimization problem, are also tested to verify the performance of the proposed method.
Let , , be the set of real numbers, the set of -dimensional real column vectors, and the set of real matrices, respectively. For any , we use the symbol to denote their inner product and the symbol to stand for the Euclidean norm of , where the superscript is the transpose. Symbol is identity matrix, which is simply denoted by with proper dimension in the context. Consider the following generalized bilinear programming:where is a closed proper convex function (possibly nonsmooth); are given matrices and vectors, respectively; and are closed convex sets with nonempty intersection. Throughout this article, we assume that the solution set of (1) is nonempty.
The bilinear programming (1) arises in many applications, for instance, the Linear-Max-Min problem, the Location-Allocation problem, and the classic Markowitz portfolio optimization problem; see, for example, [1, 2] for more details. As an extension of the linear programming, problem (1) can be recast as the following existing model:withwhich can be solved by some classical optimization methods if the constrained set is easy, such as the proximal point algorithm  and the augmented Lagrangian method . However, using such transformations will lead to a large-scale coefficient matrix and increase the computational complexity and storage requirement. In particular, when sets and are complex or the objective function is not well defined, we would better deal with the variables separately and make the best of the structure properties of the given sets.
Next, we focus on developing a preconditioned alternating direction method of multipliers (P-ADMM) for solving (1), where each subproblem can solved by adding a proximal regularization term and the Lagrange multipliers can be updated by using some preconditioned symmetric positive-definite (SPD) matrices. By the aid of new variables and , problem (1) is firstly reformulated as follows:where , An obvious advantage of such reformulation is that the given sets and can be treated separately instead of regarding them as a whole. Given symmetric positive-definite matrices and , the augmented Lagrangian function of (4) is given by where is a penalty parameter andis the Lagrangian function of problem (4) with a multiplier Then, the introduced P-ADMM obeys the following iterative scheme:where , are two independent proximal parameters that control the proximity of the new iterative value to the last one; see, for example,  for more explanations.
Noting that scheme (7) can be regarded as an extended alternating direction method of multipliers (ADMM) of , because the two parameters , are independent instead of the same value and all the preconditioned matrices are not always identity matrices. For excellent reviews of the ADMM, we refer the reader to, for example, [6–14] and the references therein, and also the recent published symmetric ADMM with larger step sizes in  which is an all-sided work for the two-block separable convex minimization problem. Besides, Goldstein et al.  developed a Nesterov’s accelerated ADMM for problem (2) with partitions: where the global convergence bounds were derived in terms of the dual objective to the original under the assumption that the two objectives were strongly convex. In 2016, He et al.  proved that both the Jacobian decomposition of the augmented Lagrangian method and the proximal point method were equivalent for solving the multiblock separable convex programming with one linear equality constraint, that is, problem (2) with partitions: Although there are lots of results about ADMM, most of concerned problems are still with one linear equality constraint, and research results of a preconditioned ADMM for (1) are few as far as we know.
Contributions of this paper are summarized as two aspects. One is that we introduce a novel preconditioned alternating direction method for solving the bilinear programming problem (1), and we prove that the constructed method is global convergent with a worst-case convergence rate in an ergodic sense. Another contribution is that several large-scale practical examples are tested to show the effectiveness of the proposed method. In Section 2, we analyze the convergence of the proposed method in detail. In Section 3, we first introduce a linearization technique for solving the involved subproblems of the P-ADMM (7) and then carry out some numerical experiments about the large-scale quadratic programming with two linear equality constraints. Finally, we conclude the paper in Section 4.
2. Convergence Analysis
By making use of the variational inequality, this section presents a unified framework to characterize the solution set of the reformulated problem (4) and the optimality conditions of the involved subproblems in (7). Moreover, the global convergence and the worst-case convergence rate of the proposed method are analyzed in detail. We begin with a basic lemma given in .
Lemma 1. Let and be convex functions defined on a closed convex set and be differentiable. Assume that the solution set of the problem is nonempty. Then we have
Any tuple is called a saddle point of Lagrangian function (6) if it satisfies which implies that finding a saddle point of is equivalent to finding a point such that By rewriting the above inequalities as a compact variational inequality (VI), we havewhere Note that the mapping is skew-symmetric, so the following fundamental property holds:By the assumption that the solution set of (1) is nonempty, the solution set of is also nonempty. The next theorem describes a concrete way to characterizing the set , whose proof is the same as that of Theorem 2  and is omitted here.
Theorem 2. The solution set of in (14) is convex and can be expressed as
For any , Theorem 2 shows that if then the vector is called an -approximate solution of , where is an accuracy, especially .
Lemma 3. Let the sequence be generated by the algorithm P-ADMM. Then we havewhere
Proof. Applying Lemma 1, the optimality conditions of the two subproblems in (7) areSince the update of the Lagrange multipliers in (7) satisfiessubstituting (22) into (21) we obtainNotice that (22) can be rewritten asCombining (23) and (24), we immediately complete the proof.
Note that matrix in Lemma 3 is strictly SPD, because the upper-left block matrix is SPD for any , and the lower-right diagonal matrix is SPD from the symmetric positivity of the matrices and Compared with the inequalities (14) and (19), the key to prove the convergence of the algorithm P-ADMM is to verify that the cross term of (19) converges to zero, that is,In other words, the sequence would be contractive under the weighted matrix . In what follows, we will show such assertion by using the definition for any
Lemma 4. The sequence generated by the algorithm P-ADMM satisfies
Proof. Setting in (19), it follows that By making use of (16) and (14), we get which leads toBased on (29) and the symmetric positivity of , we can obtain
Theorem 5. Let the sequence be generated by the algorithm P-ADMM, then the following assertions hold:(a)(b)The sequence is bounded.(c)Any accumulation point of is a solution point of (d)There exists such that
Proof. Summing the inequality (26) over , we have which implies because of the symmetric positivity of the matrix . The assertion (b) is evident followed by (a). By taking the limit of (19) and using the assertion (a), we get which shows that is a solution point of , that is, the assertion (c) holds.
Let be an accumulation point of . Then the third assertion implies that and Using the above inequality together with the assertion (a), the proof of (d) is completed.
Theorem 6. For any integer and the sequence generated by the P-ADMM (7), letThen it holds that
Proof. Clearly, since it can be treated as a convex combination of . Substituting (16) into (19), we deduce thatBy utilizing an identity we obtain which makes (36) become Summing the above inequality over , we have Since is convex and , so it holds that Substituting it into (40), the proof is completed.
Remark 7. Theorem 5 shows that the P-ADMM (7) is globally convergent. And Theorem 6 tells us that for any given compact set and , the vector must satisfy which shows that the proposed method converges in a worst-case rate in an ergodic sense.
Remark 8. The penalty parameter in (7) can be updated by the formula with , and it is a constant when taking The preconditioned matrices and are usually chosen as the identity matrix, the diagonal matrix with positive diagonal entries, or the tridiagonal matrix.
3. Numerical Experiments
In this section, we investigate the feasibility and efficiency of the proposed method by some numerical experiments about the quadratic programming model with two linear equality constraints. The codes of the algorithm P-ADMM are written in MATLAB 7.10 (R2010a) and the experiments are carried out on a PC with Intel Core i5 processor (3.3 GHz) with 4 GB memory. Inspired by Theorem 5, we take an easily implementable stopping criterion for the proposed method, that is, where is the given tolerance and is the th iteration generated by scheme (7).
To avoid the case that the subproblems of (7) have no explicit solution, the two preconditioned matrices are simply chosen as the identity matrix, and we then can use a linearized strategy to accelerate the convergence of solving the subproblems. Without loss of generality, we take the -subproblem, for example. In such case, the -subproblem is equivalent towhere By the well-known Taylor formula in mathematical analysis, the quadratic term can be approximated by where is a proximal factor and is the gradient of the quadratic term at . Hence, the objective function in (44) is of the equivalent form:which makes the -subproblem have closed solution form. The -subproblem can be also tackled in a similar way as the -subproblem. For more cases that are analogous to (47), the explicit solution form can be dated back to Lemmas 1 and 3 given in .
In what follows, the penalty parameter is updated by the formula with , the proximal parameters are chosen as , and the iterative variables are initialized as fixed values Generally speaking, the quadratic programming with two linear equality constraints is of the following form:where is a positive semidefinite matrix. Model (48) includes the classic Markowitz portfolio optimization problem as a special case; see, for example,  and Example 10.
Example 9. Consider model (48) with a simple case , where the given data are randomly generated by the following MATLAB codes:
For this example, Table 1 reports several experimental results of Example 9 with by the algorithm P-ADMM with different tolerance error, including the number of iterations (denoted by “IT”), the CPU time (denoted by “CPU”), the iterative error of the solution (denoted by “ERR”), and the residual of the objective (denoted by “ROB”). Figure 1 still draws the convergence curves of the residual of the objective and the iterative error of the solution under the tolerance , respectively.
From Table 1, we can see that when using the P-ADMM to solve Example 9, the CPU time cost is less than 0.2 seconds and the number of the iterations is not bigger than 65 steps. The obtained results, including the residual error ERR listed in Table 1 and the convergence curves depicted in Figure 1, verify the feasibility and efficiency of the P-ADMM scheme for solving the small-scale problem. Besides, the last two columns of Table 1 imply that we can choose a relatively tiny stopping criterion (e.g., ) to obtain nearly the same value of the objective and to save the CPU time.
Example 10. Consider model (48) with and
Then model (48) immediately becomes the Markowitz portfolio optimization problem:where the matrix stands for the covariance matrix of the return on the assets in the portfolio, the variable denotes the vector of portfolio weights that represent the amount of capital to be invested in each asset, is the vector of expected returns of the different assets, and is a given total return. In such case, the solution of (51) is sparse, which also verifies why some researchers try to find the sparse solution of the problem (51), see, for example, .
For Example 10, we test eleven large-scale experiments, in which matrix is generated in the same way as that of Example 9 and are, respectively, generated by the MATLAB inner functions and . Table 2 reports some experimental results of this example with different dimension , where the tolerance error of the algorithm is set as . The notations IT, CPU, ERR, and ROB are the same meanings as mentioned in Example 9, the number of the nonzero entries of the solution is denoted by , and the sparsity ratio is defined as . The convergence curves of the residual of the objective and the iterative error of the solution for Example 10 with are depicted in Figure 2.
An outstanding observation from Table 2 is that both the number of the iteration (<25) and the CPU time (<16 s) are small, and the CPU time increases along with the increase of the dimension of . Another observation is that the sparsity ratio of the solution is over , which implies that more than half of the assets are not necessary to be invested and also provides some useful suggestions for an investor in finance. Both Table 2 and Figure 2 show that the P-ADMM (7) is robust for solving the large-scale Markowitz portfolio optimization problem.
Instead of studying the optimization problem with one linear constraint, in this paper, we concentrate our attentions on the generalized bilinear programming problem and develop a preconditioned alternating direction method. Based on the traditional proof of the ADMM, the global convergence of the proposed method is proved and the worst-case convergence rate in an ergodic sense is established. In order to avoid the case that the subproblem has no explicit solution, we still use a linearized strategy to approximately tackle the involved subproblems in the proposed method. Numerical results show that the proposed method is feasible and efficient.
Nowadays, many researchers are interested in the ADMM which can be regarded as an alternating update method for the variables and Lagrange multipliers for the separable convex programming. For the nonseparable convex optimization problem, the famous Taylor formula motivates us to use the first-order approximation to linearly deal with the objective function of the problem, and then one can design the corresponding ADMM to solve it. Noticing that the proposed method in current paper can be applied to the above scenarios and can be also used to solve the matrix minimization problem with two linear constraints.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
H. Konno, P. T. Thach, and H. Tuy, Optimization on Low Rank Nonconvex Structures, Springer, Dordrecht, Netherland, 1997.
H. M. Markowitz, Portfolio Selection: Efficient Diversication of Investment, John Wiley and Sons, New York, NY, USA, 1959.
J.-J. Moreau, “Proximite et dualite dans un espace hilbertien,” Bulletin of the French Mathematical Society, vol. 93, pp. 273–299, 1965.View at: Google Scholar | MathSciNet
M. R. Hestenes, “Multiplier and gradient methods,” Journal of Optimization Theory and Applications, vol. 4, pp. 303–320, 1969.View at: Publisher Site | Google Scholar | MathSciNet
R. T. Rockafellar, “Augmented Lagrangians and applications of the proximal point algorithm in convex programming,” Mathematics of Operations Research, vol. 1, no. 2, pp. 97–116, 1976.View at: Publisher Site | Google Scholar | MathSciNet
B. He, H.-K. Xu, and X. Yuan, “On the proximal Jacobian decomposition of ALM for multiple-block separable convex minimization problems and its relationship to ADMM,” Journal of Scientific Computing, vol. 66, no. 3, pp. 1204–1217, 2016.View at: Publisher Site | Google Scholar | MathSciNet
J. C. Bai, H. C. Zhang, J. C. Li, and F. M. Xu, “Generalized symmetric alternating direction method for separable convex programming,” http://www.optimization-online.org/DB_FILE/2016/10/5699.pdf.View at: Google Scholar
W. Deng and W. Yin, “On the global and linear convergence of the generalized alternating direction method of multipliers,” Journal of Scientific Computing, vol. 66, no. 3, pp. 889–916, 2016.View at: Publisher Site | Google Scholar | MathSciNet
E. X. Fang, B. He, H. Liu, and X. Yuan, “Generalized alternating direction method of multipliers: new theoretical insights and applications,” Mathematical Programming Computation, vol. 7, no. 2, pp. 149–187, 2015.View at: Publisher Site | Google Scholar | MathSciNet
B. He and X. Yuan, “On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method,” SIAM Journal on Numerical Analysis, vol. 50, no. 2, pp. 700–709, 2012.View at: Publisher Site | Google Scholar | MathSciNet
B. He and X. Yuan, “Block-wise alternating direction method of multipliers for multiple-block convex programming and beyond,” SMAI Journal of Computational Mathematics, vol. 1, pp. 145–174, 2015.View at: Publisher Site | Google Scholar | MathSciNet
D. Sun, K.-C. Toh, and L. Yang, “A convergent 3-block semiproximal alternating direction method of multipliers for conic programming with 4-type constraints,” SIAM Journal on Optimization, vol. 25, no. 2, pp. 882–915, 2015.View at: Publisher Site | Google Scholar | MathSciNet
J. J. Wang and W. Song, “An algorithm twisted from generalized ADMM for multi-block separable convex minimization models,” Journal of Computational and Applied Mathematics, vol. 309, pp. 342–358, 2017.View at: Publisher Site | Google Scholar | MathSciNet
J. Yang, W. Yin, Y. Zhang, and Y. Wang, “A fast algorithm for edge-preserving variational multichannel image restoration,” SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 569–592, 2009.View at: Publisher Site | Google Scholar | MathSciNet
B. He, F. Ma, and X. Yuan, “Convergence study on the symmetric version of ADMM with larger step sizes,” SIAM Journal on Imaging Sciences, vol. 9, no. 3, pp. 1467–1501, 2016.View at: Publisher Site | Google Scholar | MathSciNet
T. Goldstein, B. O'Donoghue, S. Setzer, and R. Baraniuk, “Fast alternating direction optimization methods,” SIAM Journal on Imaging Sciences, vol. 7, no. 3, pp. 1588–1623, 2014.View at: Publisher Site | Google Scholar | MathSciNet
M. Tao and X. Yuan, “Recovering low-rank and sparse components of matrices from incomplete and noisy observations,” SIAM Journal on Optimization, vol. 21, no. 1, pp. 57–81, 2011.View at: Publisher Site | Google Scholar | MathSciNet
X. Chen and S. Xiang, “Sparse solutions of linear complementarity problems,” Mathematical Programming, vol. 159, no. 1-2, Ser. A, pp. 539–556, 2016.View at: Publisher Site | Google Scholar | MathSciNet