A New Augmented Lagrangian Method for Equality Constrained Optimization with Simple Unconstrained Subproblem
We propose a new method for equality constrained optimization based on the augmented Lagrangian method. We construct an unconstrained subproblem by adding an adaptive quadratic term to the quadratic model of the augmented Lagrangian function. In each iteration, we solve this unconstrained subproblem to obtain a trial step. The main feature of this work is that the subproblem is easier to solve than in existing variants. Numerical results show that the method is effective.
1. Introduction
In this paper, we consider the following equality constrained optimization problem:
$$\min_{x\in\mathbb{R}^n}\ f(x)\quad\text{subject to}\quad c(x)=0, \tag{1}$$
where $f:\mathbb{R}^n\to\mathbb{R}$ and $c_i:\mathbb{R}^n\to\mathbb{R}$, $i=1,\dots,m$ (with $c=(c_1,\dots,c_m)^T$), are twice continuously differentiable.
The method presented in this paper is a variant of the augmented Lagrangian (AL) method. The AL method was proposed in the late 1960s by Hestenes [1] and Powell [2]. Later, Conn et al. [3, 4] presented a practical AL method and proved its global convergence under the LICQ condition. Since then, the AL method has attracted the attention of many scholars, and many variants have been presented (see [5–11]). There are now several computer packages based on the AL method, such as LANCELOT [4] and ALGENCAN [5, 6]. Over the past decades, the AL method has been developed extensively. Attracted by its good performance, many scholars have continued to study the AL method and its applications in recent years (see [7, 8, 11–15]).
For (1), we define the Lagrangian function
$$L(x,\lambda)=f(x)-\lambda^Tc(x) \tag{2}$$
and the augmented Lagrangian function
$$\Phi(x,\lambda,\sigma)=f(x)-\lambda^Tc(x)+\frac{\sigma}{2}\|c(x)\|^2, \tag{3}$$
where $\lambda\in\mathbb{R}^m$ is called the Lagrangian multiplier and $\sigma>0$ is called the penalty parameter. In this paper, $\|\cdot\|$ refers to the Euclidean norm.
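For concreteness, the two merit functions above can be evaluated as in the following sketch. This is an illustration only: the sign convention for the multiplier term ($-\lambda^Tc$) is an assumption of this sketch, and `f`, `c` are user-supplied callables.

```python
import numpy as np

def lagrangian(f, c, x, lam):
    # L(x, lam) = f(x) - lam^T c(x)   (sign convention assumed)
    return f(x) - lam @ np.asarray(c(x))

def augmented_lagrangian(f, c, x, lam, sigma):
    # Phi(x, lam, sigma) = L(x, lam) + (sigma/2) * ||c(x)||^2
    cx = np.asarray(c(x))
    return f(x) - lam @ cx + 0.5 * sigma * (cx @ cx)
```

For example, with $f(x)=x_1^2+x_2^2$ and the single constraint $c(x)=x_1+x_2-1$, the penalty term vanishes at any feasible point, so $\Phi$ coincides with $L$ there.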
In a typical AL method, at the $k$th step, for a given multiplier $\lambda_k$ and penalty parameter $\sigma_k$, an unconstrained subproblem is solved to find the next iteration point. Then, the multiplier and the penalty parameter are updated by certain rules. For convenience, for given $\lambda$ and $\sigma$, we define
$$g(x,\lambda,\sigma)=\nabla_x\Phi(x,\lambda,\sigma)=\nabla f(x)-A(x)^T\lambda+\sigma A(x)^Tc(x),$$
where $A(x)=\nabla c(x)^T$ denotes the Jacobian of the constraints.
Motivated by the regularized Newton method for unconstrained optimization (see [16–19]), we construct a new subproblem for (1). At the $k$th iteration point $x_k$, $\Phi(x_k+d,\lambda_k,\sigma_k)$ is approximated by the following quadratic model:
$$\Phi(x_k+d,\lambda_k,\sigma_k)\approx\Phi_k+g_k^Td+\frac12 d^T\big(B_k+\sigma_kA_k^TA_k\big)d,$$
where $g_k=g(x_k,\lambda_k,\sigma_k)$ and $B_k$ is a positive semidefinite approximation of $\nabla_{xx}^2L(x_k,\lambda_k)$. Let
$$W_k=B_k+\sigma_kA_k^TA_k;$$
thus we have
$$q_k(d):=g_k^Td+\frac12 d^TW_kd.$$
In [14, 15], $q_k$ is minimized within a trust region to find the next iteration point. Motivated by the regularized Newton method, we add a regularization term to the quadratic model and define
$$q_k^{\mu}(d)=q_k(d)+\frac{\mu_k}{2}\|d\|^2,$$
where $\mu_k>0$ is called the regularization parameter. At the $k$th step of our algorithm, we solve the following convex unconstrained quadratic subproblem:
$$\min_{d\in\mathbb{R}^n}\ q_k^{\mu}(d) \tag{10}$$
to find the trial step $d_k$. Then, we compute the ratio between the actual reduction and the predicted reduction,
$$\rho_k=\frac{\Phi(x_k,\lambda_k,\sigma_k)-\Phi(x_k+d_k,\lambda_k,\sigma_k)}{q_k^{\mu}(0)-q_k^{\mu}(d_k)}. \tag{11}$$
When $\rho_k$ is close to $1$, we accept $x_k+d_k$ as the next iteration point; at the same time, we regard the quadratic model as a sufficiently "good" approximation of $\Phi$ and reduce the value of $\mu_k$. Conversely, when $\rho_k$ is close to zero, we set $x_{k+1}=x_k$ and increase the value of $\mu_k$, by which we wish to reduce the length of the next trial step. This technique is similar to the update rule of the trust region radius; indeed, a sufficiently large $\mu_k$ does reduce the length of the trial step $d_k$. However, the regularization parameter is different from the trust region radius. In [14, 15], the authors construct a trust region subproblem
$$\min_{d\in\mathbb{R}^n}\ q_k(d)\quad\text{s.t.}\ \|d\|\le\Delta_k. \tag{12}$$
The exact solution $d_k$ of (12) satisfies the first-order criticality conditions if there exists some $\nu_k\ge0$ such that $W_k+\nu_kI$ is positive semidefinite,
$$(W_k+\nu_kI)d_k=-g_k, \tag{13}$$
and
$$\nu_k(\Delta_k-\|d_k\|)=0, \tag{14}$$
while the first-order criticality condition of (10) is
$$(W_k+\mu_kI)d_k=-g_k. \tag{15}$$
Equations (13) and (15) show the similarities and differences between the regularized subproblem (10) and the trust region subproblem (12). It seems that the parameter $\mu_k$ plays a role similar to the multiplier $\nu_k$ in the trust region subproblem. But, actually, the update rule of $\mu_k$ (see (26)) shows that $\mu_k$ is not an approximation of $\nu_k$: the update of $\mu_k$ depends on the quality of the last trial step and has no direct relation to system (13).
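The trust-region-like acceptance test and update of the regularization parameter described above can be sketched as follows. This is a schematic, not the paper's exact rule: the thresholds `eta1`, `eta2`, the scaling factors, and the floor on the parameter are hypothetical placeholder values.

```python
import numpy as np

def step_and_update(phi, q_reg, x, d, mu, eta1=0.25, eta2=0.75):
    """One acceptance test for a trial step d of the regularized model.

    phi   : callable, the merit function Phi(.)
    q_reg : callable, the regularized quadratic model q(d) + (mu/2)||d||^2
    Returns the new iterate and the updated regularization parameter.
    """
    ared = phi(x) - phi(x + d)                  # actual reduction
    pred = q_reg(np.zeros_like(d)) - q_reg(d)   # predicted reduction (> 0)
    rho = ared / pred
    if rho >= eta2:        # very good agreement: accept, relax regularization
        return x + d, max(0.5 * mu, 1e-8)
    elif rho >= eta1:      # acceptable step: accept, keep mu
        return x + d, mu
    else:                  # poor step: reject, shorten next trial step
        return x, 4.0 * mu
```

Note the design choice mirrored from trust region methods: a rejected step never changes the iterate, only the regularization parameter, which shrinks the next step through the system $(W_k+\mu_kI)d=-g_k$.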
To establish the global convergence of an algorithm, some kind of constraint qualification is required. There are many well-known constraint qualifications, such as LICQ, MFCQ, CRCQ, RCR, CPLD, and RCPLD. When there are only equality constraints, LICQ is equivalent to MFCQ, in which $A(x^*)$ has full row rank; CRCQ is equivalent to CPLD, in which every subset of $\{\nabla c_i(x),\ i=1,\dots,m\}$ maintains constant rank in a neighborhood of $x^*$; RCR is equivalent to RCPLD, in which $\{\nabla c_i(x),\ i=1,\dots,m\}$ maintains constant rank in a neighborhood of $x^*$. RCPLD is weaker than CRCQ, and CRCQ is weaker than LICQ. In this paper, we use RCPLD, which is defined as follows.
Definition 1. One says that RCPLD holds at a feasible point $x^*$ of (1) if there exists a neighborhood $N(x^*)$ of $x^*$ such that $\{\nabla c_i(x),\ i=1,\dots,m\}$ maintains constant rank for all $x\in N(x^*)$.
The rest of this paper is organized as follows. In Section 2, we give a detailed description of the presented algorithm. The global convergence is proved in Section 3. In Section 4, we present the numerical experiments. Some conclusions are given in Section 5.
Notations. For convenience, we abbreviate $c(x_k)$ to $c_k$, $A(x_k)$ to $A_k$, $g(x_k,\lambda_k,\sigma_k)$ to $g_k$, and $\Phi(x_k,\lambda_k,\sigma_k)$ to $\Phi_k$. In this paper, $v^{(i)}$ denotes the $i$th component of the vector $v$.
2. The Algorithm
In this section, we give a detailed description of the proposed algorithm.
As mentioned in Section 1, we solve the unconstrained subproblem (10) to obtain the trial step $d_k$. Since $B_k$ is at least positive semidefinite and $\mu_k>0$, $W_k+\mu_kI$ is positive definite. Therefore, (10) is a strictly convex unconstrained quadratic optimization problem, and $d_k$ solves (10) if and only if
$$(W_k+\mu_kI)d_k=-g_k \tag{15}$$
holds. Global convergence does not depend on the exact solution of (15), although the linear system (15) is easy to solve. For the minimizer of (10) along the direction $-g_k$, specifically, we consider the following one-dimensional subproblem:
$$\min_{\alpha\ge0}\ q_k^{\mu}(-\alpha g_k). \tag{16}$$
If $g_k\neq0$, then the minimizer of (16) is $\alpha_k=\|g_k\|^2/\big(g_k^T(W_k+\mu_kI)g_k\big)$. Therefore, at the $k$th step, it suffices to require that
$$q_k^{\mu}(0)-q_k^{\mu}(d_k)\ge q_k^{\mu}(0)-q_k^{\mu}(-\alpha_kg_k). \tag{17}$$
By direct calculation, we have that
$$q_k^{\mu}(0)-q_k^{\mu}(d_k)\ge\frac{\|g_k\|^4}{2\,g_k^T(W_k+\mu_kI)g_k}\ge\frac{\|g_k\|^2}{2(\|W_k\|+\mu_k)}. \tag{18}$$
In Section 3, we always suppose that (18) holds.
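The two computations above, the exact step from the linear system and the minimizer along the steepest descent direction from (16), can be sketched as follows; the names and the verification at the end are illustrative, not part of the paper.

```python
import numpy as np

def trial_step(W, g, mu):
    """Solve the strictly convex subproblem min q(d) + (mu/2)||d||^2.

    Its unique minimizer satisfies the linear system (W + mu*I) d = -g.
    W is positive semidefinite and mu > 0, so W + mu*I is positive definite.
    """
    return np.linalg.solve(W + mu * np.eye(g.size), -g)

def cauchy_step(W, g, mu):
    """Minimizer of the regularized model along the direction -g,
    i.e. the exact solution of the one-dimensional subproblem."""
    Wg = (W + mu * np.eye(g.size)) @ g
    alpha = (g @ g) / (g @ Wg)      # positive since W + mu*I is SPD
    return -alpha * g
```

Any inexact solution of the linear system is acceptable for the analysis as long as it decreases the regularized model at least as much as the Cauchy step does.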
In a typical AL algorithm, the update rule of $\sigma_k$ depends on the improvement of the constraint violation. A commonly used rule is the following: if $\|c_{k+1}\|\le\tau\|c_k\|$, where $\tau\in(0,1)$, one may consider that the constraint violation has been reduced sufficiently, and thus keeping the current penalty parameter is a good choice; otherwise, one considers that the current penalty parameter cannot sufficiently reduce the constraint violation and increases it in the next iteration. In [20], Yuan proposed a different update rule of $\sigma_k$ for a trust region algorithm: $\sigma_k$ is increased if condition (19) holds, where the auxiliary parameter in (19) tends to zero. We slightly modify (19) in our algorithm (see Step 3 of Algorithm 2).
In a typical AL method, the next iteration point $x_{k+1}$ is obtained by minimizing $\Phi(x,\lambda_k,\sigma_k)$. In most AL methods, $x_{k+1}$ satisfies $\|\nabla_x\Phi(x_{k+1},\lambda_k,\sigma_k)\|\le\eta_k$, where $\eta_k$ is a controlling parameter that tends to zero. Hence, when $\eta_k$ is sufficiently small, $\lambda_k-\sigma_kc_{k+1}$ is a good estimate of the next multiplier $\lambda_{k+1}$. Since we obtain $x_{k+1}$ by minimizing the quadratic model rather than $\Phi$ itself, the critical point of the model has no direct relation to $\nabla_x\Phi(x_{k+1},\lambda_k,\sigma_k)$. Therefore, this classical update rule does not suit our algorithm. Instead, we obtain $\lambda_{k+1}$ by approximately solving the following least squares problem:
$$\min_{\lambda\in\mathbb{R}^m}\ \big\|\nabla f(x_{k+1})-A_{k+1}^T\lambda\big\|^2.$$
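A least squares multiplier of this kind can be computed in one call; the sketch below assumes the formulation $\min_\lambda\|\nabla f-A^T\lambda\|$ stated above.

```python
import numpy as np

def ls_multiplier(grad_f, A):
    """Least squares multiplier estimate: argmin_lam ||grad_f - A^T lam||.

    grad_f : gradient of f at the new iterate, shape (n,)
    A      : constraint Jacobian at the new iterate, shape (m, n)
    """
    lam, *_ = np.linalg.lstsq(A.T, grad_f, rcond=None)
    return lam
```

When $A$ has full row rank, this is the unique solution of the normal equations $AA^T\lambda=A\nabla f$; otherwise `lstsq` returns the minimum-norm minimizer.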
Most AL algorithms require that $\{\lambda_k\}$ be bounded to ensure global convergence. Hence, all components of $\lambda_{k+1}$ are restricted to a certain interval $[\lambda_{\min},\lambda_{\max}]$. This technique is also used in our algorithm.
Now, we give the detailed algorithm in the following.
Step 0 (initialization). Choose the algorithmic parameters. Determine the initial data $x_0$, $\lambda_0$, $\sigma_0$, $\mu_0$, and the tolerance $\varepsilon>0$. Set $k:=0$.
Step 1 (termination test). If $\|g_k\|\le\varepsilon$ and $\|c_k\|\le\varepsilon$, return $x_k$ as a KKT point. If $\|c_k\|>\varepsilon$, $\|A_k^Tc_k\|\le\varepsilon$, and $\sigma_k$ is sufficiently large, return $x_k$ as an infeasible KKT point.
Step 2 (determine the trial step). Evaluate the trial step $d_k$ by solving
$$\min_{d\in\mathbb{R}^n}\ q_k^{\mu}(d) \tag{23}$$
such that (18) holds. Compute the ratio between the actual reduction and the predicted reduction,
$$\rho_k=\frac{\Phi(x_k,\lambda_k,\sigma_k)-\Phi(x_k+d_k,\lambda_k,\sigma_k)}{q_k^{\mu}(0)-q_k^{\mu}(d_k)},$$
and set
$$x_{k+1}=\begin{cases}x_k+d_k,&\text{if }\rho_k\ge\eta_1,\\x_k,&\text{otherwise},\end{cases}\qquad
\mu_{k+1}=\begin{cases}\max\{c_1\mu_k,\mu_{\min}\},&\text{if }\rho_k\ge\eta_2,\\\mu_k,&\text{if }\eta_1\le\rho_k<\eta_2,\\c_2\mu_k,&\text{if }\rho_k<\eta_1,\end{cases} \tag{26}$$
where $0<\eta_1\le\eta_2<1$ and $0<c_1<1<c_2$.
Step 3 (update the penalty parameter). If the constraint violation is not sufficiently reduced (the modified condition of (19)), set $\sigma_{k+1}=c_3\sigma_k$ with $c_3>1$; otherwise, set $\sigma_{k+1}=\sigma_k$.
Step 4 (update the multiplier). If $x_{k+1}=x_k$, set $\lambda_{k+1}=\lambda_k$. Otherwise, evaluate $\hat\lambda_{k+1}$ by approximately solving
$$\min_{\lambda\in\mathbb{R}^m}\ \big\|\nabla f(x_{k+1})-A_{k+1}^T\lambda\big\|^2 \tag{30}$$
and let $\lambda_{k+1}^{(i)}=\max\big\{\lambda_{\min},\min\{\hat\lambda_{k+1}^{(i)},\lambda_{\max}\}\big\}$ for $i=1,\dots,m$.
Set $k:=k+1$ and go to Step 1.
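A schematic driver loop may clarify how the steps fit together. Everything below is an illustrative assumption rather than the paper's exact rules: the parameter values, the simplified penalty test (increase $\sigma$ when the constraint violation does not shrink by a fixed factor), and updating the multiplier at every iteration are all placeholders.

```python
import numpy as np

def al_regularized(f, grad_f, c, jac_c, hess_L, x, tol=1e-6, max_iter=200):
    """Schematic regularized AL loop (Steps 0-4), with placeholder constants.

    hess_L(x, lam) should return a positive semidefinite approximation of
    the Hessian of the Lagrangian.  Sign convention for Phi is assumed.
    """
    m = c(x).size
    lam = np.zeros(m)            # initial multiplier
    sigma, mu = 10.0, 1.0        # penalty and regularization parameters
    lam_lo, lam_hi = -1e6, 1e6   # safeguard interval for the multiplier

    def phi(z):                  # augmented Lagrangian value
        cz = c(z)
        return f(z) - lam @ cz + 0.5 * sigma * (cz @ cz)

    for _ in range(max_iter):
        cx, A = c(x), jac_c(x)
        g = grad_f(x) - A.T @ lam + sigma * A.T @ cx   # gradient of phi
        if np.linalg.norm(g) <= tol and np.linalg.norm(cx) <= tol:
            return x, lam                              # KKT point
        W = hess_L(x, lam) + sigma * A.T @ A
        d = np.linalg.solve(W + mu * np.eye(x.size), -g)   # trial step
        pred = -(g @ d + 0.5 * d @ (W @ d) + 0.5 * mu * (d @ d))
        rho = (phi(x) - phi(x + d)) / pred
        if rho >= 0.25:                                # accept the step
            x, mu = x + d, max(0.5 * mu, 1e-8)
        else:                                          # reject, regularize more
            mu *= 4.0
        if np.linalg.norm(c(x)) > 0.25 * np.linalg.norm(cx):
            sigma *= 10.0                              # poor feasibility progress
        lam_new, *_ = np.linalg.lstsq(jac_c(x).T, grad_f(x), rcond=None)
        lam = np.clip(lam_new, lam_lo, lam_hi)         # keep multiplier bounded
    return x, lam
```

On a small convex instance, e.g. $\min x_1^2+x_2^2$ subject to $x_1+x_2=1$, this sketch converges quickly to $x^*=(0.5,0.5)$ with multiplier $\lambda^*=1$.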
Remark 3. In practical computation, it is not required to solve (30) exactly to find $\hat\lambda_{k+1}$. In our implementation of Algorithm 2, we use the Matlab subroutine minres to find an approximate solution of the corresponding linear system (the normal equations of (30)) and take it as an approximation of $\hat\lambda_{k+1}$.
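For illustration, the analogous call in Python uses `scipy.sparse.linalg.minres`; MINRES only requires the coefficient matrix to be symmetric, which both the normal equations and a system of the form $(W+\mu I)d=-g$ satisfy. The data below are made up for the sketch.

```python
import numpy as np
from scipy.sparse.linalg import minres

# Illustrative symmetric positive definite system (W + mu*I) d = -g:
# W is positive semidefinite and mu > 0, so MINRES is applicable.
W = np.array([[2.0, 0.0], [0.0, 0.0]])
g = np.array([1.0, 1.0])
mu = 1.0

d, info = minres(W + mu * np.eye(2), -g)
# info == 0 signals that MINRES converged to the requested tolerance.
```

An inexact solve from an iterative method like this is exactly the kind of approximate solution the global convergence theory tolerates, provided the decrease condition (18) is verified.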
3. Global Convergence
In this section, we discuss the global convergence of Algorithm 2. We assume that Algorithm 2 generates an infinite sequence of iterates and make the following assumptions.
Assumptions 1. (A1) $f$ and $c$ are twice continuously differentiable.
(A2) $\{x_k\}$ and $\{B_k\}$ are bounded, where $B_k$ is the positive semidefinite approximation of the Hessian of the Lagrangian.
Firstly, we give a result on the upper bound of the trial step.
Lemma 4. If $d_k$ solves subproblem (23), then one has
$$\|d_k\|\le\frac{2\|g_k\|}{\mu_k}. \tag{32}$$
Proof. Any approximate solution $d_k$ of (23) satisfies $q_k^{\mu}(d_k)\le q_k^{\mu}(0)=0$. Clearly,
$$0\ge q_k^{\mu}(d_k)=g_k^Td_k+\frac12 d_k^TW_kd_k+\frac{\mu_k}{2}\|d_k\|^2\ge-\|g_k\|\,\|d_k\|+\frac{\mu_k}{2}\|d_k\|^2. \tag{33}$$
If $d_k=0$, then (32) holds trivially. If $d_k\neq0$, then (33) implies that $\|d_k\|\le2\|g_k\|/\mu_k$. Thus we obtain (32).
Now, we discuss the convergence properties in two cases: one is that the penalty parameter $\sigma_k$ tends to $\infty$, and the other is that $\sigma_k$ is bounded.
3.1. The Case of $\sigma_k\to\infty$
Lemma 5. Suppose that (A1)-(A2) hold and ; then there exists a constant such that .
Proof. See Lemma 3.1 in Wang and Yuan [15].
In Lemma 5, if $\sigma_k\to\infty$, then any accumulation point of $\{x_k\}$ is infeasible. Sometimes (1) is naturally infeasible; in other words, the feasible set is empty. In this case, we wish to find a minimizer of the constraint violation. Specifically, we wish to solve
$$\min_{x\in\mathbb{R}^n}\ \frac12\|c(x)\|^2. \tag{34}$$
The solution of this problem is characterized by
$$A(x)^Tc(x)=0. \tag{35}$$
In the next theorem, we show that if the constraint violation is not convergent to zero, at least one of the accumulation points of $\{x_k\}$ satisfies (35).
Theorem 6. Suppose that (A1)-(A2) hold and $\sigma_k\to\infty$. If $\{\|c_k\|\}$ is not convergent to zero, then (36) holds.
Proof. We prove this result by contradiction. Suppose that there exists some such that (37) holds. By the definition of in (7), we know that (38) holds. As and are bounded, we can deduce the boundedness of by (A2); that is, there exists some such that (39) holds. By (37), (38), and (39), we can conclude that (40) holds for all sufficiently large . By the boundedness of and , we can conclude that there exists such that (41) holds for all sufficiently large , where is defined by (7). By (18), (40), and (41), (42) holds for all sufficiently large . By the update rule of and the fact that , we have that (43) holds for infinitely many . As (40) holds for all sufficiently large , it is easy to see that (42) contradicts (43), since is convergent. Thus we obtain the desired result.
Lemma 7. Suppose that (A1)-(A2) hold, , and ; then (44) holds.
Proof. Assume that there exists such that (45) holds. Then, by (18) and (41), we know that, for all sufficiently large , (46) holds. By the update rule of and , (47) holds for infinitely many . We will prove that (47) contradicts (46). Let be the index set containing all such that (47) holds. Therefore, for all , (48) and (49) hold. If there exists an infinite subset such that holds for all , then, by (48), it holds that (50). As , , and , (50) implies that (51) holds for all sufficiently large . If there exists an infinite subset such that holds for all , then by (48) we have that (52) holds for all . Equation (52) also implies (51), as and . From (28) and (29), it follows that holds for all . Therefore, by (49) we know that, for all , (53) holds. As , (53) implies that (54) holds for all sufficiently large . Thus, from (47), we obtain (51) and (54), which contradict (46). This completes the proof.
Theorem 8. Suppose that (A1)-(A2) hold. If and , then there exists a cluster point $x^*$ of $\{x_k\}$ such that $x^*$ is a KKT point of (1) or the RCPLD condition does not hold at $x^*$.
Proof. Under the assumptions of this theorem, Lemma 7 implies that there exists an index set such that the corresponding subsequence converges to some $x^*$ and (55) holds, where is defined in (7). With the help of Theorem 2 in Andreani et al. [21], (55) implies that $x^*$ is a KKT point or the RCPLD condition does not hold at $x^*$.
3.2. The Case of $\{\sigma_k\}$ Being Bounded
In this subsection, without loss of generality, we assume that for all . Thus, by the update rule (29), we have that and (56) hold for all . As remains constant, it follows from (A1) and (A2) that and are all bounded. If we define the index set (57), then for .
Lemma 9. Suppose that (A1)-(A2) hold and for all . If , as , and there exists some constant such that (58) holds, then is divergent.
Proof. We prove this lemma by contradiction. We will show that if is convergent, then holds for all sufficiently large , which contradicts the fact that , as .
Suppose that and (32) imply that (59) holds for all sufficiently large . By the definition of , (60) and (61) hold. Equations (59)–(61) imply that is convergent. Let (62); it is clear that . By Taylor's theorem, it holds that (63), where is a convex combination of and . According to (60), we have that (64) holds for all sufficiently large , and thus (65) holds. The convergence of and the boundedness of imply that . Therefore, for all sufficiently large . This implies that and .
Lemma 10. Suppose that (A1)-(A2) hold and for all ; then we have that (66) holds.
Proof. Firstly, we prove that the sum of is bounded. Define the index set (67), where is defined by Steps 0 and 4 in Algorithm 2. From Step 4 of Algorithm 2, we know that if , then and . Hence we have (68), where is the upper bound of . From Step 4 and (67), we have (69), which implies (70). Then, we have (71), where is defined by (5).
Secondly, we prove (72) by contradiction. Suppose that there exists some such that (73) holds. Equations (56) and (73) imply that (74) holds. Considering the sum of on the index set (see (57)), we have by (71) that (75) holds. It can be deduced by (74) and (75) that (76) holds, and thus (77) holds. If is a finite set, then it follows from (57) and Step 2 that and hold for all sufficiently large . Therefore, , as . If is an infinite set, the second inequality in (77) implies that , as and . From Step 2, we know that if , then . Hence, we have , as . The fact that and (74) imply that holds for all sufficiently large . Hence it can be deduced by Lemma 9 that is divergent, which contradicts the first part of (77).
Finally, we prove (66). If is a finite set, then is convergent; thus, (72) implies (66). From now on we assume that is an infinite set. Suppose that (66) does not hold; then there exist an infinite index set () and a constant such that (79) holds. By (72), there also exists an infinite index set () such that (80) and (81) hold. Let ; then and is an infinite index set, and (82) holds. Therefore, by (75), we have that (83) holds. With the help of (56), (80), and (83), we obtain (84). A direct conclusion which can be drawn from (84) is (85). Thus, by Lemma 4, we have that, for all sufficiently large , (86) holds. Therefore, for sufficiently large , (87) holds. Equations (84) and (87) imply that , as . Therefore, (79) contradicts (81). This completes the proof.
Lemma 11. Suppose that (A1)-(A2) hold and for all ; then we have that (88) holds.
Proof. Suppose that (88) does not hold. Then there exists such that (89) holds. By (18), we have that (90) holds. As is bounded above, similar to the second part of the proof of Lemma 10, we can conclude that (91) holds, and thus (92) holds. By (75) and (92), we have that is convergent, and thus is also convergent as is bounded. However, Lemma 9, (91), (92), and the boundedness of imply the divergence of . This contradiction completes the proof.
With the help of Lemmas 10 and 11, we can easily obtain the following result.
Theorem 12. Suppose that (A1)-(A2) hold and for all ; then there exists an accumulation point of $\{x_k\}$ at which the KKT condition holds.
Note that, in Theorem 12, we do not suppose that RCPLD holds.
4. Numerical Experiment
In this section, we investigate the performance of Algorithm 2. We compare Algorithm 2 with the well-known Fortran package ALGENCAN. In our computer program, the parameters in Algorithm 2 are chosen as follows: We set $B_k$ to be the exact Hessian of the Lagrangian at the current point. The Matlab subroutine minres is used to solve (15). All algorithms are terminated when one of the following conditions holds: and ; and ; . All test problems are chosen from the CUTEst collection [22].
The numerical results are listed in Table 1, where we report, for each problem, its name, the number of its variables, the number of constraints, the number of function evaluations, and the number of gradient evaluations. In Table 1, we list the results of 38 test problems. Considering the number of function evaluations, Algorithm 2 is better than ALGENCAN in 30 cases (78.9%). Considering the number of gradient evaluations, Algorithm 2 is better than ALGENCAN in 31 cases (81.6%).
5. Conclusions
In this paper, we present a new algorithm for equality constrained optimization. We add an adaptive quadratic term to the quadratic model of the augmented Lagrangian function. In each iteration, we solve a simple unconstrained subproblem to obtain the trial step. The global convergence is established under reasonable assumptions.
From the numerical results and the theoretical analysis, we believe that the new algorithm can efficiently solve equality constrained optimization problems.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work is supported by NSFC (11771210, 11471159, 11571169, and 61661136001) and the Natural Science Foundation of Jiangsu Province (BK20141409).
References
[1] M. R. Hestenes, "Multiplier and gradient methods," Journal of Optimization Theory and Applications, vol. 4, pp. 303–320, 1969.
[2] M. J. D. Powell, "A method for nonlinear constraints in minimization problems," in Optimization, R. Fletcher, Ed., pp. 283–298, Academic Press, New York, NY, USA, 1969.
[3] A. R. Conn, N. I. M. Gould, and P. L. Toint, "A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds," SIAM Journal on Numerical Analysis, vol. 28, no. 2, pp. 545–572, 1991.
[4] A. R. Conn, N. I. M. Gould, and P. L. Toint, LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A), Springer, New York, NY, USA, 1992.
[5] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt, "Augmented Lagrangian methods under the constant positive linear dependence constraint qualification," Mathematical Programming, vol. 111, no. 1-2, Ser. B, pp. 5–32, 2008.
[6] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt, "On augmented Lagrangian methods with general lower-level constraints," SIAM Journal on Optimization, vol. 18, no. 4, pp. 1286–1309, 2007.
[7] E. G. Birgin and J. M. Martínez, "Augmented Lagrangian method with nonmonotone penalty parameters for constrained optimization," Computational Optimization and Applications, vol. 51, no. 3, pp. 941–965, 2012.
[8] E. G. Birgin and J. M. Martínez, "On the application of an augmented Lagrangian algorithm to some portfolio problems," EURO Journal on Computational Optimization, vol. 4, no. 1, pp. 79–92, 2016.
[9] Z. Dostál, "Semi-monotonic inexact augmented Lagrangians for quadratic programming with equality constraints," Optimization Methods & Software, vol. 20, no. 6, pp. 715–727, 2005.
[10] Z. Dostál, A. Friedlander, and S. A. Santos, "Augmented Lagrangians with adaptive precision control for quadratic programming with simple bounds and equality constraints," SIAM Journal on Optimization, vol. 13, no. 4, pp. 1120–1140, 2003.
[11] F. E. Curtis, H. Jiang, and D. P. Robinson, "An adaptive augmented Lagrangian method for large-scale constrained optimization," Mathematical Programming, vol. 152, no. 1-2, Ser. A, pp. 201–245, 2015.
[12] F. E. Curtis, N. I. M. Gould, H. Jiang, and D. P. Robinson, "Adaptive augmented Lagrangian methods: algorithms and practical numerical experience," Optimization Methods & Software, vol. 31, no. 1, pp. 157–186, 2016.
[13] A. F. Izmailov, M. V. Solodov, and E. I. Uskov, "Global convergence of augmented Lagrangian methods applied to optimization problems with degenerate constraints, including problems with complementarity constraints," SIAM Journal on Optimization, vol. 22, no. 4, pp. 1579–1606, 2012.
[14] L. Niu and Y. Yuan, "A new trust-region algorithm for nonlinear constrained optimization," Journal of Computational Mathematics, vol. 28, no. 1, pp. 72–86, 2010.
[15] X. Wang and Y. Yuan, "An augmented Lagrangian trust region method for equality constrained optimization," Optimization Methods & Software, vol. 30, no. 3, pp. 559–582, 2015.
[16] C. Cartis, N. I. M. Gould, and P. L. Toint, "Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results," Mathematical Programming, vol. 127, no. 2, Ser. A, pp. 245–295, 2011.
[17] C. Cartis, N. I. M. Gould, and P. L. Toint, "Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity," Mathematical Programming, vol. 130, no. 2, Ser. A, pp. 295–319, 2011.
[18] K. Ueda and N. Yamashita, "A regularized Newton method without line search for unconstrained optimization," Computational Optimization and Applications, vol. 59, no. 1-2, pp. 321–351, 2014.
[19] H. Zhang and Q. Ni, "A new regularized quasi-Newton algorithm for unconstrained optimization," Applied Mathematics and Computation, vol. 259, pp. 460–469, 2015.
[20] Y. X. Yuan, "On the convergence of a new trust region algorithm," Numerische Mathematik, vol. 70, no. 4, pp. 515–539, 1995.
[21] R. Andreani, G. Haeser, M. L. Schuverdt, and P. J. S. Silva, "A relaxed constant positive linear dependence constraint qualification and applications," Mathematical Programming, vol. 135, no. 1-2, Ser. A, pp. 255–273, 2012.
[22] N. I. M. Gould, D. Orban, and P. L. Toint, "CUTEst: a Constrained and Unconstrained Testing Environment with safe threads for mathematical optimization," Computational Optimization and Applications, vol. 60, no. 3, pp. 545–557, 2015.