Research Article  Open Access
A Decomposition Algorithm for Convex Nondifferentiable Minimization with Errors
Abstract
A decomposition algorithm based on a proximal bundle-type method with inexact data is presented for the unconstrained minimization of a nonsmooth convex function $f$. At each iteration, only an approximate evaluation of $f$ and approximate subgradients are required, which makes the algorithm easier to implement. It is shown that every cluster point of the sequence of iterates generated by the proposed algorithm is an exact solution of the unconstrained minimization problem. Numerical tests support the theoretical findings.
1. Introduction
Consider the problem $\min_{x \in \mathbb{R}^n} f(x)$ (1.1), where $f : \mathbb{R}^n \to \mathbb{R}$ is a nondifferentiable convex function. It is well known that many practical problems can be formulated as (1.1), for example, problems of catastrophe, ruin, vitality, data mining, and finance. A classical conceptual algorithm for solving (1.1) is the proximal point method, based on the Moreau-Yosida regularization of $f$ [1, 2]. Implementable forms of the method can be obtained by means of a bundle technique, alternating serious steps with sequences of null steps [3, 4].
More recently, new conceptual schemes for solving (1.1) have been developed by using an approach that is somewhat different from Moreau-Yosida regularization. This is the $\mathcal{VU}$-theory introduced in [5]; see also [6, 7]. The idea is to decompose $\mathbb{R}^n$ into two orthogonal subspaces $\mathcal{V}$ and $\mathcal{U}$ at a point $\bar{x}$ such that the nonsmoothness of $f$ is concentrated essentially on $\mathcal{V}$, and the smoothness of $f$ appears on the $\mathcal{U}$ subspace. More precisely, for a given $g \in \partial f(\bar{x})$, where $\partial f(\bar{x})$ denotes the subdifferential of $f$ at $\bar{x}$ in the sense of convex analysis, $\mathbb{R}^n$ can be decomposed as a direct sum of two orthogonal subspaces, that is, $\mathbb{R}^n = \mathcal{U} \oplus \mathcal{V}$, where $\mathcal{V} = \operatorname{lin}(\partial f(\bar{x}) - g)$ and $\mathcal{U} = \mathcal{V}^{\perp}$. They define the $\mathcal{U}$-Lagrangian, an approximation of the original function, which can be used to create a second-order expansion of $f$ along certain manifolds.
Mifflin and Sagastizábal design a $\mathcal{VU}$-algorithm for convex functions in [8]. This algorithm brings the iterate to the primal track with the help of a bundle subroutine. Then a $\mathcal{U}$-Newton step is performed to gain superlinear decrease of the distance to the solution. In order to implement this algorithm, [9] gives an algorithm which uses only subgradients. However, this algorithm is conceptual in the sense that it needs exact values of the objective function, which can be difficult to evaluate. For example, consider the situation of Lagrangian relaxation. The primal problem is $\max \{ q(\xi) : \xi \in P,\ h(\xi) = 0 \}$ (1.2), where $P$ is a compact subset of $\mathbb{R}^m$ and $q : \mathbb{R}^m \to \mathbb{R}$, $h : \mathbb{R}^m \to \mathbb{R}^n$. Lagrangian relaxation of the equality constraints in the problem leads to the following problem: $\min_{x \in \mathbb{R}^n} f(x)$ (1.3), where $f(x) := \max \{ q(\xi) + \langle x, h(\xi) \rangle : \xi \in P \}$ (1.4) is the dual function. Trying to solve problem (1.2) by means of solving its dual problem (1.3) makes sense in many situations. In this case, evaluating the function value $f(x)$ and a subgradient requires solving the optimization problem (1.4) exactly. In some cases, however, computing exact values of $f$ is unnecessary and inefficient. For this reason, some modifications of the bundle methods in [9] are needed.
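The following minimal Python sketch illustrates why an inexact oracle arises naturally in Lagrangian relaxation. It assumes the usual dual function $f(x) = \max\{ q(\xi) + \langle x, h(\xi) \rangle : \xi \in P \}$ and a finite toy set $P$; the names `dual_oracle`, `P`, `q`, and `h` are hypothetical and chosen only for illustration. Maximizing over a subset of $P$ gives an underestimate of $f(x)$ together with a vector that is an $\varepsilon$-subgradient of $f$ at $x$, with $\varepsilon$ equal to the gap in the function value.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


def dual_oracle(x, candidates, q, h):
    """Maximize q(xi) + <x, h(xi)> over the given candidates.

    Returns the attained value and h(xi*) for the maximizer xi*.
    If candidates is the whole set P, this is the exact dual value and
    an exact subgradient; over a subset it is only approximate.
    """
    best = max(candidates, key=lambda xi: q(xi) + dot(x, h(xi)))
    return q(best) + dot(x, h(best)), h(best)


# Toy data (assumptions for illustration only).
P = [0.0, 1.0, 2.0, 3.0]
q = lambda xi: -(xi - 1.0) ** 2
h = lambda xi: (xi - 2.0,)

x = (0.5,)
f_exact, g_exact = dual_oracle(x, P, q, h)        # full maximization
f_approx, g_approx = dual_oracle(x, P[2:], q, h)  # cheap, partial maximization
eps = f_exact - f_approx                          # error of the inexact oracle
```

Here `g_approx` satisfies $f(y) \ge f(x) + \langle g, y - x \rangle - \varepsilon$ for every $y$, which is the kind of approximate information the algorithm in this paper is designed to use.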
The paper is organized as follows. In the next section we present the approximate $\mathcal{U}$-Lagrangian based on the approximate subgradient. Then we design a conceptual Algorithm 2.6 which can deal with approximate subgradients and approximate function values. Section 3 is divided into three parts. In the first part, we propose the approximate primal-dual track. The proximal bundle-type subroutine with inexact data is introduced in the second part. The third part of Section 3 is devoted to establishing an implementable Algorithm 3.5, which substitutes the approximate $\mathcal{V}$-step in Algorithm 2.6 with the proximal bundle subroutine. Numerical testing of the resulting Algorithm 3.5 is reported in the final section.
2. A Conceptual Approximate Decomposition Algorithm
2.1. Approximate $\mathcal{U}$-Lagrangian and Its Properties
In some cases, computing exact values of the objective function and exact subgradient is unnecessary and inefficient. For this reason, some modification of the Lagrangian will be proposed in this section. We assume that satisfies Introducing this to [5], one can restate the definition of Lagrangian and its properties as follows.
Definition 2.1. Assume (2.1). The approximate Lagrangian of , denoted by , is defined as follows. and is defined by
Theorem 2.2. Assume (2.1). Then the following assertions are true:(i)the function defined in (2.2) is convex and finite everywhere;(ii);(iii) and .
Theorem 2.3. Assume (2.1) and . Then one has that
Remark 2.4. Assume (1.2). If , then the approximate Lagrangian in this paper is exactly the Lagrangian in [5].
2.2. Approximate Decomposition Algorithm Frame
In order to give an approximate decomposition algorithm frame, we restate the definition of the $\mathcal{U}$-Hessian in [5] as follows.
Definition 2.5. Assume that is finite, is fixed, and satisfies (2.1). We say that has at a Hessian associated with if has a generalized Hessian at 0, setting
Assume (2.1), we investigate an approximate decomposition algorithm frame based on the definition of the approximate Lagrangian.
Algorithm 2.6. Step 0. Initialization.
satisfies for all , , where and .
Step 1. Stop if .
Step 2. Approximate $\mathcal{V}$-step.
Compute an optimal solution satisfying
Set .
Step 3. $\mathcal{U}$-step. Make a Newton step in .
Compute that satisfies for all , , where , such that
Compute the solution satisfying
Set and .
Step 4. Update.
Compute that satisfies for all , , where . Set and go to Step 1.
Theorem 2.7. Assume (2.1) and has a positive definite Hessian at , a minimizer of . Then the iterate points constructed by Algorithm 2.6 satisfy
Remark 2.8. If , this algorithm is the same as Algorithm 4.5 in [5]. However, it uses only approximate objective function values, which makes the algorithm easier to implement.
3. Approximate Decomposition Algorithm
Since Algorithm 2.6 in Section 2 relies on knowing the subspaces $\mathcal{U}$ and $\mathcal{V}$ and converges only locally, it needs significant modification. In [10], Mifflin and Sagastizábal show that a proximal point sequence follows the primal track near a minimizer. This opens the way for defining a $\mathcal{VU}$-algorithm where $\mathcal{V}$-steps are replaced by proximal steps. In addition, the proximal step can be estimated with a bundle technique, which can also approximate the unknown $\mathcal{U}$ and $\mathcal{V}$ subspaces as a computational byproduct. Therefore, they establish Algorithm 6 in [8] by combining the bundle subroutine with the $\mathcal{VU}$-space decomposition method. However, this algorithm needs exact function values and exact subgradients, which are expensive to compute. It is therefore worthwhile to study the use of approximate values instead of the exact ones.
3.1. Approximate Primal-Dual Track
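Given a positive scalar parameter $\mu$, the proximal point function depending on $f$ is defined by $p_{\mu}(x) := \arg\min_{p \in \mathbb{R}^n} \{ f(p) + \frac{\mu}{2} |p - x|^2 \}$, where $|\cdot|$ stands for the Euclidean norm. It has the property: $g_{\mu}(x) := \mu (x - p_{\mu}(x)) \in \partial f(p_{\mu}(x))$.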
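As a small worked illustration, the sketch below evaluates the proximal point for the one-dimensional function $f(p) = |p|$, assuming the standard definition $p_{\mu}(x) = \arg\min_p \{ f(p) + \frac{\mu}{2}|p - x|^2 \}$ used in [8]. For this particular $f$ the minimizer has the well-known soft-thresholding closed form; the function names `prox_abs` and `prox_subgradient` are hypothetical.

```python
import math


def prox_abs(x: float, mu: float) -> float:
    """Proximal point of f(p) = |p| at x with parameter mu > 0.

    Minimizes |p| + (mu / 2) * (p - x)**2, whose solution is the
    soft-thresholding of x with threshold 1 / mu.
    """
    return math.copysign(max(abs(x) - 1.0 / mu, 0.0), x)


def prox_subgradient(x: float, mu: float) -> float:
    """The vector mu * (x - p_mu(x)), which lies in the subdifferential
    of f at the proximal point p_mu(x)."""
    return mu * (x - prox_abs(x, mu))
```

For example, with `mu = 2.0` the point `x = 3.0` is mapped to `2.5` and the associated subgradient is `1.0`, the derivative of $|p|$ at `2.5`. The point `x = 0.2` is mapped to the kink `0.0`, and the associated value `0.4` lies in $\partial f(0) = [-1, 1]$.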
Similarly to the definition of primal track, we define the approximate primal track.
Definition 3.1. For any , , we say that is an approximate primal track leading to , a minimizer of , if for all small enough, it satisfies the following:(i), where ;(ii) is a function satisfying for all satisfies (2.1);(iii)the Jacobian is a basis matrix for ;(iv)the particular Lagrangian is a function.
Accordingly, we have the approximate dual track denoted by corresponding to the approximate primal track. More precisely,
In fact, if , the approximate primal-dual track is exactly the primal-dual track shown in [8].
The next lemma shows that making an approximate $\mathcal{V}$-step in Algorithm 2.6 essentially amounts to finding a corresponding approximate primal track point.
Lemma 3.2. Let be an approximate primal track leading to , a minimizer of , and let . Then for all sufficiently small is the unique minimizer of on the affine set .
Proof. Since is a basis for , Theorem 3.4 in [10] with gives the result.
3.2. The Proximal Bundle-Type Subroutine with Inexact Data
Throughout this section, we make the following assumption: at each given point , and for , we can find some and satisfying where . At the same time, it can be ensured that where and for given . The condition (3.3) means that . This setting is realistic in many applications; see [11].
The bundle method includes two phases. (i) The first phase makes use of the information in bundles to establish a polyhedral approximation of at the actual iterate . (ii) Due to the kinky structure of , the model is possibly not precise for approximation . Then, more information around the actual iterate is mobilized to obtain a more reliable model. Feature (i) leads to the following approximation of at . Let denote the index set at the th iteration with each representing , where and satisfy for given . From the choices of and , we have that, for all and for all , where
On the basis of the above observation, we attempt to explore the possibility of utilizing the approximate subgradient and approximate function values instead of the exact ones. We approximate at from below by a piecewise linear convex function of the form:
Since (3.8) becomes more and more crude if an approximation of is farther away from , we add the proximal term , to it. To approximate an proximal point, we solve the first quadratic programming subproblem Its corresponding dual problem is Let and denote the optimal solution of (3.9) and (3.10), then it is easily seen that In addition, for all such that and The vector is an estimate of an approximate proximal point. Hence, it approximates an approximate primal track point when the latter exists. To proceed further we let and compute , , and satisfying .
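To make the first quadratic programming step concrete, the following Python sketch solves a proximal bundle subproblem through its dual. It assumes the standard form of such subproblems: the dual minimizes $\frac{1}{2\mu} |\sum_i \alpha_i g_i|^2 + \sum_i \alpha_i e_i$ over the unit simplex, where $g_i$ are the bundled (approximate) subgradients and $e_i \ge 0$ the linearization errors, and the proximal estimate is $x - \frac{1}{\mu} \sum_i \alpha_i g_i$. The exact form of (3.9) and (3.10) in this paper may differ in its error terms. The dual is solved here by plain projected gradient, which is a simple stand-in rather than the QP solver used in the experiments; all names are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto the unit simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)


def bundle_prox_step(x, G, e, mu, iters=500):
    """Approximate proximal point from a bundle.

    G  : (m, n) array whose rows are the bundled subgradients
    e  : (m,) array of nonnegative linearization errors
    mu : positive proximal parameter
    Returns (p_hat, g_hat, alpha): proximal estimate, aggregate
    subgradient, and simplex multipliers.
    """
    m = G.shape[0]
    alpha = np.full(m, 1.0 / m)
    # Lipschitz constant of the dual gradient (guarded against zero).
    lip = max(np.linalg.norm(G @ G.T, 2) / mu, 1e-12)
    for _ in range(iters):
        grad = G @ (G.T @ alpha) / mu + e
        alpha = project_simplex(alpha - grad / lip)
    g_hat = G.T @ alpha
    return x - g_hat / mu, g_hat, alpha


# Bundle for f(p) = |p| at x = 0.5: slopes +1 and -1, errors 0 and 1.
p_hat, g_hat, alpha = bundle_prox_step(
    np.array([0.5]), np.array([[1.0], [-1.0]]), np.array([0.0, 1.0]), mu=1.0
)
```

In this toy case the cutting-plane model coincides with $|p|$, so the computed estimate is the exact proximal point $0$, obtained with multipliers $(0.75, 0.25)$.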
An approximate dual path point, denoted by , is constructed by solving a second quadratic problem, which depends on a new index set The second quadratic programming problem is It has a dual problem Similar to (3.11), the respective solutions, denoted by and , satisfy
Given , the proximal bundle subprocedure is terminated and is declared to be an approximation of if Otherwise, above is replaced by , and new iterate data are computed by solving updated subproblems (3.9) and (3.14). This update, appending to active data at (3.9), ensures convergence to a minimizing point in case of nontermination.
Remark 3.3. From the discussion above, the following results are true: (i) ; (ii) since is an approximate primal track point approximated by and approximates , from (3.2) the corresponding is estimated by ; (iii) we can obtain the by means of the following iteration. Let Then, from (3.16), , so for all such and for a fixed . Define a full column rank matrix by choosing the largest number of indices satisfying (3.19) such that the corresponding vectors are linearly independent and by letting these vectors be the columns of . Then let be a matrix whose columns form an orthonormal basis for the null space of with if is vacuous.
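The basis construction described in item (iii) can be sketched in Python as follows. The sketch assumes that the candidate vectors (differences of active subgradients, in the notation of [8]) are already available as a list; it greedily keeps a maximal linearly independent subset as the columns of a matrix `V` and then takes an orthonormal basis of the null space of `V.T` as `U`. The function name `vu_bases` and the tolerance are assumptions made for illustration.

```python
import numpy as np


def vu_bases(diffs, n, tol=1e-10):
    """Build (V, U) from candidate difference vectors in R^n.

    V has full column rank (a maximal independent subset of diffs);
    the columns of U form an orthonormal basis of null(V^T).
    If no independent vector exists, V is vacuous and U is the identity.
    """
    cols = []
    for d in diffs:
        d = np.asarray(d, dtype=float)
        trial = np.column_stack(cols + [d])
        # Keep d only if it increases the rank.
        if np.linalg.matrix_rank(trial, tol=tol) == len(cols) + 1:
            cols.append(d)
    if not cols:
        return np.zeros((n, 0)), np.eye(n)
    V = np.column_stack(cols)
    # Full SVD: left singular vectors beyond rank(V) span null(V^T).
    Q, _, _ = np.linalg.svd(V, full_matrices=True)
    U = Q[:, V.shape[1]:]
    return V, U


V, U = vu_bases([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]], n=3)
```

In this example the second vector is discarded as dependent, `V` has two columns, and `U` is a single unit vector along the third coordinate axis (up to sign).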
Theorem 3.4. At the th iteration, the above proximal bundle subprocedure satisfies the following:(i), where and ;(ii) and ;(iii);(iv), where ;(v)for any parameter , (3.17) implies
Proof. (i) Since satisfies and , satisfies
where , so the result of item (i) holds for . From the definition of , and we have that for all in
so for all such ,
In addition,
Adding to this inequality gives
which means that for .
(ii) Multiplying each inequality in (3.25) by its corresponding multiplier and summing these inequalities, we have
Using the definition of from (3.16) and the fact that gives
which means that . In a similar manner, this time using the multipliers that solve dual problem (3.10) and define in (3.11), together with , obtains the result.
(iii) Since , we have
From (ii): , we get
Therefore,
that is, . Then, since the expression for from (3.11) written in the form
combined with implies that , we obtain item (iii).
(iv) From (3.10), (3.11), (3.31), and the definition of , we have that is in the convex hull of . We obtain the result by virtue of the minimum norm property of .
(v) Since and , we have . Thus if (3.17) holds then . Together with the definition of , (3.12) and the nonnegativity of gives
Finally, combining this inequality with item (iv) gives (3.20).
3.3. Approximate Decomposition Algorithm and Convergence Analysis
Substituting the approximate $\mathcal{V}$-step in Algorithm 2.6 with the proximal bundle-type subroutine, we present an approximate decomposition algorithm as follows. A detailed convergence analysis is then given. The main result is that each cluster point of the sequence of iterates generated by the algorithm is an optimal solution.
Algorithm 3.5. Choose a starting point and positive parameters , and with , .
Step 0. Compute satisfying , where . Let be a matrix with orthonormal $n$-dimensional columns estimating an optimal $\mathcal{U}$-basis. Set and .
Step 1. Stop if .
Step 2. Choose an $n_k \times n_k$ positive definite matrix $H_k$, where $n_k$ is the number of columns of $U_k$.
Step 3. Compute an approximate $\mathcal{U}$-Newton step by solving the linear system
Set .
Step 4. Choose , initialize , and run the bundle subprocedure with . Compute recursively, and set .
Step 5. If
then set
Otherwise, execute a line search
reinitialize , and rerun the bundle subroutine with , to find new values for , then set .
Step 6. Replace by and go to Step 1.
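The Newton step in Step 3 can be sketched in Python under the assumption that it takes the same form as in Algorithm 6 of [8]: solve $H_k \Delta u = -U_k^{\top} s_k$ and move from the current proximal estimate $p_k$ to $p_k + U_k \Delta u$. The symbols `p`, `s`, `U`, and `H` below stand for the current point, aggregate subgradient, basis matrix, and positive definite matrix; these names and the toy data are assumptions made for illustration.

```python
import numpy as np


def u_newton_step(p, s, U, H):
    """Reduced Newton step in the subspace spanned by the columns of U.

    Solves H du = -U^T s and returns (p + U du, du).
    """
    du = np.linalg.solve(H, -U.T @ s)
    return p + U @ du, du


# Toy data for f(x) = x1**2 + |x2|, smooth along x1 and kinked along x2.
p = np.array([0.5, 0.0])      # point on the kink x2 = 0
s = np.array([1.0, 0.3])      # a subgradient of f at p
U = np.array([[1.0], [0.0]])  # basis of the smooth subspace
H = np.array([[2.0]])         # reduced Hessian of x1**2
x_next, du = u_newton_step(p, s, U, H)
```

Since the restriction of this toy $f$ to the smooth subspace is quadratic, a single step lands on the minimizer $(0, 0)$. In general the step is only a candidate, which Step 5 accepts or replaces by a line search.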
Remark 3.6. In this algorithm, . If , this algorithm is the same as Algorithm 6 in [8]. However, this algorithm uses a proximal bundle-type subroutine that can deal with approximate subgradients and approximate function values.
Theorem 3.7. One of the following two cases is true: (i) if the proximal bundle procedure in Algorithm 3.5 does not terminate, that is, if (3.17) never holds, then the sequence of values converges to and is a minimizer of $f$; (ii) if the procedure terminates with , then the corresponding equals and is a minimizer of $f$.
Proof. By ([12], Prop. 4.3), if this procedure does not terminate then it generates an infinite sequence of values and value converging to zero. Since (3.17) does not hold, the sequence of values also converges to 0. Thus, item (iii) in Theorem 3.4 implies that . And Theorem 3.4 (ii) gives By the continuity of , this becomes The termination case with follows in a similar manner, since (3.17) implies in this case.
The next theorem establishes the convergence of Algorithm 3.5; its proof is similar to that of Theorem 9 in [8].
Theorem 3.8. Suppose that the sequence in Algorithm 3.5 is bounded above by . Then the following hold: (i) the sequence is decreasing and either or and both converge to 0; (ii) if is bounded from below, then any accumulation point of is a minimizer of $f$.
Proof. In this paper, the inequalities of (3.15), (3.16), and (3.17) in [8] become
since , (3.39) implies that is decreasing. Suppose . Then summing (3.39) over and using the fact that for all implies that . Then (3.41) with and implies that , which establishes (i).
Now suppose is bounded from below and is any accumulation point of . Then, because , , and converge to 0, (3.40) together with the continuity of implies that for all and (ii) is proved.
4. An Illustrated Numerical Example
We test some examples in this section to validate the effectiveness of the proposed algorithm. The platform is Matlab R2009a, Intel Pentium(R) Dual CPU T2390 1.87 GHz. All test examples are of the form where is finite and each is on .
For our runs we used the following examples:(i)F2d: the objective function is given in [8], defined for by(ii)F3d: four functions of three variables, where denotes the corresponding dimension of the subspace. Given and four parameter vectors , for where , and .
In Table 1, we show some relevant data for the problems, including the dimensions of and , the (known) optimal values and solutions, and the starting points.
 
The italic data in Table 1 is calculated by our algorithms. 
We calculate an $\varepsilon$-subgradient at by using the method in [13]: , where is a subgradient at and is a subgradient at a point such that . Here and are randomly chosen. The approximate function value is randomly taken from the interval . The radius is adjusted iteratively in the following way: if we find the linearization error then is reduced by a factor smaller than one. On the other hand, if is significantly smaller than , then is increased by a factor greater than one. When in the algorithm, the $\mathcal{U}$-Hessian at is computed in the following form: , where , is an active index such that correspond to via (3.16).
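One common way to build such an approximate subgradient, in the spirit of [13], is a convex combination of an exact subgradient at the current point and an exact subgradient at a nearby point. The Python sketch below assumes this construction, $g_{\varepsilon} = \lambda g + (1 - \lambda) g_1$ with $\lambda \in (0, 1]$, and uses $f(x) = |x_1| + |x_2|$ as a stand-in test function; the exact formula and sampling rule used in the experiments are not recoverable from the text, so every name here is hypothetical. The resulting error is $\varepsilon = (1 - \lambda) e$, where $e$ is the linearization error of $g_1$ at $x$.

```python
def f(x):
    """Stand-in nonsmooth convex test function: the l1 norm."""
    return sum(abs(xi) for xi in x)


def subgrad(x):
    """One exact subgradient of the l1 norm (sign vector, 0 at kinks)."""
    return tuple((xi > 0) - (xi < 0) for xi in x)


def eps_subgradient(x, x1, lam):
    """Convex combination of subgradients at x and at a nearby point x1.

    Returns (g_eps, eps) such that
    f(y) >= f(x) + <g_eps, y - x> - eps for all y.
    """
    g, g1 = subgrad(x), subgrad(x1)
    # Linearization error at x of the cutting plane generated at x1.
    lin_err = f(x) - f(x1) - sum(gi * (a - b) for gi, a, b in zip(g1, x, x1))
    g_eps = tuple(lam * a + (1.0 - lam) * b for a, b in zip(g, g1))
    return g_eps, (1.0 - lam) * lin_err


x = (1.0, -2.0)
g_eps, eps = eps_subgradient(x, x1=(-0.5, -1.5), lam=0.6)
```

Moving `x1` closer to `x` or `lam` closer to 1 reduces `eps`, which mirrors the role of the radius adjustment described above.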
The parameters have values , , , , and equal to the identity matrix. As for , one can refer to [8].
Table 2 shows the results of Algorithm 3.5 for these examples, compared with Algorithm 6 in [8]. Number of denotes the number of evaluations of the function and subgradient ($\varepsilon$-subgradient) in Algorithm 6 and Algorithm 3.5. is the calculated solution, and stands for the difference between the function values at and .

Table 2 shows that Algorithm 3.5 with inexact data obtains quite accurate solutions at the cost of slightly more evaluations than with exact data. One notable exception occurs in the example F3d1; it seems that the decomposition algorithm is sensitive with exact data but more stable when applied with inexact data (function values and subgradients). These favorable results demonstrate that the approximate decomposition algorithm is suitable for solving (1.1) numerically.
Acknowledgment
The research is supported by the National Natural Science Foundation of China under Project no. 11171049 and no. 11171138.
References
[1] J. J. Moreau, “Proximité et dualité dans un espace hilbertien,” Bulletin de la Société Mathématique de France, vol. 93, pp. 273–299, 1965.
[2] R. T. Rockafellar, “Monotone operators and the proximal point algorithm,” SIAM Journal on Control and Optimization, vol. 14, no. 5, pp. 877–898, 1976.
[3] A. Auslender, “Numerical methods for nondifferentiable convex optimization,” Mathematical Programming Study, no. 30, pp. 102–126, 1987.
[4] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms, vol. 2, Springer, 1993.
[5] C. Lemaréchal, F. Oustry, and C. Sagastizábal, “The $\mathcal{U}$-Lagrangian of a convex function,” Transactions of the American Mathematical Society, vol. 352, no. 2, pp. 711–729, 2000.
[6] R. Mifflin and C. Sagastizábal, “$\mathcal{VU}$-decomposition derivatives for convex max-functions,” in Ill-Posed Variational Problems and Regularization Techniques, vol. 477 of Lecture Notes in Economics and Mathematical Systems, pp. 167–186, Springer, Berlin, Germany, 1999.
[7] R. Mifflin and C. Sagastizábal, “On $\mathcal{VU}$-theory for functions with primal-dual gradient structure,” SIAM Journal on Optimization, vol. 11, no. 2, pp. 547–571, 2000.
[8] R. Mifflin and C. Sagastizábal, “A $\mathcal{VU}$-algorithm for convex minimization,” Mathematical Programming B, vol. 104, no. 2-3, pp. 583–608, 2005.
[9] Y. Lu, L.-P. Pang, X.-J. Liang, and Z.-Q. Xia, “An approximate decomposition algorithm for convex minimization,” Journal of Computational and Applied Mathematics, vol. 234, no. 3, pp. 658–666, 2010.
[10] R. Mifflin and C. Sagastizábal, “Proximal points are on the fast track,” Journal of Convex Analysis, vol. 9, no. 2, pp. 563–579, 2002.
[11] M. V. Solodov, “On approximations with finite precision in bundle methods for nonsmooth optimization,” Journal of Optimization Theory and Applications, vol. 119, no. 1, pp. 151–165, 2003.
[12] R. Correa and C. Lemaréchal, “Convergence of some algorithms for convex minimization,” Mathematical Programming B, vol. 62, no. 2, pp. 261–275, 1993.
[13] M. Hintermüller, “A proximal bundle method based on approximate subgradients,” Computational Optimization and Applications, vol. 20, no. 3, pp. 245–266, 2001.
Copyright
Copyright © 2012 Yuan Lu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.