A Decomposition Algorithm for Convex Nondifferentiable Minimization with Errors
A decomposition algorithm based on a proximal bundle-type method with inexact data is presented for the unconstrained minimization of a nonsmooth convex function. At each iteration, only an approximate evaluation of the objective function and approximate subgradients are required, which makes the algorithm easier to implement. It is shown that every cluster point of the sequence of iterates generated by the proposed algorithm is an exact solution of the unconstrained minimization problem. Numerical tests support the theoretical findings.
Consider the following minimization problem: $$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$ where $f$ is a nondifferentiable convex function. It is well known that many practical problems can be formulated as (1.1), for example, problems of catastrophe, ruin, vitality, data mining, and finance. A classical conceptual algorithm for solving (1.1) is the proximal point method, based on the Moreau-Yosida regularization of $f$ [1, 2]. Implementable forms of the method can be obtained by means of a bundle technique, alternating serious steps with sequences of null steps [3, 4].
More recently, new conceptual schemes for solving (1.1) have been developed by using an approach that is somewhat different from Moreau-Yosida regularization. This is the $\mathcal{VU}$-theory introduced in ; see also [6, 7]. The idea is to decompose $\mathbb{R}^n$ into two orthogonal subspaces $\mathcal{V}$ and $\mathcal{U}$ at a point $x$ such that the nonsmoothness of $f$ is concentrated essentially on $\mathcal{V}$, and the smoothness of $f$ appears on the $\mathcal{U}$-subspace. More precisely, for a given $g \in \partial f(x)$, where $\partial f(x)$ denotes the subdifferential of $f$ at $x$ in the sense of convex analysis, $\mathbb{R}^n$ can be decomposed as a direct sum of two orthogonal subspaces, that is, $\mathbb{R}^n = \mathcal{U} \oplus \mathcal{V}$, where $\mathcal{V} = \operatorname{lin}(\partial f(x) - g)$ and $\mathcal{U} = \mathcal{V}^{\perp}$. The $\mathcal{U}$-Lagrangian defined there is an approximation of the original function, which can be used to create a second-order expansion of $f$ along certain manifolds.
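For a max-function whose active gradients at a point are known, the subdifferential is the convex hull of those gradients, so the two subspaces can be computed directly. The following is a minimal Python sketch under that assumption. The helper names (`orthonormalize`, `vu_bases`) and the example function $|x_1| + \tfrac{1}{2}x_2^2$ are illustrative choices and are not taken from the paper.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors, against=(), tol=1e-10):
    """Gram-Schmidt: orthonormal basis of the part of span(vectors)
    that is orthogonal to the orthonormal vectors in `against`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in list(against) + basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # drop vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def vu_bases(active_gradients):
    """Hypothetical helper: V = lin{g_i - g_0}, U = orthogonal complement of V."""
    n = len(active_gradients[0])
    g0 = active_gradients[0]
    diffs = [[a - b for a, b in zip(g, g0)] for g in active_gradients[1:]]
    V = orthonormalize(diffs)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = orthonormalize(identity, against=V)
    return V, U


# f(x) = max(x1 + x2**2 / 2, -x1 + x2**2 / 2) = |x1| + x2**2 / 2 at x = (0, 1):
# both pieces are active there, with gradients (1, 1) and (-1, 1).
V, U = vu_bases([[1.0, 1.0], [-1.0, 1.0]])
```

In this example the kink of the function lies along the first coordinate, so `V` spans the first axis and `U` spans the second, where the function is smooth.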
Mifflin and Sagastizábal design a $\mathcal{VU}$-algorithm for convex functions in . This algorithm brings the iterate to the primal track with the help of a bundle subroutine. Then a $\mathcal{U}$-Newton step is performed to gain superlinear decrease of the distance to the solution. In order to implement this algorithm,  gives an algorithm which only uses the subgradients. However, this algorithm is conceptual in the sense that it needs the exact values of the objective function, which can be difficult to evaluate. For example, consider the situation of Lagrangian relaxation. The primal problem is where is a compact subset of and , . Lagrangian relaxation of the equality constraints in the problem leads to the following problem where is the dual function. Trying to solve problem (1.2) by means of solving its dual problem (1.3) makes sense in many situations. In this case, evaluating the function value and a subgradient requires solving the optimization problem (1.4) exactly. Actually, in some cases, computing exact values of is unnecessary and inefficient. For this reason, some modifications of bundle methods in  were needed.
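The Lagrangian relaxation setting can be illustrated with a small sketch. Since the problem statements are not reproduced here, the code assumes a primal problem of the form "maximize q(p) subject to p in P, h(p) = 0" with a finite set P standing in for the compact set, and a dual function equal to the maximum over P of q(p) + <x, h(p)>. All names (`dual_oracle`, `P`, `q`, `h`) are hypothetical. Solving the inner maximization only approximately returns a lower estimate of the dual function together with an approximate subgradient.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def dual_oracle(x, candidates):
    """Return (value, subgradient) of max over candidates of q + <x, h>.

    candidates: list of (q, h) pairs, one pair per primal point.
    """
    q, h = max(candidates, key=lambda c: c[0] + dot(x, c[1]))
    return q + dot(x, h), h


# Hypothetical data: four primal points, one relaxed constraint.
P = [(0.0, [1.0]), (1.0, [-1.0]), (1.5, [0.5]), (0.2, [2.0])]
x = [1.0]

exact_value, exact_g = dual_oracle(x, P)        # full inner maximization
approx_value, approx_g = dual_oracle(x, P[:3])  # truncated (inexact) maximization
eps = exact_value - approx_value                # error of the inexact evaluation


def satisfies_eps_inequality(y):
    """Check f(y) >= f(x) + <g, y - x> - eps for the inexact subgradient g."""
    fy, _ = dual_oracle(y, P)
    step = [yi - xi for yi, xi in zip(y, x)]
    return fy >= exact_value + dot(approx_g, step) - eps - 1e-12
```

The inequality checked by `satisfies_eps_inequality` is the defining inequality of an approximate subgradient in the sense of convex analysis, which is why an inexact inner solve still yields usable bundle information.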
The paper is organized as follows. In the next section we present the approximate $\mathcal{U}$-Lagrangian based on the approximate subgradient. Then we design a conceptual Algorithm 2.6 which can deal with approximate subgradients and approximate function values. Section 3 is divided into three parts. In the first part, we propose the approximate primal-dual track. The proximal bundle-type subroutine with inexact data is introduced in the second part. The third part of Section 3 is devoted to establishing an implementable Algorithm 3.5 which replaces the approximate $\mathcal{V}$-step in Algorithm 2.6 with the proximal bundle subroutine. Numerical testing of the resulting Algorithm 3.5 is reported in the final section.
2. A Conceptual Approximate Decomposition Algorithm
2.1. Approximate $\mathcal{U}$-Lagrangian and Its Properties
In some cases, computing exact values of the objective function and exact subgradients is unnecessary and inefficient. For this reason, a modification of the $\mathcal{U}$-Lagrangian is proposed in this section. We assume that satisfies Introducing this to , one can restate the definition of the $\mathcal{U}$-Lagrangian and its properties as follows.
Definition 2.1. Assume (2.1). The approximate -Lagrangian of , denoted by , is defined as follows. and is defined by
Theorem 2.3. Assume (2.1) and . Then one has that
2.2. Approximate Decomposition Algorithm Frame
In order to give an approximate decomposition algorithm frame, we restate the definition of Hessian matrix in  as follows.
Definition 2.5. Assume that is finite, is fixed, and satisfies (2.1). We say that has at a -Hessian associated with if has a generalized Hessian at 0, setting
Assuming (2.1), we investigate an approximate decomposition algorithm frame based on the definition of the approximate $\mathcal{U}$-Lagrangian.
Algorithm 2.6. Step 0. Initialization
satisfies for all , , where and .
Step 1. Stop if .
Step 2. Approximate $\mathcal{V}$-step.
Compute an optimal solution satisfying Set .
Step 3. $\mathcal{U}$-step. Make a Newton step in .
Compute that satisfies for all , , where , such that Compute the solution satisfying Set and .
Step 4. Update-set
Compute that satisfies for all , , where . Set and go to Step 1.
Remark 2.8. If , this algorithm is the same as Algorithm 4.5 in . However, it only uses approximate objective function values, which makes the algorithm easier to implement.
3. Approximate Decomposition Algorithm
Since Algorithm 2.6 in Section 2 relies on knowing the subspaces $\mathcal{U}$ and $\mathcal{V}$ and converges only locally, it needs significant modification. In , Mifflin and Sagastizábal show that a proximal point sequence follows the primal track near a minimizer. This opens the way for defining a $\mathcal{VU}$-algorithm where $\mathcal{V}$-steps are replaced by proximal steps. In addition, the proximal step can be estimated with a bundle technique, which can also approximate the unknown $\mathcal{U}$ and $\mathcal{V}$ subspaces as a computational byproduct. Therefore, they establish Algorithm 6 in  by combining the bundle subroutine with the $\mathcal{VU}$-space decomposition method. However, this algorithm needs exact function values and exact subgradients, which are expensive to compute. Therefore, it is worthwhile to study the use of approximate values instead of the exact ones.
3.1. Approximate Primal-Dual Track
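Given a positive scalar parameter $\mu$, the proximal point function depending on $f$ is defined by $$p_\mu(x) := \arg\min_{p \in \mathbb{R}^n} \left\{ f(p) + \frac{\mu}{2} \|p - x\|^2 \right\},$$ where $\|\cdot\|$ stands for the Euclidean norm. It has the property: .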
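As a small illustration of the proximal point, consider the one-dimensional absolute value function, assuming the standard definition in which the proximal point minimizes the function plus mu/2 times the squared distance to x. In that case the proximal point has a closed form (soft thresholding). The sketch below also checks a standard fact from convex analysis: mu times (x minus the proximal point) is a subgradient at the proximal point. Function names are illustrative only.

```python
import math


def prox_abs(x, mu):
    """Proximal point of f(p) = |p|: argmin_p |p| + (mu / 2) * (p - x)**2."""
    return math.copysign(max(abs(x) - 1.0 / mu, 0.0), x)


def prox_objective(p, x, mu):
    """Objective minimized by the proximal point."""
    return abs(p) + 0.5 * mu * (p - x) ** 2


def is_subgradient_of_abs(g, p, tol=1e-12):
    """Membership test for the subdifferential of |.| at p."""
    if p > 0:
        return abs(g - 1.0) <= tol
    if p < 0:
        return abs(g + 1.0) <= tol
    return -1.0 - tol <= g <= 1.0 + tol


mu = 2.0
for x in (2.0, 0.3, -1.7):
    p = prox_abs(x, mu)
    g = mu * (x - p)  # candidate subgradient at the proximal point
    print(x, p, g, is_subgradient_of_abs(g, p))
```

For points closer to the kink than 1/mu, the proximal point lands exactly on the kink, which is the mechanism that lets proximal steps track the nonsmooth part of the function.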
Similarly to the definition of primal track, we define the approximate primal track.
Definition 3.1. For any , , we say that is an approximate primal track leading to , a minimizer of , if for all small enough, it satisfies the following:(i), where ;(ii) is a -function satisfying for all satisfies (2.1);(iii)the Jacobian is a basis matrix for ;(iv)the particular -Lagrangian is a -function.
Accordingly, we have the approximate dual track denoted by corresponding to the approximate primal track. More precisely,
In fact, if , the approximate primal-dual track is exactly the primal-dual track shown in .
The next lemma shows that making an approximate $\mathcal{V}$-step in Algorithm 2.6 essentially amounts to finding a corresponding approximate primal track point.
Lemma 3.2. Let be an approximate primal track leading to , a minimizer of , and let . Then for all sufficiently small is the unique minimizer of on the affine set .
Proof. Since is a basis for , Theorem 3.4 in  with gives the result.
3.2. The Proximal Bundle-Type Subroutine with Inexact Data
Throughout this section, we make the following assumption: at each given point , and for , we can find some and satisfying where . At the same time, it can be ensured that where and for given . The condition (3.3) means that . This setting is realistic in many applications; see .
The bundle method includes two phases. (i) The first phase makes use of the information in bundles to establish a polyhedral approximation of at the actual iterate . (ii) Due to the kinky structure of , the model is possibly not precise for approximation . Then, more information around the actual iterate is mobilized to obtain a more reliable model. Feature (i) leads to the following approximation of at . Let denote the index set at the th iteration with each representing , where and satisfy for given . From the choices of and , we have that, for all and for all , where
On the basis of the above observation, we attempt to explore the possibility of utilizing the approximate subgradient and approximate function values instead of the exact ones. We approximate at from below by a piecewise linear convex function of the form:
Since (3.8) becomes more and more crude if an approximation of is farther away from , we add the proximal term , to it. To approximate a proximal point, we solve the first quadratic programming subproblem Its corresponding dual problem is Let and denote the optimal solutions of (3.9) and (3.10); then it is easily seen that In addition, for all such that and The vector is an estimate of an approximate proximal point. Hence, it approximates an approximate primal track point when the latter exists. To proceed further we let and compute , , and satisfying .
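The first quadratic subproblem can be sketched in its usual proximal bundle form. The code below assumes that each bundle element is stored as a pair (linearization error, approximate subgradient) relative to the current point x, that the dual minimizes (1/(2 mu)) times the squared norm of the aggregate subgradient plus the aggregate error over the unit simplex, and that the proximal estimate is x minus the aggregate subgradient divided by mu. The solver is a plain projected gradient method chosen for brevity; it is not the solver used by the authors, and all names are hypothetical.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def project_simplex(v):
    """Euclidean projection of v onto the unit simplex."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - 1.0) / k
        if uk - t > 0:
            theta = t  # the last k passing this test gives the threshold
    return [max(vi - theta, 0.0) for vi in v]


def aggregate(alpha, gradients):
    n = len(gradients[0])
    return [sum(a * g[j] for a, g in zip(alpha, gradients)) for j in range(n)]


def solve_bundle_dual(gradients, errors, mu, iterations=500):
    """Projected gradient on the simplex for the dual QP (illustrative solver)."""
    m = len(gradients)
    alpha = [1.0 / m] * m
    # Trace bound on the largest eigenvalue of the dual Hessian.
    lipschitz = sum(dot(g, g) for g in gradients) / mu or 1.0
    for _ in range(iterations):
        s = aggregate(alpha, gradients)
        grad = [dot(g, s) / mu + e for g, e in zip(gradients, errors)]
        alpha = project_simplex([a - d / lipschitz for a, d in zip(alpha, grad)])
    return alpha


def bundle_step(x, gradients, errors, mu):
    """Return (proximal estimate, aggregate subgradient, aggregate error)."""
    alpha = solve_bundle_dual(gradients, errors, mu)
    g_hat = aggregate(alpha, gradients)
    eps_hat = dot(alpha, errors)
    p_hat = [xi - gi / mu for xi, gi in zip(x, g_hat)]
    return p_hat, g_hat, eps_hat


# Bundle for f(p) = |p| at x = 1: the cut at x itself (error 0, slope 1)
# and the cut built at y = -0.5 (slope -1, linearization error 2).
p_hat, g_hat, eps_hat = bundle_step([1.0], [[1.0], [-1.0]], [0.0, 2.0], mu=1.0)
```

With these two cuts the model coincides with the absolute value function, so the computed estimate agrees with its exact proximal point at x = 1 for mu = 1.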
An approximate dual path point, denoted by , is constructed by solving a second quadratic problem, which depends on a new index set The second quadratic programming problem is It has a dual problem Similar to (3.11), the respective solutions, denoted by and , satisfy
Given , the proximal bundle subprocedure is terminated and is declared to be an approximation of if Otherwise, above is replaced by , and new iterate data are computed by solving updated subproblems (3.9) and (3.14). This update, appending to active data at (3.9), ensures convergence to a minimizing point in case of nontermination.
Remark 3.3. From the discussion above, the following results are true: (i) ; (ii) since is an approximate primal track point approximated by and approximates , from (3.2) the corresponding is estimated by ; (iii) we can obtain the by means of the following iteration. Let Then, from (3.16), , so for all such and for a fixed . Define a full column rank matrix by choosing the largest number of indices satisfying (3.19) such that the corresponding vectors are linearly independent and by letting these vectors be the columns of . Then let be a matrix whose columns form an orthonormal basis for the null space of with if is vacuous.
Theorem 3.4. At the th iteration, the above proximal bundle subprocedure satisfies the following:(i), where and ;(ii) and ;(iii);(iv), where ;(v)for any parameter , (3.17) implies
Proof. (i) Since satisfies and , satisfies
where , so the result of item (i) holds for . From the definition of , and we have that for all in
so for all such ,
Adding to this inequality gives
which means that for .
(ii) Multiplying each inequality in (3.25) by its corresponding multiplier and summing these inequalities, we have Using the definition of from (3.16) and the fact that gives which means that . In a similar manner, this time using the multipliers that solve dual problem (3.10) and define in (3.11), together with , obtains the result.
(iii) Since , we have From (ii): , we get Therefore, that is, . Then, since the expression for from (3.11) written in the form combined with implies that , we obtain item (iii).
(iv) From (3.10), (3.11), (3.31), and the definition of , we have that is in the convex hull of . We obtain the result by virtue of the minimum norm property of .
(v) Since and , we have . Thus if (3.17) holds then . Together with the definition of , (3.12) and the nonnegativity of gives Finally, combining this inequality with item (iv) gives (3.20).
3.3. Approximate Decomposition Algorithm and Convergence Analysis
Replacing the approximate $\mathcal{V}$-step in Algorithm 2.6 with the proximal bundle-type subroutine, we present an approximate decomposition algorithm as follows. Afterwards a detailed convergence analysis is given. The main result states that each cluster point of the sequence of iterates generated by the algorithm is an optimal solution.
Algorithm 3.5. Choose a starting point and positive parameters , and with , .
Step 0. Compute satisfying , where . Let be a matrix with orthonormal $n$-dimensional columns estimating an optimal $\mathcal{U}$-basis. Set and .
Step 1. Stop if .
Step 2. Choose an positive definite matrix , where is the number of columns of .
Step 3. Compute an approximate $\mathcal{U}$-Newton step by solving the linear system Set .
Step 4. Choose , initialize , and run the bundle subprocedure with . Compute recursively, and set .
Step 5. If then set Otherwise, execute a line search reinitialize , and rerun the bundle subroutine with , to find new values for , then set .
Step 6. Replace by and go to Step 1.
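The Newton step of Step 3 can be sketched as follows. The code assumes the linear system has the usual form H du = -U^T s, where the columns of U are an orthonormal basis estimate of the smooth subspace, s is the current aggregate subgradient, and H is the positive definite matrix chosen in Step 2, and that the new point is the proximal estimate p plus U du. These forms and all names are assumptions made for illustration, since the displayed system is not reproduced here.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def cholesky_solve(H, b):
    """Solve H z = b for a symmetric positive definite H (list of rows)."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("H must be positive definite")
                L[i][j] = acc ** 0.5
            else:
                L[i][j] = acc / L[j][j]
    y = []  # forward substitution: L y = b
    for i in range(n):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    z = [0.0] * n  # back substitution: L^T z = y
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def u_newton_step(p, s, U_columns, H):
    """Return p + U du, where H du = -U^T s (assumed form of Step 3)."""
    rhs = [-dot(u, s) for u in U_columns]
    du = cholesky_solve(H, rhs)
    x_new = list(p)
    for step, u in zip(du, U_columns):
        x_new = [xi + step * ui for xi, ui in zip(x_new, u)]
    return x_new


# Illustration with f(x) = |x1| + x2**2 at p = (0, 1): the smooth direction is
# the second axis, the minimum-norm subgradient is (0, 2), and the Hessian of
# the smooth part is [[2]].
x_new = u_newton_step([0.0, 1.0], [0.0, 2.0], [[0.0, 1.0]], [[2.0]])
```

Because the restriction of this example function to the smooth direction is quadratic, a single step from a point on the kink reaches the minimizer at the origin.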
Remark 3.6. In this algorithm, . If , this algorithm is the same as Algorithm 6 in . However, this algorithm uses a proximal bundle-type subroutine which can deal with approximate subgradients and approximate function values.
Theorem 3.7. One of the following two cases is true: (i) if the proximal bundle procedure in Algorithm 3.5 does not terminate, that is, if (3.17) never holds, then the sequence of -values converges to and is a minimizer of ; (ii) if the procedure terminates with , then the corresponding equals and is a minimizer of .
Proof. By (, Prop. 4.3), if this procedure does not terminate then it generates an infinite sequence of -values and -value converging to zero. Since (3.17) does not hold, the sequence of -values also converges to 0. Thus, item (iii) in Theorem 3.4 implies that . And Theorem 3.4 (ii) gives By the continuity of , this becomes The termination case with follows in a similar manner, since (3.17) implies in this case.
Theorem 3.8. Suppose that the sequence in Algorithm 3.5 is bounded above by . Then the following hold: (i) the sequence is decreasing and either or and both converge to 0; (ii) if is bounded from below, then any accumulation point of is a minimizer of .
Proof. In this paper, the inequalities of (3.15), (3.16), and (3.17) in  become
Since , (3.39) implies that is decreasing. Suppose . Then summing (3.39) over and using the fact that for all implies that . Then (3.41) with and implies that , which establishes (i).
Now suppose is bounded from below and is any accumulation point of . Then, because , , and converge to 0, (3.40) together with the continuity of implies that for all and (ii) is proved.
4. An Illustrative Numerical Example
We test some examples in this section to validate the effectiveness of the proposed algorithm. The platform is Matlab R2009a, Intel Pentium(R) Dual CPU T2390 1.87 GHz. All test examples are of the form $$f(x) = \max_{i \in I} f_i(x), \quad x \in \mathbb{R}^n,$$ where $I$ is finite and each $f_i$ is $C^2$ on $\mathbb{R}^n$.
For our runs we used the following examples:(i)F2d: the objective function is given in , defined for by(ii)F3d-: four functions of three variables, where denotes the corresponding dimension of the -subspace. Given and four parameter vectors , for where , and .
In Table 1, we show some relevant data for the problems, including the dimensions of and , the (known) optimal values and solutions, and the starting points.
We calculate an $\varepsilon$-subgradient at by using the method in : , where is a subgradient at and is a subgradient at a point such that . Here and are randomly chosen. The approximate function value is randomly drawn from the interval . The radius is adjusted iteratively in the following way: if we find the linearization error then is reduced by a factor smaller than one. On the other hand, if is significantly smaller than , then is increased by a factor greater than one. When in the algorithm, the $\mathcal{U}$-Hessian at is computed in the following form: , where , is an active index such that correspond to via (3.16).
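One way to realize such an approximate subgradient is to blend a subgradient at the current point with a subgradient taken at a nearby point. The sketch below assumes a convex combination with weight `lam` and uses the standard fact that a subgradient at a nearby point is an approximate subgradient at the current point, with error equal to the linearization error. The test function, the fixed nearby point, and all names are illustrative assumptions rather than the paper's exact setup.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def f(x):
    """Illustrative convex test function |x1| + x2**2."""
    return abs(x[0]) + x[1] ** 2


def subgrad(x):
    """One subgradient of f at x (0 is chosen at the kink x1 = 0)."""
    sign = (x[0] > 0) - (x[0] < 0)
    return [float(sign), 2.0 * x[1]]


def eps_subgradient(f, subgrad, x, x1, lam):
    """Blend a subgradient at x with one at a nearby point x1.

    Returns (g_eps, eps) such that f(y) >= f(x) + <g_eps, y - x> - eps for all y.
    """
    g, g1 = subgrad(x), subgrad(x1)
    # Linearization error at x of the cut built at x1 (nonnegative by convexity).
    err = f(x) - f(x1) - dot(g1, [a - b for a, b in zip(x, x1)])
    g_eps = [lam * a + (1.0 - lam) * b for a, b in zip(g, g1)]
    return g_eps, (1.0 - lam) * err


x = [0.5, 1.0]
x1 = [-0.1, 1.2]  # nearby point; in practice drawn at random within the radius
g_eps, eps = eps_subgradient(f, subgrad, x, x1, lam=0.5)


def holds_at(y):
    """Check the approximate subgradient inequality at y."""
    return f(y) >= f(x) + dot(g_eps, [a - b for a, b in zip(y, x)]) - eps - 1e-12
```

Shrinking the radius around the current point reduces the linearization error and hence the resulting error level, which is consistent with the radius adjustment rule described above.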
The parameters have values , , , , and equal to the identity matrix. As for , one can refer to .
Table 2 shows the results of Algorithm 3.5 for these examples, compared with Algorithm 6 in . Number of denotes the number of evaluations of the function and subgradient ($\varepsilon$-subgradient) in Algorithm 6 and Algorithm 3.5. is the calculated solution, stands for the difference between the function values at and .
Table 2 shows that Algorithm 3.5 with inexact data obtains quite accurate solutions at the cost of slightly more evaluations than with exact data. One noticeable exception occurs in example F3d-1; it seems that the decomposition algorithm is sensitive with exact data, but is more stable when applying inexact data (function values and subgradients). These favorable results demonstrate that the approximate decomposition algorithm is suitable for solving (1.1) numerically.
The research is supported by the National Natural Science Foundation of China under Project no. 11171049 and no. 11171138.
J. J. Moreau, “Proximité et dualité dans un espace hilbertien,” Bulletin de la Société Mathématique de France, vol. 93, pp. 273–299, 1965.
J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms, vol. 2, Springer, 1993.
R. Mifflin and C. Sagastizábal, “$\mathcal{VU}$-decomposition derivatives for convex max-functions,” in Ill-Posed Variational Problems and Regularization Techniques, vol. 477 of Lecture Notes in Economics and Mathematical Systems, pp. 167–186, Springer, Berlin, Germany, 1999.