Abstract

We describe an extension of the redistributed technique from the classical proximal bundle method to the inexact setting for minimizing nonsmooth nonconvex functions. The cutting-plane model we construct approximates not the whole nonconvex function but a local convexification of the approximate objective function, and this local convexification is adjusted dynamically so as to always yield nonnegative linearization errors. Although only approximate function values and approximate subgradients are employed, the convergence analysis shows that an approximate stationary point, or some double approximate stationary point, can be obtained under mild conditions.

1. Introduction and Motivation

Consider the following unconstrained nonsmooth nonconvex optimization problem: where is a function from . Nonsmooth optimization (NSO) problems arise in many fields of application. Several approaches exist for solving this kind of problem; see [1–4]. Bundle methods [5] are based on cutting-plane methods, first described in [6, 7], where convexity of the objective function is the fundamental assumption. If the objective function is convex, tangent lines are cutting planes supporting the epigraph of , the linearization errors are always nonnegative, and the model functions, usually defined as the maximum of tangent lines, are lower approximations of the objective function. In the nonconvex case, however, the linearization errors can be negative, and the corresponding model function does not stay below and may even cut off a region containing a minimizer. Little systematic research has been carried out on extending convex bundle methods to the nonconvex case. The bundle methods for nonconvex functions [8–12] are of proximal type and were developed in the 1990s; see [13]. They still use subgradient locality measures to redefine negative linearization errors, and primal information, corresponding to function values, is again ignored.

Note that in some cases computing the exact function value is not easy. The assumptions of using approximate subgradients and approximate function values are realistic; consider, for instance, the Lagrangian relaxation problem: if is a max-type function of the form , where each is convex and is an infinite set, then it may be impossible to calculate , since itself is defined by a minimization problem involving another function . However, we may still consider two cases. In the first case, for each positive one can find an element satisfying ; in the second case, this may be possible only for some fixed (and possibly unknown) . In both cases we may set . In addition, the study of approximate subgradients of convex functions is worthwhile, since in some cases an exact subgradient is expensive to compute. If we know an already computed subgradient , where is near , then we have because , where . For more details and for papers involving approximate function values and subgradients, we refer to [14–16] and the references therein.
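To make this oracle model concrete, the following minimal sketch (not taken from the paper: the max-type test function, the finite sampling of the index set, and all names such as inexact_oracle and n_samples are illustrative assumptions) shows how approximate values and approximate subgradients might be produced by evaluating only finitely many components of a max-type function.

# Minimal illustrative sketch: an inexact oracle for a max-type function
# f(x) = sup_{t in T} f_t(x), obtained by sampling finitely many indices t.
# The sampled maximum underestimates f(x); the gradient of the best sampled
# component is an approximate subgradient whose accuracy depends on the sampling.
import numpy as np

def inexact_oracle(x, component, n_samples=50, seed=0):
    """Return (approximate value, approximate subgradient) of f(x) = sup_t f_t(x)."""
    rng = np.random.default_rng(seed)
    best_val, best_grad = -np.inf, None
    for t in rng.uniform(0.0, 2.0 * np.pi, size=n_samples):
        val, grad = component(t, x)
        if val > best_val:
            best_val, best_grad = val, grad
    return best_val, best_grad

# Example: f_t(x) = <a(t), x> with a(t) = (cos t, sin t), so f(x) = ||x||_2.
def component(t, x):
    a = np.array([np.cos(t), np.sin(t)])
    return a @ x, a

fa, ga = inexact_oracle(np.array([1.0, 2.0]), component)

Here the returned value never exceeds the true value, and increasing n_samples typically reduces the error.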

In our work, we explore the possibility of using approximate function values and approximate subgradients of , instead of the exact ones, for solving problem (1). From the point of view of the primal problem, by splitting the traditional prox-parameter into two parts, a local convexification parameter and a new model prox-parameter, and by employing the inexact information on the objective function, we construct a cutting-plane model that approximates the local convexification function, and the iterates are obtained approximately by computing its double approximate proximal points. The cutting-plane model is special in the sense that it no longer approximates the objective function, but rather a certain local convexification of it, centered at the proximal center.

This paper is organized as follows. In Section 2, some preliminary results and assumptions required in our paper are provided. In Section 3, we concentrate on the primal form of (1) rather than the dual viewpoint, and the cutting-plane model that locally approximates the objective function is constructed there. In Section 4, the concrete approximate redistributed algorithm for solving (1) is presented. Convergence results are examined and discussed in Section 5, where the iterates generated by the proposed algorithm are shown to converge to an approximate (double approximate) stationary point of the objective function. In the last section, some conclusions are given.

2. Preliminaries and Assumptions

In this part, we first present some basic definitions and results [17].
(i) The regular subdifferential of at is defined by
(ii) The limiting subdifferential of at is defined by
If is finite at , and are closed and is convex.
(iii) For given , the -limiting subdifferential of at is defined by

We call elements of this approximate subdifferential approximate subgradients. If is finite at , then is convex and closed.
(iv) We say that is prox-bounded if there exists such that is bounded below. The corresponding threshold is the smallest such that is bounded below for all .
(v) The function is lower- on an open set if is finite on and, for any in , there exists a threshold such that is convex on an open neighbourhood of for all .
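For the reader's convenience, one standard rendering of these objects, in the spirit of [17], is sketched below with generic symbols; this is a sketch only, and in particular the exact epsilon-enlargement used in the paper may differ from the convex-case version written here.

\[
\hat{\partial} f(x) = \bigl\{ v \in \mathbb{R}^n : f(y) \ge f(x) + \langle v, y - x\rangle + o(\|y - x\|)\ \text{as } y \to x \bigr\},
\]
\[
\partial f(x) = \bigl\{ v : \exists\, x^k \to x,\ f(x^k) \to f(x),\ v^k \in \hat{\partial} f(x^k),\ v^k \to v \bigr\},
\]
\[
\partial_\varepsilon f(x) = \bigl\{ v : f(y) \ge f(x) + \langle v, y - x\rangle - \varepsilon \ \ \forall y \bigr\}
\quad \text{(for convex } f\text{)},
\]
\[
f \ \text{prox-bounded} \iff \exists\, R > 0,\ x_0 :\ \inf_y \bigl\{ f(y) + \tfrac{R}{2}\|y - x_0\|^2 \bigr\} > -\infty,
\]
\[
f \ \text{lower-}\mathcal{C}^2 \text{ on } V \iff \forall\, \bar{x} \in V\ \exists\, \rho \ge 0,\ U \ni \bar{x}:\ f + \tfrac{\rho}{2}\|\cdot\|^2 \ \text{convex on } U .
\]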

Next we give some assumptions needed in our paper.

Assumption 1. For fixed accuracy tolerances , , for each , the oracle can provide an approximate function value and an approximate subgradient of at , where .

Assumption 2. Given and , there exist an open bounded set and a function such that , and is lower- on satisfying on .

Lemma 3. For a function satisfying Assumptions 1 and 2, the following conclusions hold:
(a) The approximate level set is nonempty and compact.
(b) The approximate function is bounded below and prox-bounded with threshold .
(c) There exists such that, for any and given , the function is convex on .
(d) The approximate function is Lipschitz continuous on .

Proof. Since is the lower- function given by Assumption 2, which agrees with on , it is continuous and finite valued on the open set , which contains . Thus is closed. Since it is also bounded, by the boundedness of , item (a) is proved. For item (b), without loss of generality, we assume that and are the same function. Since is finite and continuous on the compact approximate level set , it is bounded below. Any function which is bounded below is prox-bounded with threshold . Item (c) follows from [17]. The last item (d) is immediate, since all lower- functions are locally Lipschitz continuous.

Define the approximate proximal point mapping
which is single-valued and Lipschitz continuous on , provided is sufficiently large. Mimicking the conclusion in [18] and the optimality condition, we define a new kind of approximate stationary point of the objective function.
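Written with generic symbols (ours, not necessarily the paper's: f^a for the approximate function and R for the prox-parameter), the mapping in question is of Moreau proximal type; a sketch of the expected form of (6) is

\[
p^a_R(x) \;\in\; \operatorname*{arg\,min}_{y \in \mathbb{R}^n}
\Bigl\{ f^a(y) + \tfrac{R}{2}\,\|y - x\|^2 \Bigr\},
\]

so that, roughly speaking, an approximate stationary point is a point that approximately reproduces itself under this mapping.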

Definition 4. One calls an approximate stationary point of if , where is defined by (6), written with replaced by .

Note that being sufficiently large means that , where is the value in item (c) of Lemma 3 and
The above relation, together with the local convexification property, plays a fundamental role in the development of our algorithm, provided the ideal proximal threshold is already known.

3. Construction of the Model

For a convex function , the exact linearization error of at is defined by
where and is the current stability center. Obviously we have , and the reformulated bundle data consists of and the approximate subgradients . In our method, at any iteration, the bundle method keeps a memory of the iterative process in a bundle of inexact information:
where , , and is the best approximate value obtained up to iteration , evaluated at the serious step , corresponding to some past iterate . For a nonconvex function , we work with the augmented functions
We consider an augmented bundle of inexact information: , where
Note that the following relation holds:
where , , and and are called the convexification parameter and the model prox-parameter, respectively. We use the past information in the bundle to construct a cutting-plane model of the function :
An equivalent expression, written with all the iteration indices, is the following:
The next candidate point is chosen as . The corresponding optimality condition is that there exists such that
where denotes the unit simplex in . Denote ; is called the aggregate approximate subgradient, and the corresponding aggregate bundle element is the quadruplet:
Note that for and all we have
For all , the quantities , , , and can be updated according to the following formulas:
In the classical bundle method with inexact information for convex functions, the bundle consists of the pairs for which holds whenever , which can be seen by noting that , the definition of , and
For our method, this pair is replaced by a quadruplet for which the relation
holds, since for all , whenever , we have
We want the parameter to asymptotically estimate the ideal convexity threshold ; when is sufficiently large, is a convex function. As a result, the model function eventually becomes a lower approximation to a locally convex function . Set
and clearly for all whenever .
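For orientation, the quantities just described can be written down in the spirit of the exact redistributed method of [18, 19]; the symbols below (stability center \hat{x}^k, oracle data (y^j, f^a_j, g^j), convexification parameter \eta_k, and model prox-parameter \mu_k) are our generic choices, and the display is a sketch rather than the paper's exact formulas.

\[
e_j = f^a_{\hat{x}^k} - f^a_j - \langle g^j, \hat{x}^k - y^j\rangle
\quad \text{(inexact linearization error, possibly negative),}
\]
\[
s_j^k = g^j + \eta_k\,(y^j - \hat{x}^k),
\qquad
c_j^k = e_j + \tfrac{\eta_k}{2}\,\|y^j - \hat{x}^k\|^2
\quad \text{(augmented subgradient and error),}
\]
\[
\varphi_k(x) = \max_{j \in J_k} \bigl\{ f^a_{\hat{x}^k} + \langle s_j^k, x - \hat{x}^k\rangle - c_j^k \bigr\},
\qquad
x^{k+1} = \operatorname*{arg\,min}_{x} \bigl\{ \varphi_k(x) + \tfrac{\mu_k}{2}\|x - \hat{x}^k\|^2 \bigr\},
\]
\[
x^{k+1} = \hat{x}^k - \tfrac{1}{\mu_k}\, G^k,
\qquad
G^k = \sum_{j \in J_k} \alpha_j\, s_j^k,
\qquad
\alpha \in \Delta_{|J_k|}\ \text{(optimal simplicial multipliers).}
\]

Once \eta_k exceeds the local convexification threshold, the augmented errors c_j^k become nonnegative (up to the oracle tolerances) and \varphi_k is a lower model of the locally convexified function, which is exactly the mechanism described above.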

Remark 5. Comparing with [19] and [18], we find that , which means that the domain of ensuring nonnegativity of the linearization errors is enlarged.

4. Algorithm

Algorithm 6 (approximate redistributed proximal bundle method). Consider the following steps.
Step 1 (initialization). Choose an initial point , two parameters , , one stopping tolerance , an Armijo-like parameter , and a convexification growth parameter . Set , , , , and . Compute , , and the additional bundle information . Choose .
Step 2 (computation of trial point). Having the current serious step , the bundle , and the prox-parameter distribution with , , define the convex piecewise linear approximate model function from (14). Compute
The optimal simplicial multipliers from (15) and the aggregate quadruplet from (16) are available. Define the predicted decrease
Step 3 (bundle management). Compute and . Set
Choose a new index set satisfying
Step 4 (descent test). If the descent test
is satisfied, declare a serious step: set , , and , and update the bundle elements according to (19).
Otherwise, declare a null step: set .
Step 5 (update of local convexification parameter). Set
where is given in (23), written with replaced by .
Step 6 (update of model prox-parameter). If , restart the algorithm by setting and loop to Step 2.
Otherwise go to Step 7.
Step 7 (stopping criterion). If , then stop with the message “Algorithm successfully terminated at .”
Otherwise, in the case of a serious step, increase by 1. In all cases, increase by 1 and loop to Step 2.
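To summarize the control flow of Algorithm 6, here is a compact Python sketch. It is illustrative only: the oracle interface, the names eta, mu, m, and Gamma, the SciPy dual solve, the descent test, and the simplified updates standing in for Steps 5 and 6 are our assumptions, and the bundle compression of Step 3 is omitted.

# Illustrative sketch of an approximate redistributed proximal bundle method.
# All names and update rules are simplified stand-ins for Algorithm 6, not the
# paper's exact formulas; Step 3 (bundle compression) is omitted.
import numpy as np
from scipy.optimize import minimize

def prox_subproblem(x_hat, f_hat, bundle, eta, mu):
    """Step 2: augmented cutting-plane model, candidate point, predicted decrease."""
    S, C = [], []
    for y, fy, g in bundle:
        d = y - x_hat
        e = f_hat - fy - g @ (x_hat - y)              # inexact linearization error
        S.append(g + eta * d)                         # augmented approximate subgradient
        C.append(e + 0.5 * eta * (d @ d))             # augmented linearization error
    S, C = np.array(S), np.array(C)
    k = len(C)

    def dual(a):                                      # (1/2mu)||sum a_j s_j||^2 + sum a_j c_j
        G = a @ S
        return 0.5 / mu * (G @ G) + a @ C

    res = minimize(dual, np.full(k, 1.0 / k), bounds=[(0.0, 1.0)] * k,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}])
    alpha = np.clip(res.x, 0.0, None)
    alpha /= alpha.sum()
    G = alpha @ S                                     # aggregate approximate subgradient
    eps = max(alpha @ C, 0.0)                         # aggregate error (clipped for safety)
    x_new = x_hat - G / mu                            # candidate: prox point of the model
    delta = eps + 0.5 / mu * (G @ G)                  # predicted decrease (one common form)
    return x_new, delta

def bundle_method(oracle, x0, eta=1.0, mu_factor=2.0, m=0.1, Gamma=2.0,
                  tol=1e-6, max_iter=200):
    f0, g0 = oracle(x0)
    x_hat, f_hat = x0.copy(), f0
    bundle = [(x0.copy(), f0, g0)]
    mu = mu_factor * eta                              # keep the prox-parameter above eta

    for _ in range(max_iter):
        x_new, delta = prox_subproblem(x_hat, f_hat, bundle, eta, mu)
        if delta <= tol:                              # Step 7: stopping criterion
            break
        f_new, g_new = oracle(x_new)
        bundle.append((x_new.copy(), f_new, g_new))
        if f_new <= f_hat - m * delta:                # Step 4: descent test
            x_hat, f_hat = x_new.copy(), f_new        # serious step
        # Step 5 (simplified): grow eta until all augmented errors are nonnegative.
        while any(f_hat - fy - g @ (x_hat - y)
                  + 0.5 * eta * ((y - x_hat) @ (y - x_hat)) < 0 for y, fy, g in bundle):
            eta *= Gamma
        if mu <= eta:                                 # Step 6 (simplified): restart mu
            mu = mu_factor * eta
    return x_hat, f_hat

# Toy usage with an exact "oracle" for a nonsmooth nonconvex bounded-below function.
def oracle(x):
    # f(x) = |x_1| + min(x_2**2, (x_2 - 2)**2); subgradient taken from an active branch.
    left = x[1] ** 2 <= (x[1] - 2.0) ** 2
    val = abs(x[0]) + (x[1] ** 2 if left else (x[1] - 2.0) ** 2)
    grad = np.array([np.sign(x[0]), 2.0 * x[1] if left else 2.0 * (x[1] - 2.0)])
    return val, grad

x_best, f_best = bundle_method(oracle, np.array([1.5, 0.8]))

In this sketch the serious/null decision and the stopping test follow the usual bundle pattern; the paper's own Step 5 formula (23) and Step 6 restart rule should be substituted for the simplified updates shown here.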

From the definition of , we obtain its equivalent expression
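In the generic notation of the sketch given in Section 3 (ours, not the paper's), one common way to define this quantity and its equivalent aggregate expression is

\[
\delta_k \;=\; f^a_{\hat{x}^k} \;-\; \Bigl( \varphi_k(x^{k+1}) + \tfrac{\mu_k}{2}\,\|x^{k+1} - \hat{x}^k\|^2 \Bigr)
\;=\; \sum_{j \in J_k} \alpha_j\, c_j^k \;+\; \tfrac{1}{2\mu_k}\,\|G^k\|^2 ,
\]

which is nonnegative once the augmented errors are nonnegative; this is the property used in the descent test of Step 4.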

Lemma 7. Consider the sequence generated by Algorithm 6. If satisfies Assumptions 1 and 2, and , there can be only a finite number of restarts in Step 6. Eventually the sequence lies in and the sequence becomes constant.

Proof. Because the functions are convex, the iterates are always well defined. By Lemma 3(d), the function is Lipschitz continuous on ; let be its Lipschitz constant. Take ; then for any , the open ball . Note that
Since , we have . From the convexity of , we know there exists such that , and hence
At each execution of Step 6, increases; eventually we will have , which means that . So ; that is, .

5. Convergence Results

Lemma 8. For the function , the following conclusions hold:
(a) One always has
(b) If , which is defined in (23), then
(c) If is a null step, , and either or , then for all
(d) If , then, for some ,

Proof. (a) is clear since is the maximum of a collection of affine functions.
(b) follows from and (23) ensures for all .
For (c), suppose that is a null step and ; we know that
for all and . Equation (17) implies that
Then we obtain in (38), for all and ,
If , then, since , the relation above for is just item (c). For the case , we sum the above inequality using the convex multipliers and obtain the desired result.
For the last conclusion (d), since , from the definition of and , for all ,

Lemma 9. Consider the functions given by (14). If satisfies Assumptions 1 and 2, there exists an iteration such that all the parameter sequences stabilize; that is,
Therefore, condition (3) in [19] is eventually satisfied. If, in addition, , then
that is, condition (6) in [19] holds.

Proof. By Lemma 7 there are only a finite number of restarts in Step 6. Once there are no more restarts, , and the update of in Step 5 is nondecreasing. If the sequence were not to stabilize at some value , there would be an infinite subsequence of iterations at which is increased by a factor of at least . In this case, for some iteration , the function is convex on , and one would have for all . Hence,
and therefore, from that iteration on, the convexification parameter remains unchanged; that is, for all , which is a contradiction.
When , the augmented function is convex on , and hence the model function remains below the augmented function.

If Algorithm 6 stops at some iteration with and , then, by the definition of , . Note that and , and we see that
which shows ; this implies that . Suppose that is sufficiently large for to be convex on ; this would imply that
In other words, ; that is, is an approximate stationary point of .

Definition 10. is called a -double approximate stationary point of if
where , , and are the stabilized convexification parameter and model prox-parameter.

Theorem 11. Suppose that and there is no termination. Let be the stabilized value of the local convexification parameter sequence, as in Lemma 9, and . The following two mutually exclusive conclusions hold:
(a) There is a last serious step , followed by infinitely many null steps. Then , and is an approximate stationary point of .
(b) There is an infinite number of serious steps. Then any accumulation point of the sequence is a -double approximate stationary point of .

Proof. For item (a), consider the iterations after the last serious step was generated, so that only null steps occur. First, we prove that . Consider the function , and let . Since , , and (35) implies that
Condition (36) gives the inequality
By the definition of , , and hence
But , so
After expanding squares, the two rightmost terms above satisfy the relation
So we obtain
and the sequence is eventually increasing and convergent, since it is bounded above by . By conditions (35) and (36),
Since , we obtain that . By the convergence of , the sequence is bounded. Note that
so as . Since (when is sufficiently large), is bounded. Therefore, the sequence of approximate proximal points converges. By using a technique similar to that of Theorem 3(ii) in [19], we can prove that . For the sequence , we show that there exists such that
By Lemma 8(d), when is sufficiently large,
At the same time, according to Lemma 9, . As , is bounded, so it has an accumulation point; say, for some index set , . Obviously,
Since , we have , and
Because the descent test in Step 4 is not satisfied, taking the limit as , we have
Therefore, , and from we obtain . But implies that , which shows that . That is, equals approximately , and is an approximate stationary point of .
To see item (b), note that has an accumulation point, since ; say, for some index set , as and . Since , we set , so that . The descent test implies that, as , either or . By Lemma 3(b), is bounded below; therefore . From (31), both and must converge to zero. Therefore, by (14), as . Consider now ; since , both and converge to , with . But and imply that, for all ,
Therefore, we have
As , we also have that, for any ,
Hence, for all . That is, is a -double approximate stationary point of .

Remark 12. Note that if , the results obtained in Theorem 11 are exactly those of [18], which means that our work is indeed a generalization of the previous work.

6. Conclusion

We propose an approximate redistributed proximal bundle method for nonsmooth nonconvex optimization that employs inexact information on the objective function. With inexact data, we prove that the cutting-plane model constructed in this paper is eventually a local lower approximation of the approximate objective function. The convergence analysis proceeds by first showing that the convexification parameter eventually stabilizes; once it has stabilized, convergence to an approximate or -double approximate stationary point of the objective function is obtained under the condition that the stabilized convexification parameter is greater than the ideal proximal threshold . The local convexification approach opens a new way for future study of nonsmooth nonconvex optimization and can shed new light on the first-order models of [20].

In [21], the authors present a framework of general bundle methods capable of handling inexact oracles, and the framework generalizes in various ways a number of algorithms proposed in the literature. Next, we discuss the relationship between our algorithm and [21]. In [21], the objective function is a finite-valued convex function, and the authors make the following assumptions. For each given , the oracle delivers the inexact information:
where the error bound is possibly unknown. By (64) we have . As a result,
Even when the value of the upper error bound is unknown, inequalities (64) and (65) imply that , which means that
In our paper, if is the function which is made locally convex by adding a quadratic term with the convexification parameter, we suppose that, for each given , an oracle delivers the inexact information:
Assumption (67) is more general than (64), since if we choose , (67) becomes , , which is exactly (66). Therefore, our assumption is, in a sense, a generalization of the assumptions in [21].

In [21], the authors use the linearizations and the cutting-plane model
and the next trial point is obtained by minimizing the stabilized model function
In our paper, we choose the cutting-plane model function to be
which becomes a lower approximation to the locally convex objective function. It is similar to (68) except for the appearance of . At the same time, the next trial point is chosen as , the approximate proximal point of defined by (6), which is quite different from the traditional techniques described in [21].

In [21], the authors mention that, for inexact oracles, the progress made by the algorithm can be measured relative to the model, to some nominal reference value, or to the approximate objective function; these measures are called and denoted by
In our paper, we define a predicted decrease , which is similar to , but a bit different from , since is associated only with the current stability center , while the first term in is related not only to the current stability center but also to the new trial point . This predicted decrease is nonnegative as long as is sufficiently large, and this requirement coincides exactly with the condition that guarantees that is a lower approximation to . Therefore, we can further employ it in the descent test to decide between making a serious step or a null step.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to acknowledge the valuable suggestions and helpful comments from referees. This project is supported by the National Natural Science Foundation of China (nos. 11301246, 11171049, and 11471151).