Abstract

Network-structured optimization problems arise widely in engineering applications. In this paper, we investigate a nonconvex distributed optimization problem with inequality constraints over a time-varying multiagent network, in which each agent can locally access only its own cost function while all agents collaboratively minimize the sum of their nonconvex cost functions. Based on successive convex approximation techniques, we first locally approximate the nonconvex problem by a sequence of strongly convex constrained subproblems. To enable distributed computation, we then exploit the exact penalty function method to transform this sequence of convex constrained subproblems into unconstrained ones. Finally, a fully distributed method is designed to solve the unconstrained subproblems. The convergence of the proposed algorithm is rigorously established, showing that the algorithm converges asymptotically to a stationary solution of the problem under consideration. Several simulation results illustrate the performance of the proposed method.

1. Introduction

Network-structured problems have recently drawn considerable attention in various applications, such as mobile ad hoc networks, wireless sensor networks, and Internet networks [1–3]. The absence of centralized access to information and time-varying network connectivity are common features of network-structured problems. For this reason, distributed optimization methods for multiagent networks should be designed on the basis of local communication and computation while accommodating changes in the network topology. Distributed computing allows each agent to utilize only its own cost function and to communicate with its direct neighbors, which has the potential advantage of protecting agents’ privacy. In recent years, there has been an increasing trend to develop distributed optimization by integrating communication and computation cooperation (see, e.g., [4, 5] and references therein).

Methods for solving distributed convex problems have been widely studied in the literature. Based on the consensus averaging mechanism, several useful approaches exist, including primal consensus distributed methods, dual consensus distributed methods, and primal-dual consensus distributed methods. Nedic and Ozdaglar [5] originally developed a distributed subgradient-based algorithm, in which every agent optimizes its own objective and locally exchanges information with neighboring agents in a network. The convergence rate of their algorithm was obtained; however, only unconstrained distributed problems were investigated. In [6], Nedic et al. generalized the distributed method of [5] to constrained convex optimization. Later, many researchers proposed various extensions of primal (sub)gradient-based methods; see, for example, [7–12]. In [13], Duchi et al. extended the centralized dual averaging algorithm to the distributed setting and proposed a distributed dual averaging algorithm. Their convergence result shows that the number of iterations depends on the underlying network sizes and spectral gaps. The primal-dual consensus distributed method was designed for solving distributed convex problems with global inequality constraints under the framework of Lagrangian duality. By finding the corresponding saddle points of the Lagrangian function, Zhu and Martinez [14] first proposed a primal-dual consensus distributed algorithm and established its convergence. The authors in [15] obtained an explicit convergence rate for the algorithm under strong connectivity of the networks.

However, the methods mentioned above are not suitable for general nonconvex distributed optimization problems. Only recently have algorithms for distributed nonconvex problems been designed, such as in [16, 17]. In [16], Lorenzo and Scutari developed a novel distributed algorithm for solving unconstrained nonconvex optimization problems over a multiagent network with time-varying connectivity. They combined a successive convex approximation technique with a dynamic consensus mechanism to realize distributed computation as well as local information exchange in the network. For nonconvex optimization problems with constraints, Scutari et al. [17] proposed a successive convex approximation method that solves a sequence of strongly convex subproblems while maintaining feasibility. They showed that their method converges to a stationary solution of the original constrained nonconvex problem. However, the method proposed in [17] is not suitable for the distributed setting. Additionally, it uses the Lagrange dual method to solve the sequence of strongly convex subproblems, which may greatly enlarge the dimension of the problem owing to the introduction of dual variables. Thus, the computational difficulty and cost may increase.

In this paper, we investigate a distributed nonconvex problem with inequality constraints. The main contributions of this paper are twofold: (i) based on the penalty function method, a distributed algorithm for solving the nonconvex problem with global inequality constraints is proposed; (ii) the convergence of the proposed algorithm is rigorously proved. More specifically, we first transform the nonconvex problem into a sequence of strongly convex subproblems using successive convex approximation techniques. To enable distributed computation, we then exploit the exact penalty function method to transform the sequence of strongly convex constrained subproblems into unconstrained ones. Finally, we propose a fully distributed method to solve these unconstrained subproblems. We establish the convergence of the proposed algorithm and present several numerical simulations.

The work in this paper is closely related to the previous works [16–19]. Our method builds on the algorithm proposed in [16], but the problem considered here differs from that in [16], since we investigate a nonconvex problem with global inequality constraints. The proposed algorithm also differs from that of [17]: our algorithm exploits the exact penalty function method to solve the related subproblems, thereby possibly reducing the dimensionality of the problem, whereas the algorithm in [17] uses the Lagrange dual method to solve the subproblems, and its computation is not implemented in a distributed manner. Our algorithm extends the algorithm in [18] to handle the nonconvex case. Finally, we solve a constrained distributed nonconvex optimization problem in this paper, while the authors in [19] solved the unconstrained one.

The remainder of this paper is organized as follows. Section 2 provides the problem statement and related preparations. Section 3 presents the algorithm development. Section 4 proposes the distributed algorithm and establishes the convergence results. Numerical simulations are given in Section 5. Finally, conclusions are drawn in Section 6.

2. Problem Statement and Preparations

In this section, we state the optimization problem under consideration and give the assumptions and definitions that will be used in the sequel.

2.1. Problem Statement

Consider the following distributed nonconvex optimization problem:where is a nonconvex smooth cost function, only known by agent for ; , is a convex smooth function for ; and is a closed, convex set. The constraint and the set are known by all the agents. Letbe the feasible set of problem (1). We assume that Slater’s condition [20] is satisfied for problem (1).

Problem (1) is ubiquitous, arising in many applications such as networking, wireless communications, and machine learning. Therefore, it is meaningful to solve this problem.
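Since the displayed formulation of problem (1) appears only in the paper's equations, the following minimal Python sketch merely illustrates the general shape of such a problem: a sum of nonconvex local costs minimized subject to a shared convex inequality constraint. All names and data below (`make_local_cost`, `g`, `F`, the quadratic coefficients) are our own illustrative assumptions, not the paper's notation.

```python
import numpy as np

def make_local_cost(A_i, b_i):
    """A smooth but nonconvex local cost f_i (indefinite quadratic),
    known only to agent i."""
    def f_i(x):
        return 0.5 * x @ A_i @ x + b_i @ x
    return f_i

def g(x):
    """A convex inequality constraint g(x) <= 0 shared by all agents."""
    return float(np.sum(x ** 2)) - 1.0

def F(x, local_costs):
    """Global objective of problem (1): the sum of the agents' local costs."""
    return sum(f(x) for f in local_costs)
```

Each agent can evaluate only its own `f_i`, while `g` and the feasible set are common knowledge, matching the information structure described above.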

2.2. Assumption and Definition

We first give the description of the network topology. Time is assumed to be discrete. At each time slot , the network is modeled as a directed graph , where is the set of nodes with agents, and represents the set of time-varying directed edges. The neighborhood of agent at time is defined as . The communication pattern between neighbors is set as follows: agent in can communicate with node at time . We assign time-varying weights to match the digraph and define the weight matrix .

Assumption 1 (see [5]). (A1) The graph sequence is B-strongly connected; that is, there is an integer such that the graph , with the edge set , is strongly connected, for all .
(A2) There is a scalar with , for all : (i) for all ; (ii) for all and all agents communicate directly with agent ; (iii) otherwise, all agents cannot communicate directly with agent , for all .
(A3) Each weight matrix satisfies

Assumption 1 shows that, at any time , any agent will receive the information from agent  within the next  time slots. Moreover, the weight matrix is doubly stochastic.

Assumption 2. (B1) The set is nonempty, closed, and convex.
(B2) For , each local function is continuously differentiable on , and is Lipschitz continuous on with constant ; let .
(B3) is bounded on ; that is, there is a finite number such that , for all .
(B4) For , each function is bounded on ; that is, there is a finite number such that ; let .
(B5) is coercive on ; that is, .

The above assumptions are quite general and can be satisfied by a large class of problems in practical applications. Assumption (B5) ensures that problem (1) has a solution.

The goal in this paper is to design a method that can find a stationary solution of problem (1). Moreover, the method is implemented in the distributed scenario satisfying Assumptions 1 and 2.

Next we introduce several definitions, which will be used in the convergence analysis of our method.

Definition 3 (regularity [21]). A point is called regular for problem (1) if the Mangasarian-Fromovitz Constraint Qualification (MFCQ) holds at , that is, if the following implication is satisfied:whereis the normal cone to at and is the index set of convex constraints that are active at .

Definition 4 (stationary point [17]). A point is a stationary point of problem (1), if it satisfies the following KKT system:where are Lagrange multipliers chosen suitably.

As pointed out in [17], a regular (local) minimum point of problem (1) is also a stationary point. It is well known that the traditional goal in solving nonconvex problems is in fact to find stationary points. To simplify the discussion, we assume throughout the rest of this paper that all feasible points of problem (1) are regular.

3. Development of Algorithm

In designing an effective distributed method for problem (1), we face three main challenges: (i) the nonconvexity of the objective function ; (ii) the unavailability of global knowledge of ; and (iii) the presence of the inequality constraints . To deal with these difficulties, we combine successive convex approximation (SCA) techniques, exact penalty function methods, and dynamic consensus mechanisms in developing our algorithm.

3.1. Local SCA Approximation

In a distributed setting, the computational cost of directly solving problem (1) is considerably high, and direct solution may even be infeasible. We therefore prefer to suitably approximate problem (1) by local convex approximations.

By copying the global variable , each agent maintains a local estimate that needs to be updated at each iteration. We rewrite and consider a convexification of as follows: at each iteration, we use a strongly convex function to replace the nonconvex function and linearize the at ; that is,where is a strongly convex surrogate of the nonconvex and is the gradient for the term at ; that is,

At each iteration , agent  solves a strongly convex problem. Note that  in (9) is well defined, since subproblem (9) has a unique solution.

We give the following assumptions on the approximation of .

Assumption 5. Each satisfies the following:
(C1) is uniformly strongly convex on with a strongly convex parameter .
(C2) is uniformly Lipschitz continuous on .
(C3) , for all .

Assumption 5 is quite natural. The function  is viewed as a strongly convex local approximation of  around  that inherits the first-order properties of . Assumption 5(C2) requires Lipschitz continuity, which is certainly satisfied, for example, if the set  is bounded. For a given , several feasible choices are provided in [16, 22, 23].
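As a concrete illustration of Assumption 5, the sketch below builds one common surrogate choice of the kind discussed in [16, 22, 23]: linearize the local cost at the current iterate and add a proximal term. The result is strongly convex, as required by (C1), and its gradient matches that of the local cost at the base point, as required by (C3). The function names and the value of `tau` are our own assumptions.

```python
import numpy as np

def make_surrogate(grad_fi, tau):
    """Proximal-linearized surrogate of a nonconvex local cost f_i:
    f~_i(z; x) = <grad f_i(x), z - x> + (tau/2) * ||z - x||^2.
    It is tau-strongly convex in z and satisfies
    grad_z f~_i(x; x) = grad f_i(x)  (gradient consistency, (C3))."""
    def surrogate(z, x):
        return grad_fi(x) @ (z - x) + 0.5 * tau * np.sum((z - x) ** 2)
    def surrogate_grad(z, x):
        return grad_fi(x) + tau * (z - x)
    return surrogate, surrogate_grad
```

Because the surrogate is a strongly convex quadratic in `z`, the resulting subproblem always has a unique minimizer, in line with the well-definedness of the local update.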

3.2. Exact Penalty Function Method

Subproblem (9) is not easy to solve due to the presence of the global constraints and the local accessibility of ; thus, the exact penalty function method is utilized to transform subproblem (9) into an unconstrained problem, given bywhere , is a penalty parameter. We can obtain that is convex on , and for all , where stands for the subgradient of at .

Under suitable conditions [24], the solution set of the penalized problem (10) coincides with the solution set of the constrained problem (9). To explain this fact in detail, we introduce the Lagrangian function of problem (9):where  is the vector of dual variables corresponding to the constraints . The dual problem of problem (9) isIt can be proved that no duality gap exists between subproblem (9) and its dual problem (12) if Slater’s condition is satisfied (see Proposition 5.3.1 in [20]). In addition, the set of dual optimal solutions is nonempty and bounded. Thus, based on Proposition 1 in [24], there is a penalty parameter  such that the solutions of the penalized problem (10) are the same as those of subproblem (9). Therefore, throughout the rest of this paper, we can always select a finite penalty parameter  such that .
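To make the exactness property concrete, the sketch below forms the nonsmooth penalized objective P(x) = U(x) + λ Σ_j max(0, g_j(x)) together with a subgradient. The function names and toy data are our own illustrative assumptions, not the paper's notation; the key fact is that once λ exceeds the optimal dual multiplier, the unconstrained minimizer of P coincides with the constrained solution.

```python
def exact_penalty(obj, grad_obj, constraints, lam):
    """Build the nonsmooth exact penalty of a constrained convex problem:
    P(x) = obj(x) + lam * sum_j max(0, g_j(x)).
    `constraints` is a list of (g_j, grad_g_j) pairs."""
    def P(x):
        return obj(x) + lam * sum(max(0.0, g(x)) for g, _ in constraints)
    def subgrad(x):
        s = grad_obj(x)
        for g, dg in constraints:
            if g(x) > 0:          # violated constraint contributes lam * grad g_j
                s = s + lam * dg(x)
        return s
    return P, subgrad
```

For instance, for obj(x) = (x − 2)² with the constraint x − 1 ≤ 0, the optimal multiplier is 2, so any λ > 2 makes the penalty exact and the unconstrained minimizer of P is the constrained solution x = 1.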

3.3. Consensus Update

We now introduce a consensus mechanism to ensure that the local estimates gradually reach consensus across all agents. A consensus-based step is applied to , and each agent updates its state as follows:where  is the weight satisfying Assumption 1.

The evaluation of  in (8) requires the quantities of all , , which are not available at agent . To deal with this issue, we need a local estimate of  in (8) that eventually converges to . We rewrite  in (8) as follows:and let . Then, we replace  in (14) by , where  is a local auxiliary variable updated by agent  that asymptotically tracks . Using the dynamic averaging consensus strategy [25], we can update  in (15) via the following formula:where .

Note that the update of  in (16), and thus of  in (15), can now be performed locally through message exchanges with the agents in the neighborhood . With the above description, problem (10) can be converted into the following problem:where .
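The consensus step (13) and the tracking recursion (16) can be sketched as one combined round. The key invariant of dynamic average consensus — with a doubly stochastic weight matrix, the average of the trackers always equals the current average of the local gradients — is what lets each agent use its tracker in place of the unavailable global quantity. Names and shapes below are our own assumptions.

```python
import numpy as np

def consensus_tracking_step(W, x_hat, y, grad_old, grad_fn):
    """One round of consensus plus gradient tracking:
    x_i <- sum_j W[i,j] * x_hat_j                      (mix neighbors' iterates)
    y_i <- sum_j W[i,j] * y_j + (grad_i_new - grad_i_old)  (track the average).
    W must be doubly stochastic (Assumption 1)."""
    x_new = W @ x_hat
    grad_new = np.array([grad_fn(i, x_new[i]) for i in range(len(x_new))])
    y_new = W @ y + (grad_new - grad_old)
    return x_new, y_new, grad_new
```

Because the columns of `W` sum to one, averaging the tracker update shows that the mean of the `y_i` is preserved step by step, so each `y_i` asymptotically agrees with the network-wide average gradient.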

4. Algorithm and Convergence Results

Based on the previous algorithm development, we propose an exact penalty function based distributed algorithm (EPDA, for short) to solve problem (1), presented in Algorithm 1.

(1) Initialization: , , , for all .
Choose the weight matrices satisfying Assumption 1 and the step-sizes satisfying (20).
Select a penalty parameter . Set ;
(2) repeat
(3) for each agent do
(4) Local SCA optimization: compute by solving problem (17);
(5) Update ;
(6) Consensus update:
(a) ;
(b) ;
(c) ;
(7) end for
(8) Set ;
(9) until satisfies a termination criterion.
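Under a set of concrete illustrative choices — a proximal-linearized surrogate built from the tracked gradient, an inner projected-subgradient solver for the penalized subproblem (17), and a diminishing step-size of the kind required by (20) — the whole of Algorithm 1 can be sketched as follows. None of these specific choices (the inner solver, `tau`, the step-size constants) is prescribed by the paper; they are assumptions for illustration only.

```python
import numpy as np

def epda(grad_fns, constraints, W_fn, proj, x0, lam=10.0, tau=1.0,
         iters=200, inner=50):
    """Sketch of Algorithm 1 (EPDA). grad_fns[i] is agent i's local gradient,
    constraints is a list of (g_j, grad_g_j) pairs, W_fn(k) returns the
    (doubly stochastic) weight matrix at time k, proj projects onto K."""
    N = len(grad_fns)
    x = x0.astype(float)
    grads = np.array([grad_fns[i](x[i]) for i in range(N)])
    y = grads.copy()                              # trackers of the average gradient
    for k in range(iters):
        alpha = 0.1 / (k + 1) ** 0.6              # diminishing step-size (cf. (20))
        W = W_fn(k)
        x_hat = np.empty_like(x)
        for i in range(N):
            # Step 4: approximately solve the penalized local subproblem
            #   min_z  N*<y_i, z - x_i> + (tau/2)||z - x_i||^2
            #          + lam * sum_j max(0, g_j(z))   over K
            z = x[i].copy()
            for t in range(inner):
                s = N * y[i] + tau * (z - x[i])
                for g, dg in constraints:
                    if g(z) > 0:
                        s = s + lam * dg(z)
                z = proj(z - s / (tau * (t + 1)))  # projected subgradient step
            x_hat[i] = x[i] + alpha * (z - x[i])   # Step 5
        x = W @ x_hat                              # Step 6(a): consensus
        grads_new = np.array([grad_fns[i](x[i]) for i in range(N)])
        y = W @ y + (grads_new - grads)            # Step 6(b): gradient tracking
        grads = grads_new
    return x
```

On a toy instance with three agents, quadratic local costs centered at 0, 1, and 2, and the constraint x ≤ 1.5, the iterates reach consensus near the stationary point x = 1, which is feasible.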

To establish the convergence of the proposed algorithm, we first give Lemma 6, which shows the relationship between the solutions of problem (1) and the solutions of subproblem (9). Note that conditions (18) and (19) below can be satisfied; see the proof of relation (A.32).

Lemma 6. Suppose that Assumptions 1, 2, and 5 hold. Let be the sequence generated by Algorithm 1; the following results hold:
(i) Ifthen at least one regular limit point of is a stationary solution of problem (1).
(ii) Ifthen every regular limit point of is a stationary solution of problem (1).

Proof. See the Appendix.

We are now in a position to give the convergence properties of Algorithm 1.

Theorem 7. Let be the sequence generated by Algorithm 1 and let be the average. Suppose that (a) Assumptions 1, 2, and 5 hold; and (b) the step-size sequence is selected such that , for all ,Then, (i) the sequence is bounded and all its limit points are stationary solutions of problem (1); (ii) all the sequences asymptotically agree; that is, as , for all .

Proof. See the Appendix.

Remark 8. (1) On the choice of surrogate functions: we present only a few instances to show how to choose the surrogate function ; see also [16, 19, 22, 23].
(i) If no convex structure of  is available, the linearization of  at  is the simplest choice; that is,(ii) If  is convex, one can simply set(iii) Consider the case where  can be decomposed as , where  is convex and  is nonconvex. We can linearize only  and preserve the convex part  as follows:(2) On the choice of step-sizes: condition (20) in Theorem 7 requires that the step-size sequence decrease to zero, but not too fast. The choice of step-size meeting (20) is quite flexible [1, 23]. The following two choices of step-size are very effective in our simulations:

(i) , ;

(ii) , , .

(3) On the choice of weight matrices: Assumption 1 requires that each communication weight matrix be doubly stochastic. References [4, 5] provide several choices of weight matrices, such as the maximum-degree weight matrix, the Metropolis-Hastings weight matrix, and the least-mean-square consensus weight matrix.

5. Numerical Simulations

We consider a distributed nonconvex quadratic problem with quadratic inequality constraints (see also Example C in [22]):where , , , and  are nonnegative constants, and  is a box constraint in . Note that the function  is locally accessible, known only by agent . The local function  is differentiable, and its gradient is -Lipschitz continuous with , where  is the identity matrix.

The corresponding penalized problem for problem (24) is given bywhere the penalty parameter  is chosen suitably. In general, there are two ways to choose  such that it satisfies the requirement: one is to solve the unconstrained optimization problem defined in (12), and the other is heuristic. To emulate time-varying weight matrices, a pool of 50 weight matrices from connected random graphs is generated, in which each weight matrix satisfies Assumption 1. The time-varying weight matrices required in Steps 6(a) and 6(b) of Algorithm 1 are randomly drawn from this pool. For comparison, we use the distributed Lagrangian primal-dual subgradient (DLPDS, for short) algorithm proposed in [14] to solve the corresponding subproblem (9).

For simplicity, we assume that each is a diagonal matrix with elements generated randomly in , is a vector with elements generated randomly in , is generated randomly in , is a diagonal matrix with elements generated randomly in , is generated randomly in , and the initial points are generated randomly in , where the box constraint is set as . In the numerical experiments, we heuristically select the penalty parameter .

Some experimental results are presented to illustrate the convergence behavior of the proposed Algorithm EPDA. All the curves are averaged over 20 independent realizations. Comparisons with the existing Algorithm DLPDS are also given.

Figures 1(a) and 2(a) depict the maximum error versus the number of iterations for different numbers of nodes and dimensions. It can be observed that both algorithms have the potential to converge to the same stationary solution. However, our Algorithm EPDA is much faster than Algorithm DLPDS [14]. As can be seen from Figure 1(a), Algorithm EPDA reaches higher precision than Algorithm DLPDS after 400 iterations. Figure 2(a) shows similar results when  and .

Figures 1(b) and 2(b) depict the objective function value versus the number of iterations for different numbers of nodes and dimensions. For both tested cases, the objective value gradually decreases as the number of iterations increases, but the objective value for Algorithm EPDA decreases faster than that for Algorithm DLPDS.

6. Conclusions

In this paper, a distributed algorithm was proposed to solve a nonconvex distributed optimization problem with global inequality constraints over time-varying multiagent networks. The proposed algorithm was based on the successive convex approximation technique, the exact penalty function method, and dynamic averaging consensus. The convergence of the proposed algorithm was proved, and several numerical results showed the effectiveness of the proposed method.

Appendix

(1) Preliminaries. We present some preliminary results that will be used to prove our main results. For ease of notation, we introduce the following:where .

Lemma A.1 (see [6]). Let , , and be three sequences of numbers such that for all . Suppose thatand . Then, either or converges to a finite value and .

Similar to the proof of Proposition  9 in [16], we can obtain the following lemma.

Lemma A.2. Let be the sequence generated by Algorithm 1. Then, for all and for all , the following results hold:
(i) Bounded disagreement:where is a finite constant.
(ii) Asymptotic agreement on :(iii) Asymptotically vanishing tracking error:(iv) Asymptotic agreement on best-responses:

(2) Proof of Lemma 6. We prove only statement (i) of Lemma 6; statement (ii) then follows by applying statement (i) to every convergent subsequence of .

Assume that  is a regular accumulation point of a subsequence of  satisfying (18); then there is  such that . Next, we prove that  is a KKT point of problem (1). LetUsing  together with the continuity of , we haveThe limit in (A.12) means that there is a positive integer  such thatSince the functions  and  are continuous, by Assumption 5(C3), we obtainand, for ,

We now prove that, for sufficiently large , the MFCQ holds at . By contradiction, suppose that the following relation fails for infinitely many :Then, there is a nonempty index set  such that, after a suitable renumbering, for each , we haveWithout loss of generality, we assume that, for every , the sequence  converges to a limit  such that . By , taking the limit in (A.17), and invoking (A.15) along with the outer semicontinuity of the mapping  (see Proposition 6.6 in [21]), we obtainwhich contradicts the regularity of . Thus, (A.16) must hold for sufficiently large , implying that the KKT system of subproblem (9) has a solution for each sufficiently large . Therefore, there exist  such that

From (A.13) and the complementary slackness in (A.19), we have  for all  and all sufficiently large . Moreover, the sequence of nonnegative multipliers  must be bounded. By contradiction, assuming that  for some , dividing both sides of (A.19) by , and taking the limit , we obtainfor some , in contradiction with Definition 3.

Therefore, the sequence  must have a limit. Let  be such a limit (after a suitable renumbering). Taking the limit in (A.19), and combining (A.14) and (A.15) with the outer semicontinuity of the mapping , we haveBy (A.21),  is a stationary solution of problem (1).

(3) Proof of Theorem 7. Using the descent lemma on , Steps 5 and 6(a) of Algorithm 1, and (A.1), we obtainAdding and subtracting  on the right-hand side of inequality (A.22), we have

Next we evaluate the third term on the right-hand side of (A.23). For a given , let  be the unique solution of subproblem (9). By the optimality condition of subproblem (9), we have, for all ,Adding and subtracting , and using Assumption 5(C3), we getwhere the second inequality in (A.25) follows from Assumption 5(C1) and Lemma 7 in [22].

Letting  in (A.25) and noting that  we have, for ,where . Summing inequality (A.27) from  to , we obtain the following relation:By (A.23), (A.28), (A.7), Assumption 2(B4), and the triangle inequality, we getLetBy (A.9), (A.10), (20), and (A.7), we have . Since  is coercive, it follows from Lemma A.1 that  converges to a finite value, and Combining the inequality above with (20), we have

By an argument similar to that in [22], we can show that , . Thus, we have , By Assumption 2(B5) and the convergence of , the sequence  is bounded. Furthermore, the sequence  has at least one limit point . Owing to the continuity of  and ,  must exist for all ; therefore,  is a stationary solution of problem (1) by statement (i) of Lemma 6. Thus, statement (i) of the theorem is proved, and statement (ii) follows readily from (A.8).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partially supported by the NSFC (11501070, 61473326, 11671062, and 11471062), by the Natural Science Foundation Project of Chongqing (cstc2015jcyjA00011 and cstc2017jcyjAX0253), by the Key Laboratory of Optimization and Control, Ministry of Education, China, and by the Research and Innovation Project of Postgraduates of Chongqing Normal University (YKC17019).