Research Article | Open Access
Approximate Dual Averaging Method for Multiagent Saddle-Point Problems with Stochastic Subgradients
This paper considers the problem of solving a saddle-point problem over a network of multiple interacting agents. The global objective function of the problem is a combination of local convex-concave functions, each of which is available to only one agent. Our main focus is on the case where the projection steps are calculated only approximately and the subgradients are corrupted by stochastic noise. We propose an approximate version of the standard dual averaging method and show that the standard convergence rate is preserved, provided that the projection errors decrease at an appropriate rate and the noise is zero-mean with bounded variance.
1. Introduction
The problem of solving optimization problems over a multiagent network has attracted considerable attention in recent years (see, e.g., [1–13]). The objective function of such problems is, in general, a sum of local objective functions, each of which is known to one specific agent only. Moreover, the estimates of all agents are restricted to lie in some convex set. Due to the lack of a central coordinator, the methods developed to solve this problem have to be executed by individual agents through local interactions.
In this paper, we consider the multiagent saddle-point problem in which the global objective function is given as a sum of local convex-concave functions, subject to a global constraint. We utilize the average consensus algorithm (see, e.g., [14–21]) as a mechanism to design a distributed method for solving this problem. The method is based on the standard dual averaging method (see, e.g., [1, 22]), and it can also be viewed as an approximate version of the distributed dual averaging method in . In contrast to the distributed dual averaging methods in [1–4], which require that the projection steps be calculated very accurately, the proposed method only requires that they be computed approximately. Moreover, the proposed method also covers the case where the subgradients are corrupted by stochastic noise.
Literature Review. In , the authors develop a general framework for solving convex optimization problems over a network of multiple agents. Based on average consensus algorithms, they propose a subgradient-based method; the method is fully distributed, in the sense that each agent only needs to communicate with its neighbors. Different from the work in , the authors in  propose a distributed method that is based on dual averaging of subgradients; in particular, they characterize the explicit convergence rate of the proposed method. The authors in  further study the effects of communication delays on the distributed dual averaging method. The work in  utilizes the push-sum algorithm as a mechanism to design a distributed dual averaging method; the implementation of the method removes the need for doubly stochastic communication matrices. In , the authors solve the saddle-point problem over a multiagent network, where the objective function is given as a sum of multiple convex-concave functions. Based on the dual averaging method, they propose a distributed method and characterize its convergence rate.
The contribution of this paper is mainly twofold. First, we propose an approximate dual averaging method whose implementation does not need to calculate the projection steps exactly. We show how the projection errors affect the error bound of the method and conclude that the standard convergence rate is preserved when the errors decrease at an appropriate rate. Second, we further consider the case where the subgradients are corrupted by stochastic noise that is zero-mean with bounded variance, and we highlight the dependence of the error bound on the variance.
In contrast with the work , we solve the saddle-point problem over a multiagent network; in particular, we show that the standard O(1/√t) convergence rate (where t is the iteration counter) is preserved, even when the projection steps are computed only approximately and the subgradients are corrupted by stochastic noise. In contrast with the work , we propose an approximate version of the distributed dual averaging method and show that if the projection errors decrease at an appropriate rate, the standard convergence rate is preserved.
The remainder of this paper is organized as follows. Section 2 gives a formal statement of the multiagent saddle-point problem and the underlying network model. Section 3 presents the method and its main convergence results. Finally, we conclude with Section 4.
Notation and Terminology. We use ℝⁿ to denote the n-dimensional vector space. We denote the standard inner product on ℝⁿ by ⟨x, y⟩, for all x, y ∈ ℝⁿ. Let X be a closed convex set in ℝⁿ. We say ψ is a proximal function of the set X if it is continuous and strongly convex on X with respect to some norm ‖·‖; that is, for all x, y ∈ X, ψ(y) ≥ ψ(x) + ⟨∇ψ(x), y − x⟩ + (σ/2)‖y − x‖², where σ is some positive scalar. We define the proximal center of the set X by x⁰ = argmin_{x∈X} ψ(x). For the product space, we introduce a combined norm built from the Euclidean norm ‖·‖₂ and parameters that will be specified in the sequel, together with its associated dual norm. A vector g is called a subgradient of a convex function f at x if, for all y, f(y) ≥ f(x) + ⟨g, y − x⟩. The supergradient of a concave function can be defined accordingly.
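As a quick numerical check of the subgradient inequality just stated (our own illustration, using the standard convex example f(x) = |x| rather than anything from the paper): at the kink x = 0, every g ∈ [−1, 1] is a valid subgradient.

```python
import numpy as np

# Check f(y) >= f(x) + g * (y - x) for the convex function f(x) = |x|
# at x = 0: any slope g in [-1, 1] lies below the graph everywhere.
f = np.abs
ys = np.linspace(-2.0, 2.0, 101)
for g in (-1.0, -0.3, 0.0, 0.7, 1.0):
    assert np.all(f(ys) >= f(0.0) + g * (ys - 0.0))
print("every g in [-1, 1] is a subgradient of |x| at x = 0")
```

At points where f is differentiable the subgradient is unique (the derivative); the set-valued behavior appears only at kinks such as x = 0.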
2. Problem Setup
2.1. Communication Network Model
We consider a time-varying network of multiple agents. The network can be viewed as a directed graph with a fixed node set and a time-varying link set. The information exchange at time t is modeled using a communication matrix, whose positive off-diagonal entries induce the set of links activated at time t. We represent the agents' connectivity at each time by the resulting directed graph.
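As a small illustration of the average consensus mechanism that underlies the proposed method (with a fixed, hypothetical doubly stochastic matrix of our own choosing, rather than the paper's time-varying matrices): repeated multiplication by such a matrix drives every agent's value to the network average.

```python
import numpy as np

# A doubly stochastic matrix on a connected 3-node graph (rows and
# columns each sum to 1), chosen purely for illustration.
W = np.array([[0.5,  0.5,  0.0],
              [0.5,  0.25, 0.25],
              [0.0,  0.25, 0.75]])
x = np.array([3.0, -1.0, 7.0])   # initial values held by the agents
avg = x.mean()                   # double stochasticity preserves the average

for _ in range(200):
    x = W @ x                    # each agent mixes with its neighbors

print(x)                         # each entry converges to the average 3.0
```

Double stochasticity is what makes the fixed point the exact average; a merely row-stochastic matrix would still reach consensus on a connected graph, but on a weighted combination of the initial values instead.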
2.2. Multiagent Saddle-Point Problem
In this paper, we are interested in solving the following problem: where the two constraint sets are convex and compact, and each local objective is a convex-concave function over their product that is known only by one agent.
We refer to a vector pair as a saddle point of the objective over the constraint set if it satisfies the minimax inequalities in (2). Note that such a vector pair is a solution to problem (1).
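To make the saddle-point condition concrete, here is a small numerical illustration (our own example, not part of the paper's formulation) using the classical convex-concave function f(x, y) = x² − y² on the box [−1, 1] × [−1, 1], whose saddle point is (0, 0).

```python
import numpy as np

# f(x, y) = x^2 - y^2 is convex in x and concave in y. The pair
# (x*, y*) = (0, 0) satisfies f(x*, y) <= f(x*, y*) <= f(x, y*)
# for all feasible x and y, which is the saddle-point condition.
def f(x, y):
    return x**2 - y**2

grid = np.linspace(-1.0, 1.0, 201)
x_star, y_star = 0.0, 0.0

# maximizing over y at x = x* cannot exceed f(x*, y*) ...
assert np.all(f(x_star, grid) <= f(x_star, y_star) + 1e-12)
# ... and minimizing over x at y = y* cannot go below it.
assert np.all(f(grid, y_star) >= f(x_star, y_star) - 1e-12)
print("(0, 0) is a saddle point of x^2 - y^2 on the box")
```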
We now make some assumptions on problem (1). For the first constraint set, we assume that there exists a proximal function with a known proximal center and convexity parameter. Without loss of generality, we assume that the proximal function vanishes at its center. For the second constraint set, we introduce similar assumptions and notation. Therefore, it is natural to introduce a proximal function of the product set, given by the sum of the two local proximal functions; it is easy to see that its proximal center is the pair of the two local centers. Furthermore, we denote .
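As a concrete instance (our own example, not mandated by the paper): with the common Euclidean choice ψ(x) = ½‖x‖₂², ψ is 1-strongly convex, and the proximal center argmin over a box constraint is simply the Euclidean projection of the origin onto the box.

```python
import numpy as np

# For psi(x) = 0.5 * ||x||_2^2 over a box [lo, hi] (componentwise),
# the proximal center argmin_{x in box} psi(x) is the point of the box
# closest to the origin, i.e., the origin clipped into the box.
def prox_center_box(lo, hi):
    return np.clip(np.zeros_like(lo), lo, hi)

lo = np.array([1.0, -2.0])
hi = np.array([2.0, -1.0])
print(prox_center_box(lo, hi))   # origin clipped componentwise: [1., -1.]
```

Note that this choice also vanishes at its center only after recentering, ψ(x) − ψ(x⁰), which is the usual normalization in dual averaging analyses.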
3. Main Results
3.1. The Method and Assumptions
We now propose the method, which is based on the method in . Specifically, each agent updates its estimates by setting: where , , ( and  denote a subgradient of the local function with respect to its first argument and a supergradient with respect to its second argument at the current point, respectively),  is the stochastic noise vector in evaluating the subgradients,  is a positive and nondecreasing sequence, and the next iterate satisfies the following two properties: where  is a positive scalar that represents the error in computing the next iterate by a projection defined by the proximal function and parameter. Note that the approximate iterate is not uniquely defined at each step.
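To make the structure of the update concrete, the following is a minimal numerical sketch of a distributed dual averaging iteration of this general flavor, with noisy subgradients and an inexact projection. The toy objective f_i(x, y) = a_i x² − b_i y², the complete-graph weight matrix, the step-size and error schedules, and all variable names are our own illustrative choices, not the paper's exact method (4)-(5).

```python
import numpy as np

# Each agent i holds f_i(x, y) = a_i * x^2 - b_i * y^2 on X = Y = [-1, 1],
# mixes dual variables with its neighbors via a doubly stochastic W,
# accumulates noisy (sub/super)gradients, and takes an *approximate*
# projection step: here an exact Euclidean clip plus a vanishing
# perturbation that stands in for the projection error.
rng = np.random.default_rng(0)
m = 4                                  # number of agents
a = np.array([1.0, 2.0, 0.5, 1.5])     # local curvatures in x
b = np.array([1.0, 0.5, 2.0, 1.0])     # local curvatures in y
W = np.full((m, m), 1.0 / m)           # doubly stochastic (complete graph)

zx = np.zeros(m)                       # accumulated dual variables for x
zy = np.zeros(m)
x = np.ones(m)                         # primal iterates
y = np.ones(m)

T = 2000
for t in range(1, T + 1):
    alpha = 1.0 / np.sqrt(t)           # step size alpha(t) ~ 1/sqrt(t)
    eps = 1.0 / t                      # decreasing projection-error budget
    gx = 2 * a * x + 0.01 * rng.standard_normal(m)   # noisy subgradient
    gy = -2 * b * y + 0.01 * rng.standard_normal(m)  # noisy supergradient
    zx = W @ zx + gx                   # consensus on duals, then add grad
    zy = W @ zy - gy                   # negate supergradient so the same
                                       # projection formula yields ascent
    # approximate projection: exact clip plus a perturbation bounded by eps
    x = np.clip(-alpha * zx, -1, 1) + eps * rng.uniform(-1, 1, m)
    y = np.clip(-alpha * zy, -1, 1) + eps * rng.uniform(-1, 1, m)

# the unique saddle point of the toy problem is (0, 0); all agents'
# iterates should end up close to it
print(np.max(np.abs(x)), np.max(np.abs(y)))
```

The sketch illustrates the three ingredients the analysis tracks: the consensus step on dual variables, the zero-mean bounded-variance gradient noise, and a projection error that decays fast enough not to dominate the 1/√t step size.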
In this paper, we make the following assumptions.
Assumption 1 (connectivity). For all , there exists a positive integer such that the directed graph is strongly connected.
Assumption 2 (weight matrix). For all , the communication matrix satisfies the following properties: (i) is doubly stochastic and (ii) there exists a positive scalar such that for all . In addition, if , then for all .
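For concreteness, one standard recipe for producing weights satisfying Assumption 2 on an undirected graph (an illustration of our own, not necessarily the matrices used in the paper) is the Metropolis rule.

```python
import numpy as np

# Metropolis weights: W_ij = 1 / (1 + max(d_i, d_j)) for each edge {i, j},
# with the leftover mass placed on the diagonal. The result is doubly
# stochastic, and every nonzero entry is bounded away from zero.
def metropolis_weights(adj):
    adj = np.asarray(adj, dtype=float)
    m = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    np.fill_diagonal(W, 1.0 - W.sum(axis=1))
    return W

# path graph 1 - 2 - 3
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
W = metropolis_weights(adj)
assert np.allclose(W.sum(axis=0), 1.0)   # column sums: stochastic
assert np.allclose(W.sum(axis=1), 1.0)   # row sums: stochastic
print(W)
```

A convenient feature of this rule is that each agent can compute its own row knowing only its neighbors' degrees, so no central coordination is needed.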
Assumption 3 (bounded subgradients). We assume that the following inequalities hold for all and : where and are positive scalars.
Assumption 4 (stochastic subgradient). We assume that the stochastic noise vector satisfies the following properties, for all and : where is some positive constant.
3.2. Convergence Results
With the assumptions made in Section 3.1, we have the following main convergence result.
Theorem 5. Under Assumptions 1, 2, 3, and 4, consider a sequence generated according to the method (4) and (5), with step sizes and projection error sizes given by: where  and  are some positive scalars. Let  be a saddle point of the objective; then, for each agent and all , we have where , , , , and .
Proof. See the Appendix.
Remark 6. Theorem 5 gives the main convergence result for the method (4) and (5): it shows that the function value converges at an O(1/√t) rate in expectation, for each agent. It is easy to see that the error bound is an increasing function of the noise magnitude. It is worth noting that, in the method (4) and (5), we have considered the case where the subgradients are corrupted by stochastic noise that is zero-mean with bounded variance and, moreover, the projection steps are calculated only approximately. In fact, the proposed method converges whenever the projection error decreases at an appropriate rate; however, when the error decreases too slowly, the O(1/√t) convergence rate cannot be achieved.
Remark 7. As compared to the work , we show that the standard convergence rate of the dual averaging method is preserved under the assumption that the projection steps are only computed approximately and the subgradients are corrupted by stochastic noise. As compared to , the proposed method solves the saddle-point problem in a distributed setting, and the expected convergence rate is also established.
4. Conclusion
We have studied the problem of solving saddle-point problems over a multiagent network, where the objective function is given as a sum of local convex-concave functions, subject to a global constraint. Based on the average consensus algorithm and the dual averaging method, we have proposed an approximate dual averaging method for the setting where the projection steps are computed only approximately and the subgradients are corrupted by stochastic noise. Finally, we have presented the main convergence results of the proposed method.
Appendix

Proof of Theorem 5
We provide three lemmas which will be used for the proof of Theorem 5.
Proof. We can compute the general evolution of as follows, by referring to (4): In a similar way, for , we have Hence by noting that for all it follows that Note that (cf. (6)), for all and . Hence, we can use the definition of the dual norm and Assumption 3 to bound as follows: This, along with Lemma A.2, leads to the following estimate: where we have used the inequality , according to Assumption 4. Hence, the desired result follows by using the inequality that .
Lemma A.3 (see ). For the function , where  is the proximal center of  and  is some positive scalar, we have the following. (a) The function is convex and differentiable, and its gradient satisfies . (b) , for all . (c) The function satisfies , for all .
Proof of Theorem 5. First, we introduce the following gap sequence, for all :
Bounding . It is easy to see that where we have used Assumption 4; that is, . Breaking into two parts, we have For the first term on the right-hand side of (A.10), we can follow an argument similar to that of the proof of Theorem 1 in  to provide the following bound: where we have used (A.4), while for the second term, we proceed as follows. By recalling the definition of , we have where , , and the last equality follows from the fact that the weight matrix is doubly stochastic (cf. Assumption 2). Then, we investigate the sequence ; that is, It turns out that, for the term , we have where the first equality follows from Lemma A.3(a). Hence, we can bound the term as follows: By recalling property (ii) of the approximate projection (6) and Lemma A.3(b), we can further obtain where we have used the equality , which holds for (for , it is easy to verify that ). Substituting (A.16) into (A.13) and then taking the expectation, we obtain where we have used the bounds and . Hence, combining the inequalities (A.10), (A.11), and (A.17) yields where we have used the fact that .
Bounding . Following an argument similar to that of the proof of Theorem 1 in , we can arrive at where we have used the fact that is a convex-concave function. By recalling the definition of a saddle point (2), we have which further implies Substituting the preceding inequality into (A.19) leads to It remains to bound . We will use Lemma A.2 to achieve this. Using the update (5), we have where we have used Lemma A.3 and property (ii) of the approximate projection (6). Using Lemma A.2, it follows that Combining the inequalities (A.18), (A.22), and (A.24) gives We are left to bound the terms and . By recalling the definition of the sequence , we have In a similar way, we have ; therefore, the desired result follows by substituting this and (A.26) into (A.25). The proof is complete.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 61304042, the Natural Science Foundation of Jiangsu Province under Grant no. BK20130856, the Jiangsu Planned Projects for Postdoctoral Research Funds under Grant no. 1302003A, and the Nanjing University of Posts and Telecommunications under Grant nos. NY213041 and NY213094.
- J. C. Duchi, A. Agarwal, and M. J. Wainwright, “Dual averaging for distributed optimization: convergence analysis and network scaling,” IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592–606, 2012.
- D. Yuan, Q. Ma, and Z. Wang, “Dual averaging method for solving multi-agent saddle-point problems with quantized information,” Transactions of the Institute of Measurement and Control, vol. 36, no. 1, pp. 38–46, 2014.
- K. I. Tsianos and M. G. Rabbat, “Distributed dual averaging for convex optimization under communication delays,” in Proceedings of the 2012 American Control Conference, pp. 1067–1072, Montreal, Canada, June 2012.
- K. Tsianos, S. Lawlor, and M. Rabbat, “Push-sum distributed dual averaging for convex optimization,” in Proceedings of the 51st IEEE Conference on Decision and Control, pp. 5453–5458, Maui, Hawaii, USA, 2012.
- D. Yuan, S. Xu, H. Zhao, and L. Rong, “Distributed dual averaging method for multi-agent optimization with quantized communication,” Systems & Control Letters, vol. 61, no. 11, pp. 1053–1061, 2012.
- J. Lu, C. Y. Tang, P. R. Regier, and T. D. Bow, “Gossip algorithms for convex consensus optimization over networks,” IEEE Transactions on Automatic Control, vol. 56, no. 12, pp. 2917–2923, 2011.
- A. Nedic and A. Ozdaglar, “Cooperative distributed multi-agent optimization,” in Convex Optimization in Signal Processing and Communications, D. P. Palomar and Y. C. Eldar, Eds., pp. 340–386, Cambridge University Press, Cambridge, UK, 2010.
- I. Matei and J. S. Baras, “Performance evaluation of the consensus-based distributed subgradient method under random communication topologies,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, pp. 754–771, 2011.
- A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.
- B. Johansson, T. Keviczky, M. Johansson, and K. H. Johansson, “Subgradient methods and consensus algorithms for solving convex optimization problems,” in Proceedings of the 47th IEEE Conference on Decision and Control (CDC '08), pp. 4185–4190, Cancún, Mexico, December 2008.
- D. Yuan, S. Xu, and J. Lu, “Gradient-free method for distributed multi-agent optimization via push-sum algorithms,” International Journal of Robust and Nonlinear Control, 2014.
- M. Zhu and S. Martinez, “On distributed convex optimization under inequality and equality constraints,” IEEE Transactions on Automatic Control, vol. 57, no. 1, pp. 151–164, 2012.
- D. Yuan, S. Xu, and H. Zhao, “Distributed primal-dual subgradient method for multiagent optimization via consensus algorithms,” IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 41, no. 6, pp. 1715–1724, 2011.
- S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508–2530, 2006.
- H. Zhao, W. Ren, D. Yuan, and J. Chen, “Distributed discrete-time coordinated tracking with Markovian switching topologies,” Systems & Control Letters, vol. 61, no. 7, pp. 766–772, 2012.
- J. Li, W. Ren, and S. Xu, “Distributed containment control with multiple dynamic leaders for double-integrator dynamics using only position measurements,” IEEE Transactions on Automatic Control, vol. 57, no. 6, pp. 1553–1559, 2012.
- A. Jadbabaie, J. Lin, and A. S. Morse, “Coordination of groups of mobile autonomous agents using nearest neighbor rules,” IEEE Transactions on Automatic Control, vol. 48, no. 6, pp. 988–1001, 2003.
- J. Lu, D. W. C. Ho, and J. Kurths, “Consensus over directed static networks with arbitrary finite communication delays,” Physical Review E, vol. 80, no. 6, Article ID 066121, 2009.
- R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,” IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1520–1533, 2004.
- W. Ren and R. W. Beard, “Consensus seeking in multiagent systems under dynamically changing interaction topologies,” IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 655–661, 2005.
- L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” Systems & Control Letters, vol. 53, no. 1, pp. 65–78, 2004.
- Y. Nesterov, “Primal-dual subgradient methods for convex problems,” Mathematical Programming, vol. 120, no. 1, pp. 221–259, 2009.
- A. Nedić and A. Ozdaglar, “Subgradient methods for saddle-point problems,” Journal of Optimization Theory and Applications, vol. 142, no. 1, pp. 205–228, 2009.
Copyright © 2014 Deming Yuan and Yang Yang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.