#### Abstract

This paper considers the saddle-point problem over a network of multiple interacting agents. The global objective function is a sum of local convex-concave functions, each of which is available to only one agent. We focus on the case where the projection steps are computed approximately and the subgradients are corrupted by stochastic noise. We propose an approximate version of the standard dual averaging method and show that the standard convergence rate is preserved, provided that the projection errors decrease at an appropriate rate and the noise is zero-mean with bounded variance.

#### 1. Introduction

The problem of solving optimization problems over a multiagent network has attracted considerable attention in recent years (see, e.g., [1–13]). The objective function of such problems is, in general, a sum of local objective functions, each of which is known to one specific agent only. Moreover, the estimates of all agents are restricted to lie in some convex set. Due to the lack of a central coordinator, the methods developed to solve this problem have to be executed by individual agents through local interactions.

In this paper, we consider the multiagent saddle-point problem where the global objective function is given as a sum of local convex-concave functions, subject to some global constraint. We utilize the average consensus algorithm (see, e.g., [14–21]) as a mechanism to design a distributed method for solving this problem. The method is based on the standard dual averaging method (see, e.g., [1, 22]), and it can also be viewed as an approximate version of the distributed dual averaging method in [2]. Unlike the distributed dual averaging methods in [1–4], which require the projection steps to be computed very accurately, the proposed method only requires them to be computed approximately. Moreover, the proposed method also covers the case where the subgradients are corrupted by stochastic noise.

*Literature Review*. In [9], the authors develop a general framework for solving convex optimization problems over a network of multiple agents. Based on the average consensus algorithms, they propose a subgradient-based method; the method is fully distributed, in the sense that each agent only needs to communicate with its neighbors. Different from the work [9], the authors in [1] propose a distributed method that is based on dual averaging of subgradients; in particular, the authors characterize the explicit convergence rate of the proposed method. The authors in [3] further study the effects of communication delays on the distributed dual averaging method. The work [4] utilizes the push-sum algorithm as a mechanism to design a distributed dual averaging method; the implementation of the method removes the need for doubly stochastic communication matrices. In [2], the authors solve the saddle-point problem over a multiagent network; the objective function is given as a sum of multiple convex-concave functions. Based on the dual averaging method, the authors propose a distributed method and characterize its convergence rate.

The contribution of our work in this paper is mainly twofold. First, we propose an approximate dual averaging method, and the implementation of the method does not need to calculate the projection steps accurately. We show how the projection errors affect the error bound of the method and conclude that the standard convergence rate is preserved when the errors decrease at some appropriate rate. Second, we further consider the case where the subgradients are corrupted by stochastic noises that are zero-mean and have bounded variance, and we also highlight the dependence of the error bound on the variance.

In contrast with the work [22], we solve the saddle-point problem over a multiagent network; in particular, we show that the standard convergence rate $O(1/\sqrt{T})$ (where $T$ is the iteration counter) is preserved, even when the projection steps are computed approximately and the subgradients are corrupted by stochastic noise. In contrast with the work [2], we propose an approximate version of the distributed dual averaging method and show that if the projection errors decrease at some appropriate rate, the standard convergence rate is preserved.

The remainder of this paper is organized as follows. Section 2 gives a formal statement of the multiagent saddle-point problem and the underlying network model. Section 3 presents the method and its main convergence results. Finally, we conclude with Section 4.

*Notation and Terminology*. We use to denote the -dimensional vector space. We denote the standard inner product on by , for all . Let be a closed convex set in . We say is a *proximal function* of the set if it is continuous and strongly convex on with respect to some norm ; that is, for all , , for all , where is some positive scalar. We define the *proximal center* of the set by . For , we introduce the following norm: , where , denotes the Euclidean norm, and and are the parameters that will be specified in the sequel. This implies the following dual norm of : . A vector is called a *subgradient* of a convex function at if, for all , . The supergradient of a concave function can be defined accordingly.

#### 2. Problem Setup

##### 2.1. Communication Network Model

We consider a time-varying network with agents. The network can be viewed as a directed graph with node set and time-varying link set. The information exchange at time is modeled using the communication matrix , which induces the link set ; is the set of activated links at time , defined as . We represent the agents’ connectivity at each time by a directed graph .

##### 2.2. Multiagent Saddle-Point Problem

In this paper, we are interested in solving the following problem: where and are convex and compact sets in and , respectively, and each is a convex-concave function defined over known only by agent .

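We refer to a vector pair $(x^*, y^*) \in X \times Y$ as a *saddle point* of $f$ over $X \times Y$ if

$$f(x^*, y) \le f(x^*, y^*) \le f(x, y^*) \quad \text{for all } x \in X,\ y \in Y. \tag{2}$$

Note that such a vector pair is a solution to problem (1).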

We now make some assumptions on problem (1). For the set , we assume that there exists a proximal function with proximal center and convex parameter denoted by and , respectively. Without loss of generality, we assume that . For the set we introduce the similar assumptions and notations; that is, . Therefore, for , it is natural to introduce a proximal function of the set , given by It is easy to see that the proximal center of is and . Furthermore, we denote .

#### 3. Main Results

##### 3.1. The Method and Assumptions

We now propose the method, which is based on the method in [2]. Specifically, each agent updates its estimates by setting ():
where , , ( and denote a subgradient of with respect to and a supergradient of with respect to at point , respectively), is the *stochastic noise vector* in evaluating , is a positive and nondecreasing sequence, , and satisfies the following two properties:
where is a positive scalar that represents the error in computing the next iterate by a projection defined by the proximal function and parameter . Note that is not uniquely defined for each .
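
To make the update concrete, the following is a minimal Python sketch of a consensus-based dual averaging iteration with noisy gradients and inexact projections. It is an illustration under stated assumptions, not a transcription of the method (4) and (5): the toy local functions $f_i(x, y) = \tfrac{1}{2}(x - a_i)^2 - \tfrac{1}{2}(y - b_i)^2$ on $[-1, 1]^2$, the Euclidean proximal function, the step sequence $\sqrt{t}$, the error schedule `eps0 / t`, the Gaussian noise model, the fixed weight matrix `RING4`, and every function and parameter name are hypothetical choices made for this example. The inexact projection is imitated crudely by perturbing the exact projection by at most the current error size and clipping back into the feasible set.

```python
import math
import random

# Hypothetical fixed doubly stochastic weights for a 4-agent ring (assumption).
RING4 = [
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [1 / 3, 0.0, 1 / 3, 1 / 3],
]


def clip(v, lo=-1.0, hi=1.0):
    """Keep a scalar inside the feasible interval [lo, hi]."""
    return max(lo, min(hi, v))


def approx_dual_averaging(P, a, b, T, sigma=0.0, eps0=0.0, seed=0):
    """Toy run on f_i(x, y) = 0.5*(x - a_i)**2 - 0.5*(y - b_i)**2 over [-1, 1]^2.

    Returns the local running averages of the x and y estimates of each agent.
    """
    rng = random.Random(seed)
    n = len(a)
    zx, zy = [0.0] * n, [0.0] * n  # dual variables (accumulated gradients)
    x, y = [0.0] * n, [0.0] * n    # primal estimates, started at the proximal center
    sx, sy = [0.0] * n, [0.0] * n  # running sums used for the local averages
    for t in range(1, T + 1):
        # Noisy first-order information: gradient in x, negated supergradient in y.
        gx = [x[i] - a[i] + rng.gauss(0.0, sigma) for i in range(n)]
        gy = [y[i] - b[i] + rng.gauss(0.0, sigma) for i in range(n)]
        # Consensus step on the dual variables, then add the local noisy gradient.
        zx = [sum(P[i][j] * zx[j] for j in range(n)) + gx[i] for i in range(n)]
        zy = [sum(P[i][j] * zy[j] for j in range(n)) + gy[i] for i in range(n)]
        beta = math.sqrt(t)  # positive, nondecreasing step sequence (assumption)
        eps = eps0 / t       # hypothetical projection-error schedule
        for i in range(n):
            # With d(x) = 0.5*x**2 on a box, the exact projection is clip(-z / beta).
            # The perturbation of size at most eps stands in for an inexact solve.
            x[i] = clip(-zx[i] / beta + rng.uniform(-eps, eps))
            y[i] = clip(-zy[i] / beta + rng.uniform(-eps, eps))
            sx[i] += x[i]
            sy[i] += y[i]
    return [s / T for s in sx], [s / T for s in sy]
```

For this toy problem the saddle point of the sum is the pair of means of `a` and `b`, so calling `approx_dual_averaging(RING4, a, b, T=4000, sigma=0.1, eps0=0.1)` and comparing each agent's averages against those means gives a quick sanity check of the qualitative behavior described in this section.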

In the paper, we make the following assumptions.

*Assumption 1 (connectivity).* For all , there exists a positive integer such that the directed graph is strongly connected.
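
A connectivity condition of this kind can be checked numerically for a given window of communication matrices. The sketch below is only illustrative and rests on assumptions: the convention that a positive entry `P[i][j]` with `i != j` means a link from agent `j` to agent `i`, the idea of taking the union of the link sets over a window, and all function names are choices made for this example.

```python
def link_set(P):
    """Directed links (j, i) induced by positive off-diagonal entries P[i][j]."""
    n = len(P)
    return {(j, i) for i in range(n) for j in range(n) if i != j and P[i][j] > 0}


def reachable(n, edges, start=0):
    """Set of nodes reachable from `start` by depth-first search."""
    adj = {u: [] for u in range(n)}
    for u, v in edges:
        adj[u].append(v)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_strongly_connected(n, edges):
    """Node 0 must reach every node in the graph and in the reversed graph."""
    reversed_edges = {(v, u) for u, v in edges}
    return len(reachable(n, edges)) == n and len(reachable(n, reversed_edges)) == n


def window_is_strongly_connected(matrices):
    """Check strong connectivity of the union of links over a window of matrices."""
    n = len(matrices[0])
    union = set()
    for P in matrices:
        union |= link_set(P)
    return is_strongly_connected(n, union)
```

A window in which each individual graph is disconnected can still pass the check, which is the situation a bounded-window connectivity assumption is meant to allow.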

*Assumption 2 (weight matrix).* For all , the communication matrix satisfies the following properties: (i) is doubly stochastic and (ii) there exists a positive scalar such that for all . In addition, if , then for all .
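
One common way to obtain a doubly stochastic matrix with positive diagonal entries is the Metropolis weighting rule for an undirected graph. The following Python sketch is an assumption-laden illustration rather than part of the proposed method: the Metropolis rule itself, the undirected edge list, and the helper names are all choices made for this example.

```python
def metropolis_weights(n, edges):
    """Symmetric doubly stochastic weights for an undirected graph on n nodes."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    P = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        P[i][j] = w
        P[j][i] = w
    for i in range(n):
        # The diagonal is still zero here, so sum(P[i]) is the off-diagonal mass.
        P[i][i] = 1.0 - sum(P[i])
    return P


def is_doubly_stochastic(P, tol=1e-12):
    """Check nonnegativity and that every row and column sums to one."""
    n = len(P)
    nonneg = all(P[i][j] >= 0 for i in range(n) for j in range(n))
    rows = all(abs(sum(P[i]) - 1.0) <= tol for i in range(n))
    cols = all(abs(sum(P[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))
    return nonneg and rows and cols


def min_positive_entry(P):
    """Smallest positive entry, a candidate for the lower bound in property (ii)."""
    return min(v for row in P for v in row if v > 0)
```

For a ring of four agents, `metropolis_weights(4, [(0, 1), (1, 2), (2, 3), (3, 0)])` assigns weight 1/3 to each neighbor and to each agent itself, so the smallest positive entry is 1/3.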

*Assumption 3 (bounded subgradients).* We assume that the following inequalities hold for all and :
where and are positive scalars.

*Assumption 4 (stochastic subgradient).* We assume that the stochastic noise vector satisfies the following properties, for all and :
where is some positive constant.

##### 3.2. Convergence Results

We show convergence of the method (4) and (5) via the local average pair defined at each agent , where is the iteration counter.

With the assumptions made in Section 3.1, we have the following main convergence result.

Theorem 5. *Under Assumptions 1, 2, 3, and 4, consider a sequence generated according to the method (4) and (5), with step and projection error sizes:
**
where and are some positive scalars. Let be a saddle point of , and then, for each agent and all , we have
**
where , , , , and .*

*Proof.* See the Appendix.

*Remark 6.* Theorem 5 gives the main convergence result for the method (4) and (5), which shows that the function value converges to at rate in expectation, for each . It is easy to see that the error bound is an increasing function of the noise magnitude . It is worth noting that, in method (4) and (5), we have considered the case where the subgradients are corrupted by stochastic noises that are zero-mean and have bounded variance, and moreover, the projection steps are calculated only approximately. In fact, the proposed method converges when the projection error decreases as , where . However, for the case when , the convergence rate cannot be achieved.

*Remark 7.* As compared to the work [2], we show that the standard convergence rate for the dual averaging method is preserved, under the assumption that the projection steps are only computed approximately, and the subgradients are corrupted by some stochastic noises as well. As compared to [23], the proposed method solves the saddle-point problem in a distributed setting, and the expected convergence rate is also established.

#### 4. Conclusion

We have studied the problem of solving saddle-point problems over a multiagent network. The objective function is given as a sum of local convex-concave functions, subject to some global constraint. Based on the average consensus algorithm and the dual averaging method, we propose an approximate dual averaging method under the constraint that the projection steps are computed approximately and the subgradients are corrupted by stochastic noises. Finally, we have presented the main convergence results of the proposed method.

#### Appendix

#### Proof of Theorem 5

We provide three lemmas which will be used for the proof of Theorem 5.

Lemma A.1 (see [7]). *Let Assumptions 1 and 2 hold. Then
**
where , for all .*

Lemma A.2. *Under Assumptions 1, 2, 3, and 4, consider a sequence generated according to the method (4) and (5), and then, for all ,
**
where .*

*Proof.* We can compute the general evolution of as follows, by referring to (4):
In a similar way, for , we have
Hence by noting that for all it follows that
Note that (cf. (6)), for all and . Hence, we can use the definition of the dual norm and Assumption 3 to bound as follows:
This, along with Lemma A.1, leads to the following estimate:
where we have used the inequality , according to Assumption 4. Hence, the desired result follows by using the inequality that .

Lemma A.3 (see [22]). *For function , where is the proximal center of and is some positive scalar, we have the following.*

(a) *Function is convex and differentiable, and its gradient satisfies .*

(b) *, for all .*

(c) *Function satisfies , for all .*

*Proof of Theorem 5.* First, we introduce the following gap sequence, for all :
*Bounding *. It is easy to see that
where we have used Assumption 4; that is, . Breaking into two parts, we have
For the first term on the right-hand side of (A.10), we can follow an argument similar to that of the proof of Theorem 1 in [2] to provide the following bound:
where we have used (A.4), while for the second term, we achieve this in the following way. By recalling the definition of , we have
where , , and the last equality follows from the fact that the weight matrix is doubly stochastic (cf. Assumption 2). Then, we investigate the sequence ; that is,
It turns out that, for the term , we have
where the first equality follows from Lemma A.3(a). Hence, we can bound the term as follows:
By recalling property (ii) of the approximate projection (6) and Lemma A.3(b), we can further obtain
where the equality was used, which holds for (for , it is easy to verify that ). Substituting (A.16) into (A.13) and then taking the expectation, we obtain
where we have used the bounds and . Hence, combining the inequalities (A.10), (A.11), and (A.17) yields
where we have used the fact that .*Bounding *. Following an argument similar to that of the proof of Theorem 1 in [2], we can arrive at
where we have used the fact that is a convex-concave function. By recalling the definition of a saddle point (2), we have
which further implies
Substituting the preceding inequality into (A.19) leads to
It remains to bound . We will use Lemma A.2 to achieve this. Using the update (5), we have
where we have used Lemma A.3 and the property (ii) of the approximate projection (6). Using Lemma A.2 it follows that
Combining the inequalities (A.18), (A.22), and (A.24) gives
We are left to bound the terms and . By recalling the definition of the sequence , we have
In a similar way, we have ; therefore, the desired result follows by substituting this and (A.26) into (A.25). The proof is complete.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 61304042, the Natural Science Foundation of Jiangsu Province under Grant no. BK20130856, the Jiangsu Planned Projects for Postdoctoral Research Funds under Grant no. 1302003A, and the Nanjing University of Posts and Telecommunications under Grant nos. NY213041 and NY213094.