Mathematical Problems in Engineering

Volume 2017 (2017), Article ID 1793291, 13 pages

https://doi.org/10.1155/2017/1793291

## Distributed Constrained Stochastic Subgradient Algorithms Based on Random Projection and Asynchronous Broadcast over Networks

^{1}State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China

^{2}Information Engineering College, Henan University of Science and Technology, Luoyang, China

Correspondence should be addressed to Junlong Zhu

Received 22 February 2017; Accepted 17 July 2017; Published 28 September 2017

Academic Editor: Thomas Hanne

Copyright © 2017 Junlong Zhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We consider a distributed constrained optimization problem over a time-varying network, where each agent knows only its own cost function and its own constraint set. In some applications, however, the local constraint set may not be known in advance or may consist of a huge number of components. To deal with such cases, we propose a distributed stochastic subgradient algorithm over time-varying networks, in which each agent projects its estimate onto its constraint set using a random projection technique and exchanges information with other agents through an asynchronous broadcast communication protocol. We show that the proposed algorithm converges with probability 1 when the learning rate is chosen suitably. For a constant learning rate, we obtain an error bound, defined as the expected distance between the agents' estimates and the optimal solution. We also establish an asymptotic upper bound on the gap between the global objective function value at the average of the estimates and the optimal value.

#### 1. Introduction

Distributed optimization problems have received considerable interest from industry and academia since they arise in many applications, including distributed parameter estimation, detection and source localization [1–4], distributed learning and regression [5–7], resource allocation [8, 9], and distributed power control [10]. The goal of such problems is to minimize a global objective function that is a sum of local cost functions over a network. To achieve this goal, we need to design a distributed optimization algorithm over time-varying networks, where each local cost function and constraint set is private information. Moreover, each agent can exchange information with its neighbors across the time-varying network. Many distributed optimization algorithms have therefore been proposed to address such problems [11–15].

However, in some applications an agent may not know its constraint set beforehand. In that case, the estimate of the agent cannot be projected onto the constraint set by a deterministic projection operation, and deterministic projection-based distributed optimization algorithms [16–18] cannot be directly applied. To deal with this case, random projection-based distributed optimization algorithms are studied in [19, 20]. Moreover, the deterministic projection-based algorithm can be regarded as a special case of the random projection-based algorithm. Therefore, we consider the distributed random projection algorithm in this paper. Furthermore, the local constraint set is assumed to have the form $\mathcal{X}_i = \bigcap_{j \in I_i} \mathcal{X}_i^{j}$, where $I_i$ is the index set and each $\mathcal{X}_i^{j}$ is a simple set.

In addition, each agent needs to exchange information with its neighbors over time-varying networks. Hence, the communication protocol plays a crucial role in the design of a distributed optimization algorithm. In practice, the gossip communication protocol [21] and the broadcast communication protocol [22] are two frequently used protocols. An asynchronous distributed random projection algorithm based on the gossip communication protocol has been proposed in [23]. However, broadcast is a natural communication mode in wireless media. Moreover, compared with the gossip communication protocol, the broadcast communication protocol can improve communication in consensus algorithms. Since the consensus problem can be regarded as a special case of the distributed optimization problem, in which the global objective function is a sum of local cost functions, the same improvement can apply to the distributed optimization problem. Furthermore, due to bidirectional communications, a communication bottleneck is created in the broadcast communication protocol. To remove this bottleneck, we use the random broadcast communication protocol to exchange information among agents over time-varying networks. Therefore, we propose an asynchronous distributed random subgradient algorithm that uses the random projection operation and the random broadcast communication protocol. Further, we assume that links can be randomly interrupted in our proposed algorithm. The broadcast-based consensus algorithm has been recently studied in [22, 24, 25]. Besides, [26] proposes an asynchronous distributed projection algorithm based on the broadcast communication protocol, where the constraint set is available to every agent and the projection operation is a deterministic projection. Unlike [26], we project the estimate of each agent onto its own constraint set by employing the random projection operation. Furthermore, each agent knows only its own constraint set and does not know the constraint sets of other agents. Informally, the algorithm in [26] is a special case of our proposed algorithm.

Our objective is to analyze the convergence properties of our proposed algorithm and establish some asymptotic error bounds. The main contributions of this paper are as follows:

(i) We propose an asynchronous distributed random subgradient algorithm based on the random projection operation and the randomized broadcast communication protocol, where link failures occur randomly over time-varying networks. Moreover, we assume that each agent knows only its local cost function and its own constraint set.

(ii) We analyze the convergence properties of our proposed algorithm under appropriately chosen step sizes and show that the estimates of all agents converge to an optimal solution with probability 1.

(iii) We also establish some asymptotic error bounds when the step sizes are constant.

The remainder of this paper is organized as follows. In Section 2, we describe the optimization problem of interest, present the algorithm, and state our assumptions. We state the main results of the paper in Section 3. Section 4 provides the convergence analysis of the algorithm, including the proof of the convergence result. The analysis of the error bounds is presented in Section 5. The conclusion of the paper is given in Section 6.

*Notation*. In this paper, all vectors are column vectors. We use boldface to denote the vectors in $\mathbb{R}^{d}$ and use normal font to denote scalars or vectors of different dimensions. $x^{T}$ and $A^{T}$ denote the transpose of a vector $x$ and a matrix $A$, respectively. We use $\|x\|$ to denote the standard Euclidean norm of a vector $x$. The notations $\mathbf{1}$ and $I_N$ denote the vector whose entries are all 1 and the identity matrix of size $N \times N$, respectively. $\mathbb{E}[X]$ denotes the expectation of a random variable $X$.

#### 2. Algorithm Description and Assumptions

We consider a network consisting of $N$ agents (or nodes), indexed by $1, \ldots, N$. At each time $k$, the network topology is represented by an undirected graph $G(k) = (V, E(k))$, where $V = \{1, \ldots, N\}$ denotes the set of agents and $E(k)$ denotes the set of edges. We assume that the undirected graph is simple. If there exists an edge between agents $i$ and $j$ at time $k$, then $(i, j) \in E(k)$. In a connected network, two agents are said to be neighbors if they are connected directly by an edge; that is, the agents can share information with each other. At time $k$, we denote the set of neighbors of agent $i$ by $\mathcal{N}_i(k)$; that is, $\mathcal{N}_i(k) = \{j \in V : (i, j) \in E(k)\}$. We also assume that communication links may be interrupted at random times.

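We consider the following optimization problem:

$$\min_{x \in \mathbb{R}^{d}} \; \sum_{i=1}^{N} f_i(x) \quad \text{subject to} \quad x \in \mathcal{X} = \bigcap_{i=1}^{N} \mathcal{X}_i, \tag{1}$$

where $f_i$ denotes the convex objective function of agent $i$ and $\mathcal{X}_i$ denotes the constraint set of agent $i$.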

To solve problem (1), we propose an asynchronous distributed subgradient random projection algorithm based on randomized broadcast communication protocol. In this paper, we employ the asynchronous time model as in [21] and the randomized broadcast model as in [22].

##### 2.1. Algorithm Description

In the asynchronous model, each agent has a virtual clock whose ticks follow a Poisson process with rate . We assume that only one agent is woken up at a time. Thus, if an agent is woken up at time , we use to denote the index of that agent. Since a link can be randomly interrupted, we also use to denote the subset of the neighbors of the agent . Hence, the agent receives the broadcast information with probability ; that is, if , then . Each agent hears the broadcast information from agent at time . Hence, if , thenwhere denotes the estimate of agent at time . If , then the estimate of agent is updated as follows:where is a mixing parameter, denotes the stepsize of agent at time , denotes a random variable, denotes a subgradient of function at , and is a stochastic subgradient error of agent at time . Moreover, let be independent of each other and be independent of the random broadcast process. Note that is drawn from the index set . denotes the Euclidean projection operation onto the constraint set .
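As an illustration of the update described above, the following Python sketch simulates one clock tick of an asynchronous broadcast step with random projection. It is a minimal sketch under stated assumptions, not the authors' implementation: each constraint component is taken to be a halfspace, the mixing weight `beta` is placed on the broadcaster's estimate, every link delivers with the same probability `p`, the stepsize is the reciprocal of the receiving agent's update count, and all function and parameter names are hypothetical.

```python
import random


def project_halfspace(x, a, b):
    """Euclidean projection of x onto the halfspace {y : <a, y> <= b}."""
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0:
        return list(x)
    norm_sq = sum(ai * ai for ai in a)
    return [xi - violation * ai / norm_sq for ai, xi in zip(a, x)]


def broadcast_step(x, neighbors, halfspaces, subgrad, counts,
                   beta=0.5, p=0.8, noise=0.0, rng=random):
    """One tick: a uniformly chosen agent broadcasts its estimate.

    Each neighbor hears it with probability p (assumed link model), mixes it
    with its own estimate, takes a noisy subgradient step, and projects onto
    ONE randomly drawn component of its constraint set. Other agents keep
    their estimates unchanged.
    """
    sender = rng.randrange(len(x))
    sent = list(x[sender])  # snapshot of the broadcast estimate
    for j in neighbors[sender]:
        if rng.random() >= p:  # link failure: agent j hears nothing
            continue
        counts[j] += 1
        step = 1.0 / counts[j]  # assumed stepsize: 1 / (number of updates)
        # Assumed mixing convention: beta weights the broadcaster's estimate.
        v = [beta * s + (1.0 - beta) * own for s, own in zip(sent, x[j])]
        g = subgrad(j, v)
        y = [vi - step * (gi + rng.gauss(0.0, noise)) for vi, gi in zip(v, g)]
        a, b = rng.choice(halfspaces[j])  # random component index
        x[j] = project_halfspace(y, a, b)
    return x


if __name__ == "__main__":
    # Toy problem (hypothetical): f_i(x) = 0.5 * (x - t_i)^2 on X = [0, 2].
    rng = random.Random(0)
    targets = [0.0, 3.0, 6.0]
    x = [[t] for t in targets]
    neighbors = [[1, 2], [0, 2], [0, 1]]
    halfspaces = [[([1.0], 2.0), ([-1.0], 0.0)] for _ in targets]
    counts = [0, 0, 0]
    for _ in range(5000):
        broadcast_step(x, neighbors, halfspaces,
                       lambda i, v: [v[0] - targets[i]],
                       counts, noise=0.1, rng=rng)
    print(x)
```

Projecting onto a single randomly drawn component keeps each update cheap even when a constraint set is an intersection of many simple sets, which is the situation that motivates the random projection approach.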

To simplify the analysis, we first define the following matrix:where is the identity matrix of size and denotes a vector whose th entry is 1 and whose other entries are 0. From Lemma in [26], each random matrix is not doubly stochastic, but the expectation of matrix is doubly stochastic. Let . We note that the matrices are i.i.d. and , where denotes the largest eigenvalue of a symmetric matrix . Hence, by (4), relations (2)-(3) can be rewritten as follows:where if occurs, or else .
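The stochasticity claim above can be checked numerically on a small example. The sketch below is an illustration under stated assumptions rather than the paper's definition: it takes the broadcast matrix to be the identity minus `beta` times the sum, over receiving agents `j`, of `e_j (e_j - e_sender)^T`, assumes the sender is chosen uniformly at random, and assumes symmetric link delivery probabilities `p[i][j] = p[j][i]`. All function names are hypothetical.

```python
def broadcast_matrix(n, sender, receivers, beta):
    """Assumed form: W = I - beta * sum_{j in receivers} e_j (e_j - e_sender)^T."""
    w = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for j in receivers:
        w[j][j] -= beta       # receiver keeps weight (1 - beta) on itself
        w[j][sender] += beta  # and puts weight beta on the sender
    return w


def expected_broadcast_matrix(neighbors, p, beta):
    """E[W] for a uniform sender and independent links with success prob p[i][j]."""
    n = len(neighbors)
    ew = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i in range(n):              # i is the sender, chosen with prob 1/n
        for j in neighbors[i]:      # j hears i with prob p[i][j]
            weight = beta * p[i][j] / n
            ew[j][j] -= weight
            ew[j][i] += weight
    return ew


def row_sums(m):
    return [sum(row) for row in m]


def col_sums(m):
    return [sum(col) for col in zip(*m)]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2; agent 1 broadcasts and both neighbors hear it.
    w = broadcast_matrix(3, 1, [0, 2], 0.5)
    print(row_sums(w), col_sums(w))
    p = [[0.8] * 3 for _ in range(3)]
    ew = expected_broadcast_matrix([[1], [0, 2], [1]], p, 0.5)
    print(row_sums(ew), col_sums(ew))
```

In this example a single realization has unit row sums but unequal column sums, while the expected matrix has unit row and column sums. Under these assumptions, the symmetry of the link probabilities is what makes the column sums of the expectation balance.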

##### 2.2. Assumptions

In order to analyze the convergence properties of algorithm (5), we need to give some standard assumptions as follows.

*Assumption 1.* The network topology is connected and without self-loops at time . If , then . If , then . Furthermore, the link failure process is independent and identically distributed.

*Assumption 2. *We assume that the constraint sets are nonempty closed and convex for all . We also assume that the cost function is convex. Moreover, we assume that the subgradient of is uniformly bounded over for every ; namely, for all .

*Assumption 3.* For any random variable , , and for all , we assume that the following relation holds:where is a positive constant and denotes the distance between a point and a set .

In Assumption 3, random variable obeys a positive probability distribution. Moreover, if the set has a nonempty interior, then Assumption 3 can be seen to hold [27].
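To make Assumption 3 concrete, the following Python sketch checks a relation of this type on a simple example. It assumes the relation takes the form common in the random projection literature, namely that the squared distance to the full constraint set is at most a constant `c` times the expected squared distance to a randomly drawn component. The constraint set here is the box `[-1, 1]^d`, written as an intersection of `2d` halfspaces with the component index drawn uniformly; the box, the uniform distribution, and all function names are illustrative assumptions, not taken from the paper.

```python
def dist_sq_halfspace(x, a, b):
    """Squared distance from x to the halfspace {y : <a, y> <= b}."""
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0:
        return 0.0
    return violation ** 2 / sum(ai * ai for ai in a)


def box_components(d):
    """The 2d halfspaces whose intersection is the box [-1, 1]^d."""
    comps = []
    for j in range(d):
        e = [0.0] * d
        e[j] = 1.0
        comps.append((e, 1.0))                 # x_j <= 1
        comps.append(([-v for v in e], 1.0))   # -x_j <= 1
    return comps


def dist_sq_box(x):
    """Squared distance from x to the box [-1, 1]^d."""
    return sum(max(abs(xi) - 1.0, 0.0) ** 2 for xi in x)


def expected_component_dist_sq(x, comps):
    """Expected squared distance to a uniformly drawn component."""
    return sum(dist_sq_halfspace(x, a, b) for a, b in comps) / len(comps)


if __name__ == "__main__":
    x = [3.0, -0.5, -2.0]
    comps = box_components(len(x))
    c = float(len(comps))  # for this box, c = 2d works
    print(dist_sq_box(x), c * expected_component_dist_sq(x, comps))
```

For this particular set at most one halfspace per coordinate is violated, so the relation holds with `c = 2d`, and with equality. Sets with less favorable geometry generally need a larger constant.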

Let denote all the information generated by the entire history of algorithm (5). Next, we give the assumption for stochastic subgradient error as follows.

*Assumption 4. *For each agent at time , we assume that the error satisfieswith probability 1, where is a positive constant.

In this section, we have proposed an asynchronous distributed subgradient projection algorithm based on the random projection operation and the randomized broadcast communication protocol, and we have stated the standard assumptions used to analyze its convergence properties. The main results of this paper are presented in Section 3.

#### 3. Main Results

In this section, we state the convergence properties of algorithm (5). The detailed proofs of the main results are given in Sections 4 and 5.

We first define the optimal value and optimal solutions set of problem (1) as follows:

The first result states that our proposed algorithm is convergent with probability 1.

Theorem 5. *Under Assumptions 1–4, let the set of optimal points be nonempty. Let the estimate sequence , , be generated by algorithm (5) with positive stepsize , where is the number of updates performed by agent up to time . Then, the estimates of all agents converge to some optimal point with probability 1.*

Theorem 5 shows that the iterates of all agents asymptotically converge to some optimal point over time-varying networks; that is, for all , with probability 1.

We also establish an asymptotic error bound between the optimal points in the optimal set and the estimates of algorithm (5).

Theorem 6. *Under Assumptions 1–4, let estimate sequence , , be generated by algorithm (5) and let for . Moreover, assume that each function is -strongly convex, where the constant satisfies . Furthermore, let the set be compact. Then, one has with probability 1**(a)**(b)where , is dimension of vector, for all , , , , and .*

Theorem 6 establishes an asymptotic error bound, which is defined as the average of the expected distances between the optimal point and the estimates of algorithm (5). It also bounds the asymptotic error defined as the difference between the global cost function at and the global cost function at the optimal point .

In this section, we provide the main results of this paper. The detailed proofs of main results are given in Sections 4 and 5.

#### 4. Analysis of Convergence Results

In this section, we provide the proof of Theorem 5. For this purpose, we first establish a basic iterate relation for the estimates of algorithm (5).

Lemma 7. *Under Assumptions 1–4, the estimate sequence , , is generated by algorithm (5). Let and for all . Then, for any constant and all vectors , one has with probability 1for , where , , and is a sufficiently large positive constant.*

*Proof. *Let . Following from the nonexpansive property [16], we haveFurther, from the definition of inner product and Cauchy-Schwarz inequality, we obtainwhere the second inequality follows from the inequality . Hence, following from relations (12) and (13), we obtainIn addition, we also havewhere the last inequality follows from the inequality for some by letting , , and . Hence, combining relations (14) and (15), we obtainNext, we consider the term in (16), which can be rewritten as follows:Thus, according to Lemma of [26], there exists a sufficient large positive constant , for any , we havewhere we use the Cauchy-Schwarz inequality. Then, from (16) and (18), we obtainNote that ; we haveHence, taking conditional expectation in both sides of (19) and using the fact that the mean of stochastic errors is zero, we obtain that with probability 1 for all and By Assumption 3, we have where . Let . For all , we obtain with probability 1Note that if . Besides, if , then the agent updates its estimate with probability and does not update its estimate with probability . Hence, following from (23), the desired result is obtained completely.

We next consider the errors for all , which are caused by the random projection operation. The errors are defined as follows: for all . Moreover, we have the following lemma.

Lemma 8. *Under Assumptions 1–4, for and all , the relations and hold with probability 1. Furthermore, for all , one has that with probability 1.*

*Proof. *Let in Lemma 7. For all , we obtain with probability 1where with large enough and . Since the function is convex [28], we haveCombining (24), (25), and the definition of projection operation, we obtain for all with probability 1where . Hence, following from the Supermartingale Convergence Theorem, which is stated in [29], we obtain . Further, taking expectation on in (26), we have that for all . By Monotone Convergence Theorem [30], we obtain ; the relations .

Further, for all , following from the definition of error , we haveMoreover, note that . Hence, we havewhere we use the nonexpansive property of the projection. Furthermore, following from the inequality , we obtainwhere the last inequality follows from Lemma in [26]. Taking the conditional expectation on both sides of (29), we haveFollowing from the fact that and , we obtain with probability 1 for all . Further, we also have with probability 1. Therefore, the conclusions of this lemma are proved completely.

We also establish the following lemma to prove the main results of this paper.

Lemma 9. *Under Assumptions 1–4, let . Assume that are generated by algorithm (5). Then, one finds with probability 1 for , where .*

*Proof. *We define the vector in for such that for . According to algorithm (5), we obtainwhere for , and if , then . Further, note that . Thus, by fixing arbitrary index , we obtainSince , . Thus, we havewhere and we use the fact that . Following from (33), we obtain with probability 1We first compute the second term in right side of (34). According to the property of stochastic matrix , we haveBy using the inequality , we have for In addition, we have by using . Hence, we obtain thatFurthermore, from the nonexpansive property and for with large enough , we findThus, from (38), we have with probability 1Hence, from inequalities (36) and (39), we obtainFurther, we obtain for According to Lemma 8, the first term in (41) is convergent with probability 1. Moreover, since and then following from Supermartingale Convergence Theorem [29], we haveFrom the definition of , we obtain for From the convexity of norm and the definition of , we haveThus, following from (43) and (44), we can see thatfor all with probability 1. Therefore, the lemma is proved completely.

Using Lemmas 7, 8, and 9, we now prove Theorem 5.

*Proof of Theorem 5. *From Lemma 7, we obtainLetting , then . Therefore, we haveBy the convexity of each function and the boundedness of each subgradient norm , we haveFurthermore, according to the convexity of squared-norm, we obtainwhere the last inequality uses the nonexpansive property.

Let . Moreover, we also haveHence, combining inequalities (49) and (50) and then summing over from to , we haveThus, from (47), (48), and (51), we obtainLet . Following from inequalities (46) and (52), we have with probability 1Moreover, note that . Hence, by using Supermartingale Convergence Theorem [29] and Lemma 9, for all , we find that is convergent with probability 1. Moreover, hold with probability 1. Further, noting that , we have with probability 1Hence, from the definition of and Lemma 8, we obtain with probability 1Since is convergent with probability 1, and are also convergent by using (5) and (55). Therefore, and are convergent with probability 1. According to (54), we obtain with probability 1 . We further have . Hence, we obtain with probability 1 . Following from Lemma 9, we have with probability 1 . Therefore, we obtain with probability 1. Moreover, according to Lemma 9, the statement of this theorem is obtained completely.

In this section, we have given a proof of Theorem 5, which shows that the proposed algorithm converges almost surely. In the next section, we analyze the error bounds of the proposed algorithm.

#### 5. Error Bounds Analysis

In this section, we give the proof of Theorem 6, where we assume that the cost function is strongly convex with constant and the stepsize is constant for all ; that is, . To prove Theorem 6, we first establish a basic iterate relation for a constant stepsize as follows.

Lemma 10. *Under Assumptions 1–4, let the estimate sequence , , be generated by algorithm (5). Moreover, for all , assume that each function is -strongly convex, where the constant satisfies . Then, with probability 1, the following relation holds for all and :*

*Proof. *Since , we haveMoreover, we also havewhere the inequality follows from the fact that the function is strongly convex. Further, by using Assumption 2, we obtain