Special Issue: Robust Control, Optimization, and Applications to Markovian Jumping Systems
Asynchronous Gossip-Based Gradient-Free Method for Multiagent Optimization
This paper considers the constrained multiagent optimization problem. The objective function of the problem is a sum of convex functions, each of which is known by a specific agent only. To solve this problem, we propose an asynchronous distributed method based on gradient-free oracles and a gossip algorithm. In contrast to existing work, we require neither that agents be able to compute subgradients of their objective functions nor that they coordinate their step size values. We prove that, for a diminishing step size, the iterates of all agents converge with probability 1 to the same optimal point of the problem.
1. Introduction
In recent years, solving convex optimization problems over a network has attracted considerable research attention; see [1–18]. The objective function of the problem is a sum of convex functions, each of which is known by a specific agent only. Such problems arise in many real applications, including distributed finite-time optimal rendezvous and distributed regression over sensor networks. Methods designed for these optimization problems need to be fully distributed; that is, there is no central coordinator.
In this paper, we propose an asynchronous gossip-based gradient-free method for solving the convex optimization problem over a multiagent network. The method is based on the gossip algorithm and gradient-free oracles. The method is asynchronous in the sense that only one agent communicates at a given time, in contrast to synchronous methods, where all agents communicate simultaneously. Moreover, the method does not assume that subgradients of the objective functions are available. It is well known that, for a variety of reasons, derivatives of the objective functions are in many instances unavailable or computationally expensive to calculate [20, 21].
Literature Review. In , the authors study the problem of minimizing a sum of multiple convex functions, each of which is known to one specific agent only. The authors use the average consensus algorithm in the literature on multiagent systems (see, e.g., [19, 22–26]) as a mechanism to develop a distributed subgradient method for solving the optimization problem; the convergence of the method is also given for a constant step size. The authors in  further take the global equality and inequality constraints into consideration. The work in  proposes a variant of the distributed subgradient method in , in which at each iteration several consensus steps are executed, which simplifies the proof of the convergence of the method. Inspired by the work in , the authors in  further incorporate the global inequality constraints. The aforementioned methods are synchronous because they require that all agents in the network update at the same time. To overcome this limitation, the work in  develops an asynchronous distributed algorithm, based on the gossip algorithm. The algorithm is asynchronous in the sense that only one agent communicates at a given time. Moreover, all agents use different step size values and they do not require any coordination of the agents. In , the author further removes the need for bidirectional communications of the asynchronous algorithm in ; the convergence of the algorithm is also established. The aforementioned methods or algorithms, however, rely on the assumption that the subgradients of the objective functions are available to each agent, respectively.
Compared with previous work, the main contributions of this paper are twofold: (i) unlike the methods or algorithms considered in existing papers, which rely on computing the subgradients of each agent’s objective function, we propose a derivative-free method based on random gradient-free oracles; (ii) the proposed method is asynchronous, in the sense that all agents use different step size values that do not require any coordination among the agents. We prove that, for a diminishing step size, the iterates of all agents converge with probability 1 to the same optimal point of the problem.
Notation and Terminology. Let $\mathbb{R}^d$ be the $d$-dimensional vector space. We denote the standard inner product on $\mathbb{R}^d$ by $\langle x, y \rangle$, for $x, y \in \mathbb{R}^d$. We write $\|x\|$ to denote the Euclidean norm of a vector $x$ and $P_X[x]$ to denote the Euclidean projection of a vector $x$ on a set $X$. We use $x^T$ to denote the transpose of $x$. For a matrix $A$, $[A]_{ij}$ represents the element in the $i$th row and $j$th column of $A$, and $A^T$ represents its transpose. We use $E[\xi]$ to denote the expected value of a random variable $\xi$. For a function $f$, its gradient at a point $x$ is represented by $\nabla f(x)$.
2. Problem Formulation
In this section, we start by describing the constrained multiagent optimization problem. Then, we provide some preliminary results on the gossip algorithm that we use in developing the method.
2.1. Constrained Multiagent Optimization
We consider the following constrained multiagent optimization problem:
$$\min_{x \in X} f(x) = \sum_{i=1}^{N} f_i(x), \quad (1)$$
where $x \in \mathbb{R}^d$ is a decision vector; $f_i : \mathbb{R}^d \to \mathbb{R}$ is the convex objective function of agent $i \in \{1, \ldots, N\}$, known only by agent $i$, and we assume that $f_i$ is Lipschitz continuous over $X$ with Lipschitz constant $L_i$; $X \subseteq \mathbb{R}^d$ is a nonempty closed convex set. We denote the optimal set of problem (1) by $X^*$, and we assume that it is nonempty. Note that in problem (1), each function $f_i$ need not be differentiable.
2.2. Gossip Algorithm
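The underlying network topology of problem (1) is denoted by $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \ldots, N\}$ is the node set and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of links, with $(i, j) \in \mathcal{E}$ if and only if there is a link between agents $i$ and $j$. We assume that the network is fixed, undirected, and connected.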
In this paper, we use the gossip algorithm as a mechanism to design the method. To be specific, at each time instant, agent $i$ is chosen with probability $1/N$, and then, with some positive probability $\pi_{ij}$, agent $i$ communicates with one of its neighbors, agent $j$. The iterations evolve as follows:
$$x_l(k) = \frac{1}{2}\left(x_i(k-1) + x_j(k-1)\right)$$
for $l \in \{i, j\}$, and for agents $l$ that do not belong to $\{i, j\}$, update
$$x_l(k) = x_l(k-1).$$
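The averaging step described above can be illustrated with a minimal sketch. It assumes that the waking agent is drawn uniformly and that it picks one of its neighbors uniformly at random; the four-agent ring, the initial values, and the function name `gossip_step` are hypothetical choices made only for illustration.

```python
import random

def gossip_step(x, neighbors, rng):
    """One gossip round: a uniformly chosen agent i averages with a random neighbor j.

    Assumption: uniform choice of both the waking agent and its neighbor.
    """
    i = rng.randrange(len(x))
    j = rng.choice(neighbors[i])
    avg = [(a + b) / 2.0 for a, b in zip(x[i], x[j])]
    x[i] = avg
    x[j] = list(avg)  # all other agents keep their previous values
    return i, j

# Hypothetical 4-agent ring with scalar states.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rng = random.Random(0)
x = [[1.0], [2.0], [3.0], [6.0]]
initial_mean = sum(v[0] for v in x) / len(x)

for _ in range(200):
    gossip_step(x, neighbors, rng)

final_mean = sum(v[0] for v in x) / len(x)
spread = max(v[0] for v in x) - min(v[0] for v in x)
```

Pairwise averaging leaves the network-wide mean unchanged while shrinking the disagreement between agents, which is the property the method relies on for consensus.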
3. Gossip-Based Gradient-Free Method
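In this section, motivated by the random gradient-free method in  and the gossip algorithm in , we present an asynchronous gossip-based gradient-free method for solving problem (1). We use $I_k$ to denote the index of the agent that is chosen to update at time $k$ and $J_k$ the index of the agent communicating with agent $I_k$. The method is given as follows.
Gossip-Based Gradient-Free Method with a Diminishing Step Size
Initialize: choose random $x_i(0) \in X$, $i \in \mathcal{V}$.
Iteration $k \geq 1$:
(i) For $i \in \{I_k, J_k\}$:
(1) compute $v_i(k) = \frac{1}{2}\left(x_{I_k}(k-1) + x_{J_k}(k-1)\right)$;
(2) compute $x_i(k) = P_X\left[v_i(k) - \alpha_i(k)\, g_{\mu_i(k)}(v_i(k))\right]$, where $\alpha_i(k) = 1/\Gamma_i(k)$, and $\Gamma_i(k)$ denotes the number of updates that agent $i$ has performed until time $k$, inclusively, and $g_{\mu_i(k)}$ is the random gradient-free oracle, given by
$$g_{\mu_i(k)}(v_i(k)) = \frac{f_i(v_i(k) + \mu_i(k) u_i(k)) - f_i(v_i(k))}{\mu_i(k)}\, u_i(k),$$
where $\mu_i(k) > 0$ is a time-varying smoothing parameter; $u_i(k)$ is a random variable generated locally according to the Gaussian distribution.
(ii) For $i \notin \{I_k, J_k\}$: $x_i(k) = x_i(k-1)$.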
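The iteration above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the author's implementation. The oracle is assumed to take the two-evaluation Gaussian form from Nesterov's random gradient-free paper with a standard normal direction, and the step size is assumed to be the reciprocal of the agent's own update count. The five-agent ring, the local objectives (an l1 term plus a quadratic, chosen so that the minimizer of the sum is the origin), the box constraint set, and the smoothing schedule `mu0 / updates` are hypothetical choices.

```python
import random

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d (assumed constraint set X)."""
    return [min(max(v, lo), hi) for v in x]

def make_objective(c):
    """Hypothetical local objective: nonsmooth l1 term plus a quadratic term."""
    def f(x):
        return sum(abs(a - b) + 0.5 * (a - b) ** 2 for a, b in zip(x, c))
    return f

def gradient_free_oracle(f, v, mu, rng):
    """Two function evaluations; u is standard normal (assumption)."""
    u = [rng.gauss(0.0, 1.0) for _ in v]
    shifted = [a + mu * b for a, b in zip(v, u)]
    scale = (f(shifted) - f(v)) / mu
    return [scale * b for b in u]

def run(num_iters=20000, seed=1, lo=-5.0, hi=5.0, mu0=1.0):
    rng = random.Random(seed)
    # Symmetric centers, so the sum of the local objectives is minimized at (0, 0).
    centers = [(-2.0, 2.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, -2.0)]
    n = len(centers)
    fs = [make_objective(c) for c in centers]
    neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # ring
    x = [[rng.uniform(lo, hi) for _ in range(2)] for _ in range(n)]
    updates = [0] * n  # number of updates each agent has performed
    for _ in range(num_iters):
        i = rng.randrange(n)           # agent that wakes up
        j = rng.choice(neighbors[i])   # neighbor it communicates with
        v = [(a + b) / 2.0 for a, b in zip(x[i], x[j])]
        for a in (i, j):
            updates[a] += 1
            alpha = 1.0 / updates[a]   # local step size, no coordination
            mu = mu0 / updates[a]      # assumed local smoothing schedule
            g = gradient_free_oracle(fs[a], v, mu, rng)
            x[a] = project_box([p - alpha * q for p, q in zip(v, g)], lo, hi)
    return x

iterates = run()
```

Each agent uses only its own update counter to set its step size and smoothing parameter, so no global clock or shared schedule is needed. In this toy setting the iterates of all agents should end up close to the origin.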
We use $\mathcal{F}_k$ to denote the $\sigma$-field generated by the entire history of the random variables up to iteration $k$.
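The method can be presented in a more compact form by defining the following weight matrix:
$$W(k) = I - \frac{1}{2}\left(e_{I_k} - e_{J_k}\right)\left(e_{I_k} - e_{J_k}\right)^T,$$
where $I$ is the identity matrix and $e_i$ denotes the $i$th standard basis vector. It is easy to see that $W(k)$ is doubly stochastic.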
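As a quick check of the doubly stochastic property, the following sketch builds the pairwise-averaging weight matrix in the form commonly used in the gossip literature, I - (1/2)(e_i - e_j)(e_i - e_j)^T, which is assumed here. The helper name `gossip_weight_matrix`, the network size, and the chosen pair are hypothetical.

```python
def gossip_weight_matrix(n, i, j):
    """Return I - 0.5 * (e_i - e_j)(e_i - e_j)^T as a list of rows."""
    w = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    d = [0.0] * n
    d[i], d[j] = 1.0, -1.0  # the vector e_i - e_j
    for r in range(n):
        for c in range(n):
            w[r][c] -= 0.5 * d[r] * d[c]
    return w

n = 4
W = gossip_weight_matrix(n, 1, 3)  # agents 1 and 3 communicate
row_sums = [sum(row) for row in W]
col_sums = [sum(W[r][c] for r in range(n)) for c in range(n)]

# Applying W to the stacked states averages agents 1 and 3 and leaves the rest unchanged.
x = [1.0, 2.0, 3.0, 6.0]
y = [sum(W[r][c] * x[c] for c in range(n)) for r in range(n)]
```

Rows and columns both sum to one, and multiplying by the matrix reproduces the averaging step for the communicating pair.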
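Now we can write the method as follows: for all $i \in \mathcal{V}$ and any $k \geq 1$,
$$v_i(k) = \sum_{j=1}^{N} [W(k)]_{ij}\, x_j(k-1), \qquad x_i(k) = P_X\left[v_i(k) - \alpha_i(k)\, g_{\mu_i(k)}(v_i(k))\, \chi_{\{i \in \{I_k, J_k\}\}}\right], \quad (7)$$
where $\chi_{\{i \in \{I_k, J_k\}\}}$ is the indicator function of the event $\{i \in \{I_k, J_k\}\}$. For the gradient-free oracle $g_{\mu_i(k)}$, we have the following lemma, which is adopted from .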
Lemma 1. For each and all , one has the following: (a), where with , and it satisfies: (b).
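As a rough numerical illustration of the kind of properties such a lemma provides, the sketch below estimates the mean and the second moment of a Gaussian-smoothing oracle by Monte Carlo. It assumes the oracle form g = ((f(x + mu*u) - f(x)) / mu) * u with standard normal u, and the bound E||g||^2 <= L^2 (d + 4)^2 for an L-Lipschitz function, both as stated in Nesterov's random gradient-free paper. The test function (the l1 norm), the evaluation point, and the sample size are hypothetical.

```python
import math
import random

def l1_norm(x):
    return sum(abs(v) for v in x)

def oracle(f, x, mu, rng):
    """Gaussian-smoothing oracle built from two function evaluations."""
    u = [rng.gauss(0.0, 1.0) for _ in x]
    scale = (f([a + mu * b for a, b in zip(x, u)]) - f(x)) / mu
    return [scale * b for b in u]

def estimate_moments(f, x, mu, samples, seed=0):
    """Monte Carlo estimates of E[g] and E[||g||^2] at the point x."""
    rng = random.Random(seed)
    mean = [0.0] * len(x)
    second = 0.0
    for _ in range(samples):
        g = oracle(f, x, mu, rng)
        mean = [m + gi / samples for m, gi in zip(mean, g)]
        second += sum(gi * gi for gi in g) / samples
    return mean, second

point = [1.0, -2.0, 3.0]  # away from the kinks of the l1 norm
mean_g, second_moment = estimate_moments(l1_norm, point, mu=1e-3, samples=20000)

lipschitz = math.sqrt(len(point))                 # Lipschitz constant of the l1 norm in R^3
bound = lipschitz ** 2 * (len(point) + 4) ** 2    # assumed bound L^2 (d + 4)^2
```

Away from the kinks, the averaged oracle should be close to the sign vector of the point, which is the gradient of the l1 norm there, and the estimated second moment should sit well below the bound.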
Remark 2. Note that method (7) is asynchronous, in the sense that, to implement the method, each agent need not coordinate its step size with the step sizes of its neighbors; the time-varying parameters share the same feature. In addition, implementing method (7) does not require the subgradients of the objective functions; each agent only needs two function evaluations per iteration to form the gradient-free oracle.
Let be the event that agent updates at time and the probability of event . It is easy to see that where denotes the set that contains all agents that are neighboring to agent and denotes the probability that agent is chosen by its neighbor to communicate. In the paper, we denote and , respectively. There is an interesting link between the step size and the probability that agent updates.
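The link between update probability and step size can be checked numerically. The sketch below assumes the gossip model in which the waking agent is drawn uniformly from the N agents and then selects neighbor j with probability pi[i][j]; under that assumption agent i updates with probability (1/N)(1 + sum over neighbors j of pi[j][i]). The four-node graph, the uniform neighbor selection, and the function names are hypothetical.

```python
import random

def update_probabilities(neighbors, pi):
    """Probability that each agent updates in one round under the assumed model."""
    n = len(neighbors)
    return [(1.0 + sum(pi[j][i] for j in neighbors[i])) / n for i in range(n)]

def empirical_frequencies(neighbors, pi, num_iters, seed=0):
    """Fraction of rounds in which each agent updates, i.e. (update count) / k."""
    rng = random.Random(seed)
    n = len(neighbors)
    counts = [0] * n
    for _ in range(num_iters):
        i = rng.randrange(n)
        nbrs = neighbors[i]
        j = rng.choices(nbrs, weights=[pi[i][m] for m in nbrs])[0]
        counts[i] += 1
        counts[j] += 1
    return [c / num_iters for c in counts]

# Hypothetical undirected graph with edges 0-1, 1-2, 1-3, 2-3.
neighbors = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
# Assumption: each agent picks among its neighbors uniformly.
pi = {i: {j: 1.0 / len(nb) for j in nb} for i, nb in neighbors.items()}

gamma = update_probabilities(neighbors, pi)
freq = empirical_frequencies(neighbors, pi, 20000)
```

Because the update count of an agent grows roughly like k times its update probability, a step size equal to the reciprocal of the update count behaves like 1/(k * gamma_i) for large k, without the agent ever needing to know k.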
Lemma 3 (see ). Let . Let for all and , and also let be a scalar such that . Then, there exists a large enough such that with probability 1 for all and , (a);(b);(c).
To establish the convergence of method (7), we also make use of the following lemma.
Lemma 4 (see ). Let , , , and be nonnegative random sequences such that for all , with probability 1, where . If and with probability 1, then, with probability 1, the sequence converges to some random variable and .
We now present the main result of the paper, which is given in the following theorem.
Theorem 5. Let , , be the sequences generated by method (7) with and , where is some positive constant. Assume that problem (1) has a nonempty optimal set . Also, assume that the sequence is independent and identically distributed. Then the sequences , , converge to the same random point in with probability 1.
Proof. For and , we have for any , where the first inequality follows from the nonexpansive property of the projection operation. For , by recalling Lemma 3(c), with probability 1 the last term on the right-hand side of (10) can be bounded as follows: Substituting the preceding inequality into (10) gives To simplify the notation, we denote and ; then from Lemma 3(b) and (12) it follows that with probability 1 for all and , Taking the conditional expectation on , and jointly yields where the last inequality follows from using Lemma 1. For the last term on the right-hand side of the preceding inequality, we can derive where in the last inequality we have used the bound , according to Lemma 1. Hence, substituting (15) into (14) yields where we have used the inequalities and , based on Lemma 1(a). Using the fact that and Lemma 3(a), we obtain which implies Taking the expectation with respect to and using the fact that the preceding inequality holds with probability , and with probability , we obtain with probability 1 for all and , Summing the above inequality for , and noting that , and denoting , we obtain with probability 1 for all and , where and we have used the following inequality: Now by using the definition of the weight matrix and the convexity of the squared norm it follows that which yields the final bound for all and with probability 1: where and we have used the following inequality: Now we are ready to establish the convergence of the method. First, note that which can be easily seen from the explicit expressions for and . For the term , we can follow an argument similar to the proof of Lemma 4 in  and derive that for each , and , which gives Hence, combining the preceding fact with Lemma 4, we obtain that, with probability 1, the sequence converges for any , and (note that , and hence ), which implies This, along with the fact that the sequence converges for any and , gives our final statement, that is, for all with probability 1.
Remark 6. Note that other choices of the parameters are possible. For example, we can set , for all and any , under which case the convergence of the method (7) can also be established.
Remark 7. In contrast to the subgradient-based methods in [1–3], the implementation of the proposed method does not need the information of subgradients but only the function values. This makes our method suitable for the cases where explicit gradient calculations are computationally infeasible or expensive. In contrast to the gradient-free method in , the proposed method is asynchronous and the step sizes do not require any coordination of the agents.
4. Conclusion
In this paper, we have considered the constrained multiagent optimization problem. We have presented an asynchronous method for solving the problem that is based on the gossip algorithm and gradient-free oracles. The proposed method removes the need for both synchronous communications and subgradient information. We have proved that, for a diminishing step size, the iterates of all agents converge with probability 1 to the same optimal point of the problem. Several interesting questions remain to be explored. For instance, it would be interesting to study the case of a constant step size, as well as the effects of message quantization on the proposed method.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant no. 61304042, the Natural Science Foundation of Jiangsu Province under Grant no. BK20130856, the Jiangsu Planned Projects for Postdoctoral Research Funds under Grant no. 1302003A, and Nanjing University of Posts and Telecommunications under Grant no. NY213041.
References
B. Johansson, T. Keviczky, M. Johansson, and K. H. Johansson, “Subgradient methods and consensus algorithms for solving convex optimization problems,” in Proceedings of the 47th IEEE Conference on Decision and Control (CDC '08), pp. 4185–4190, Cancun, Mexico, December 2008.
S. S. Ram, A. Nedić, and V. V. Veeravalli, “Asynchronous gossip algorithms for stochastic optimization,” in Proceedings of the 48th IEEE Conference on Decision and Control Held Jointly with the 28th Chinese Control Conference (CDC/CCC '09), pp. 3581–3586, Shanghai, China, December 2009.
S. Lee and A. Nedich, “Asynchronous gossip-based random projection algorithms over networks,” submitted to IEEE Transactions on Automatic Control.
S. S. Ram, A. Nedić, and V. V. Veeravalli, “Asynchronous gossip algorithms for stochastic optimization: constant stepsize analysis,” in Recent Advances in Optimization and Its Applications in Engineering, M. Diehl, F. Glineur, E. Jarlebring, and W. Michiels, Eds., pp. 51–60, Springer, Berlin, Germany, 2010.
Yu. Nesterov, “Random gradient-free minimization of convex functions,” CORE Discussion Paper 2011/1, 2011.