Abstract
Most current algorithms for solving distributed online optimization problems are based on first-order methods, which are computationally simple but converge slowly. Newton’s method converges fast, but it must calculate the Hessian matrix and its inverse, which is computationally expensive. This paper proposes a distributed online optimization algorithm based on Newton’s step, which uses first-order information of the objective function to construct a positive definite matrix that replaces the inverse of the Hessian matrix in Newton’s method. The convergence of the algorithm is proved theoretically, and its regret bound is obtained. Finally, numerical experiments verify the feasibility and efficiency of the proposed algorithm: on a practical problem, it performs efficiently compared with several existing gradient descent algorithms.
1. Introduction
In recent years, with the development of computer network technology, distributed optimization algorithms [1–3] have been applied successfully to large-scale optimization problems and are considered an effective approach. Distributed optimization decomposes a large-scale optimization problem into multiple subproblems assigned to the different agents of a multi-agent network. The agents solve their associated subproblems simultaneously and exchange information with their immediate neighbors. Through this exchange of information, combined with their own optimization iterations, all agents eventually reach a common optimal solution that minimizes the sum of their objective functions. Many problems in science and engineering can be modeled as distributed optimization problems, such as machine learning [4], signal tracking and localization [5], sensor networks [6, 7], and smart grids [8].
Distributed optimization assumes that the local objective functions are known and invariant. However, many practical problems in diverse fields are affected by their environment, and the corresponding objective functions change over time, so these problems must be optimized in an online setting. In distributed online optimization, each agent has only limited information: an agent learns its objective function only after it has made a decision with its current information. As a result, its decisions are generally not optimal, and the gap between the cumulative loss of the agents’ decisions and that of the best fixed decision in hindsight is called regret. Regret is one of the most direct measures of the performance of an online algorithm: the smaller the regret, the better the algorithm. Because the algorithm runs over many iterations, we require that the average regret approach zero as the number of iterations grows. Equivalently, if a cumulative regret bound for the algorithm can be obtained, this bound should grow sublinearly with the number of iterations.
Distributed online optimization and its applications in multi-agent systems have become an active research area in the systems and control community. Inspired by the distributed dual averaging algorithm in [3], the authors of [9] proposed a distributed online weighted dual averaging algorithm for distributed optimization over dynamic networks and obtained a square-root regret bound. Yan et al. [10] introduced a distributed online projected subgradient descent algorithm and achieved square-root regret for convex local cost functions and logarithmic regret for strongly convex local cost functions. In [11], a distributed stochastic subgradient online optimization algorithm was proposed; its convergence was proved for convex and for strongly convex local objective functions, and the corresponding regret bounds were obtained. For further references on distributed online algorithms, see [12–21].
Distributed online optimization algorithms based on first-order information are computationally simple but, in most cases, converge slowly. In traditional optimization, Newton’s method, which uses second-order information of the objective function, converges faster than first-order methods. Some researchers have applied it to distributed optimization problems [22–24] to improve convergence. However, these algorithms need to compute and store the Hessian matrix of the objective and its inverse at each iteration, which inevitably reduces efficiency. To overcome this inconvenience, and inspired by the algorithm in reference [25], we propose a distributed online Newton step algorithm that achieves the effect of Newton’s method using only first-order information of the objective function.
The major contributions of this article are as follows: (i) for the distributed online optimization problem, we propose a distributed online Newton step algorithm that avoids the computational and storage costs of implementing Newton’s method while retaining its effectiveness. The algorithm uses the first-order gradient of the objective function to construct a positive definite matrix that plays the role of the inverse of the Hessian matrix in Newton’s method. Moreover, the convergence of the proposed algorithm is proved theoretically, and its regret bound is obtained. (ii) We apply the proposed algorithm to a practical problem, and numerical experiments verify its effectiveness and practicality. We also compare the proposed algorithm with several existing gradient descent algorithms; the results show that it converges clearly faster than the gradient descent algorithms.
The rest of this paper is organized as follows. Section 2 discusses closely related work on distributed Newton methods. Section 3 introduces the mathematical preliminaries and assumptions used in this paper. Our algorithm is stated in Section 4, and the proof of its convergence is presented in Section 5. Section 6 evaluates the proposed algorithm against several gradient descent algorithms on a practical problem, and Section 7 concludes the paper.
2. Related Work
Newton and quasi-Newton methods are recognized as an effective class of algorithms for solving optimization problems. The iterative formula of Newton’s method is
$$x_{k+1} = x_k - H_k^{-1}\nabla f(x_k),$$
where $H_k^{-1}$ is the inverse of the Hessian matrix $H_k = \nabla^2 f(x_k)$. Newton’s method needs to calculate the second derivative of the objective function, and the resulting Hessian matrix may not be positive definite. To overcome these shortcomings, quasi-Newton methods were proposed. Their basic idea is to construct, without using second derivatives of the objective function, an approximate matrix that replaces the inverse of the Hessian matrix in Newton’s method. Different ways of constructing the approximate matrix yield different quasi-Newton methods.
Although quasi-Newton methods are recognized as efficient, they are seldom used in distributed environments because distributed approximations of Newton steps are difficult to design. Mark Eisen et al. [26] proposed a decentralized quasi-Newton method, in which the Newton direction is determined by using the inverse of the Hessian matrix to approximate the curvature of the cost functions of an agent and its neighbors. This method converges well but suffers from storage and computational costs on large data sets, because it involves powers of matrices whose size grows with both the total number of nodes and the number of features. Aryan Mokhtari, Qing Ling, and Alejandro Ribeiro [27] proposed the Network Newton method, in which an equivalent transformation of the original problem yields a matrix that can be computed in a distributed manner and replaces the original Hessian matrix, so that the problem can be solved distributively. The authors proved that the algorithm converges at least linearly to an approximation of the optimal argument. In [28], they further established its convergence rate and analyzed several practical implementation issues. Rasul Tutunov et al. [29] proposed a consensus-based distributed Newton optimization algorithm. Exploiting the sparsity of the dual Hessian matrix, they recast the computation of Newton steps as the solution of diagonally dominant linear systems and thereby distributed the computation of Newton’s method. They also proved that the algorithm converges superlinearly in a neighborhood of the optimal solution, like the centralized Newton method. Although these algorithms distribute the computation of Newton’s method, they still need to compute the inverse of the Hessian, which is computationally expensive.
Motivated by these observations, we propose for the online setting a distributed Newton-step algorithm that achieves a convergence rate close to that of Newton’s method through distributed computation, while the inverse of its approximate Hessian matrix is easy to calculate. Numerical results show that our algorithm runs significantly faster than the algorithms in [9, 10, 19] at a lower per-iteration computational cost.
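To make the Newton update concrete, the following minimal NumPy sketch applies one step to a strictly convex quadratic, for which Newton’s method reaches the minimizer in a single iteration. The function `newton_step` and the data `Q`, `b` are illustrative choices, not taken from the paper; the step solves a linear system rather than forming the inverse explicitly, which is the usual numerical practice.

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton iteration x_{k+1} = x_k - H(x_k)^{-1} grad f(x_k).

    The linear system H d = grad f is solved instead of forming H^{-1} explicitly.
    """
    direction = np.linalg.solve(hess(x), grad(x))
    return x - direction

# Strictly convex quadratic f(x) = 0.5 x^T Q x - b^T x (illustrative data).
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: Q @ x - b
hess = lambda x: Q

x1 = newton_step(np.zeros(2), grad, hess)
x_star = np.linalg.solve(Q, b)          # exact minimizer
print(np.allclose(x1, x_star))          # True: one step suffices for a quadratic
```

For non-quadratic objectives the step is repeated, and every iteration requires a solve with the current Hessian; this per-iteration cost is what the proposed method aims to avoid.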
3. Preliminaries
In this section, some notational conventions and basic notions are introduced first. Then, we briefly describe the distributed online optimization problem and state the concepts and assumptions used throughout this paper.
3.1. Some Concepts and Assumptions
The $d$-dimensional Euclidean space is denoted by $\mathbb{R}^d$, $\mathcal{K}$ is a subset of $\mathbb{R}^d$, and $\|\cdot\|$ denotes the Euclidean norm. Strictly convex functions are defined as follows:
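Definition 1. [30] Let $f$ be a differentiable function on $\mathbb{R}^d$, let $\nabla f(x)$ be the gradient of $f$ at $x$, and let $\mathcal{K} \subseteq \mathbb{R}^d$ be a convex subset. Then $f$ is strictly convex on $\mathcal{K}$ if and only if
$$f(y) > f(x) + \nabla f(x)^{\mathsf T}(y - x)$$
for all $x, y \in \mathcal{K}$ with $x \neq y$.
Lemma 1 (see [25]). Let $f$ be differentiable on a set $\mathcal{K}$ with diameter $D$, and let $G$ be a constant. If $\|\nabla f(x)\| \le G$ for all $x \in \mathcal{K}$ and $\exp(-\alpha f(x))$ is concave on $\mathcal{K}$, then for any $\gamma \le \frac{1}{2}\min\{\frac{1}{4GD}, \alpha\}$ and any $x, y \in \mathcal{K}$, the following inequality holds:
$$f(x) \ge f(y) + \nabla f(y)^{\mathsf T}(x - y) + \frac{\gamma}{2}(x - y)^{\mathsf T}\nabla f(y)\nabla f(y)^{\mathsf T}(x - y).$$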
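As a numerical illustration, the sketch below checks Lemma 1, assuming it takes the standard form from [25], namely $f(x) \ge f(y) + \nabla f(y)^{\mathsf T}(x-y) + \frac{\gamma}{2}(\nabla f(y)^{\mathsf T}(x-y))^2$ for $\gamma \le \frac{1}{2}\min\{\frac{1}{4GD}, \alpha\}$. The loss $f(x) = \frac{1}{2}(h^{\mathsf T}x - z)^2$, the box $[-1,1]^d$, and all constants are hypothetical choices. On this box $f$ is $\alpha$-exp-concave with $\alpha = 1/M^2$, where $M$ bounds $|h^{\mathsf T}x - z|$, because $\nabla^2 f = hh^{\mathsf T} \succeq \alpha \nabla f \nabla f^{\mathsf T}$ there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative exp-concave loss f(x) = 0.5 * (h^T x - z)^2 on the box K = [-1, 1]^d.
d = 3
h = rng.normal(size=d)
z = 0.5
f = lambda x: 0.5 * (h @ x - z) ** 2
grad = lambda x: (h @ x - z) * h

# Constants for the box [-1, 1]^d.
D = 2.0 * np.sqrt(d)                 # Euclidean diameter of the box
M = np.abs(h).sum() + abs(z)         # bound on |h^T x - z| over the box
G = M * np.linalg.norm(h)            # bound on ||grad f(x)|| over the box
alpha = 1.0 / M**2                   # exp-concavity parameter: h h^T >= alpha * grad grad^T
gamma = 0.5 * min(1.0 / (4 * G * D), alpha)

def lemma1_gap(x, y):
    """f(x) minus the lower bound of Lemma 1; nonnegative whenever the lemma holds."""
    g = grad(y)
    lower = f(y) + g @ (x - y) + 0.5 * gamma * (g @ (x - y)) ** 2
    return f(x) - lower

gaps = [lemma1_gap(rng.uniform(-1, 1, d), rng.uniform(-1, 1, d)) for _ in range(1000)]
print(min(gaps) >= -1e-12)           # True
```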
Some matrix notation used in the convergence proof is given next. Denote the space of all matrices by . For a matrix , represents the entry of at the row and the column, is the transpose of , denotes the determinant of , and is the eigenvalue of the matrix . Then the following equations hold: , . In addition, for any vector , the equation holds.
Definition 2. [31] A matrix $A$ is positive definite if and only if $x^{\mathsf T}Ax > 0$ for any $x \in \mathbb{R}^d$ with $x \neq \mathbf{0}$, where $\mathbf{0}$ denotes the $d$-dimensional vector whose entries are all 0.
3.2. Distributed Online Optimization Problem
We consider a multi-agent network system with multiple agents, each of which is associated with a strictly convex function (with bounded first and second derivatives) that varies over time. All the agents cooperate to solve the following general convex consensus problem:
At each round , agent is required to generate a decision point based on its current local information and the information received from its immediate neighbors. Then, the adversary responds to each decision with a loss function, and each agent incurs the loss . The communication between agents is specified by an undirected graph , where is the vertex set and is the edge set; undirected means that if then . Each agent can communicate directly only with its immediate neighbors . The goal of the agents is to generate a sequence of decision points such that the regret of each agent with respect to any fixed decision in hindsight is sublinear in .
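Assuming the regret takes the standard form used in [9], $R_j(T) = \sum_{t=1}^{T}\sum_{i=1}^{n} f_{i,t}(x_{j,t}) - \sum_{t=1}^{T}\sum_{i=1}^{n} f_{i,t}(x)$ for a fixed comparator $x$, the sketch below evaluates it for synthetic scalar quadratic losses. The helper `agent_regret`, the losses, and the naive decision sequence are hypothetical and serve only to show how the quantity is computed.

```python
import numpy as np

def agent_regret(losses, decisions_j, comparator):
    """Regret of agent j against a fixed comparator (hypothetical helper).

    losses[t] lists the local losses f_{i,t} of all agents at round t and
    decisions_j[t] is agent j's decision x_{j,t}. Assumed form:
    sum_t sum_i f_{i,t}(x_{j,t}) - sum_t sum_i f_{i,t}(comparator).
    """
    incurred = sum(f(decisions_j[t]) for t, round_losses in enumerate(losses) for f in round_losses)
    reference = sum(f(comparator) for round_losses in losses for f in round_losses)
    return incurred - reference

# Synthetic scalar losses f_{i,t}(x) = (x - c_{i,t})^2 for n agents over T rounds.
rng = np.random.default_rng(1)
T, n = 50, 4
targets = rng.normal(size=(T, n))
losses = [[(lambda x, c=c: (x - c) ** 2) for c in row] for row in targets]
x_best = targets.mean()              # best fixed decision in hindsight for these losses
decisions = np.zeros(T)              # an agent that always plays 0
R = agent_regret(losses, decisions, x_best)
print(R, R / T)                      # cumulative and average regret
```

A sublinear bound on $R_j(T)$ is exactly what makes the average regret $R_j(T)/T$ tend to zero.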
Throughout this paper, we make the following assumptions:
(i) each cost function $f_{i,t}$ is strictly convex, twice continuously differentiable, and $L$-Lipschitz on the convex set $\mathcal{K}$;
(ii) $\mathcal{K}$ is compact, and its Euclidean diameter is bounded by $D$;
(iii) $\exp(-\alpha f_{i,t}(x))$ is concave on $\mathcal{K}$ for all $i$ and $t$.
By assumption (i), each $f_{i,t}$ is convex on $\mathcal{K}$, and under reasonable assumptions on the admissible values of $\alpha$ and $x$, $\exp(-\alpha f_{i,t}(x))$ is concave on $\mathcal{K}$. In addition, the Lipschitz condition in (i) implies that for any $x \in \mathcal{K}$ and any gradient $\nabla f_{i,t}(x)$,
$$\|\nabla f_{i,t}(x)\| \le L.$$
4. Distributed Online Newton Step Algorithm
For problem (4), we assume that information can be exchanged among agents in a timely manner, that is, the network topology graph between agents is a complete graph. The communication between agents in our algorithm is modeled by a symmetric doubly stochastic matrix whose $(i,j)$ entry is positive only if $(i,j)$ is an edge of the graph and is zero otherwise, and whose every row and every column sums to one.
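One common way to build such a matrix from the communication graph is the Metropolis-Hastings rule. The paper does not state which weights it uses, so this choice, and the helper name `metropolis_weights`, are assumptions. The sketch constructs the weights for a small path graph and checks symmetry and double stochasticity.

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric doubly stochastic weights from a 0/1 adjacency matrix (assumed rule).

    Metropolis-Hastings: w_ij = 1 / (1 + max(deg_i, deg_j)) on each edge,
    w_ij = 0 for non-neighbors, and w_ii = 1 - sum_{j != i} w_ij.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()   # self-weight makes each row sum to one
    return W

# Path graph 0 - 1 - 2 - 3 (illustrative).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
W = metropolis_weights(path)
print(np.allclose(W, W.T), np.allclose(W.sum(axis=0), 1), np.allclose(W.sum(axis=1), 1))
```

Because the edge weight depends symmetrically on both endpoints, symmetry plus unit row sums already gives unit column sums. On a complete graph with $n$ agents this rule yields every entry equal to $1/n$.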
4.1. Algorithm
The distributed online Newton step algorithm is presented in Algorithm 1.

The projection function used in this algorithm is defined as
$$\Pi_{\mathcal{K}}^{A}(y) = \arg\min_{x \in \mathcal{K}} (x - y)^{\mathsf T} A (x - y),$$
where $A$ is a positive definite matrix.
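For a general feasible set this projection has no closed form. As a sketch, assume (hypothetically) that the feasible set is a Euclidean ball of radius $r$ centered at the origin. The KKT conditions then give $x(\lambda) = (A + \lambda I)^{-1}Ay$ for a multiplier $\lambda \ge 0$, and $\|x(\lambda)\|$ is nonincreasing in $\lambda$, so $\lambda$ can be found by bisection. The helper name `project_ball_A` is illustrative.

```python
import numpy as np

def project_ball_A(y, A, radius, tol=1e-12, max_iter=200):
    """argmin over ||x|| <= radius of (x - y)^T A (x - y), with A positive definite.

    KKT: (A + lam I) x = A y with lam >= 0; ||x(lam)|| decreases in lam,
    so lam is located by bisection.
    """
    if np.linalg.norm(y) <= radius:
        return y.copy()                          # y is feasible: it is its own projection
    I = np.eye(len(y))
    x_of = lambda lam: np.linalg.solve(A + lam * I, A @ y)
    lo, hi = 0.0, 1.0
    while np.linalg.norm(x_of(hi)) > radius:     # grow hi until x(hi) is feasible
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) > radius:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return x_of(hi)                              # hi always yields a feasible point

A = np.array([[4.0, 1.0], [1.0, 1.0]])           # positive definite (illustrative)
y = np.array([2.0, 1.0])
p = project_ball_A(y, A, radius=1.0)
print(p, np.linalg.norm(p))                      # on the boundary, up to bisection tolerance
```

In general the result differs from the Euclidean projection $y/\|y\|$, because the $A$-norm weights the coordinates differently.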
4.2. Algorithm Analysis
In this algorithm, once agent makes a decision with its current information, the corresponding cost function is revealed, so the gradient can be computed. A symmetric positive definite matrix is then constructed; its inverse always exists and replaces the inverse of the Hessian matrix in Newton’s method when forming the direction of iteration. A linear combination of the current iterate of agent and the current iterates of its neighbors serves as the starting point of the new iteration, a step of size is taken along the iteration direction, and the projection operation yields the next iterate .
The computation of the matrix and its inverse can be seen from Step 7 of the algorithm, , which shows that the matrix at step is computed from the previous approximation matrix and the gradient at step . Therefore, we do not need to store the gradients from all previous iterations. At the same time, its inverse can also be updated recursively, as shown by the following equation (32):
Let , and ; then , so the inverse of can be obtained simply. The same holds for solving for : only information from the current and the previous step is used.
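The round described above can be sketched as follows. Since the Algorithm 1 listing is not reproduced here, the exact update, its ordering, and all names and constants are assumptions: each agent mixes its neighbors’ decisions with weights $w_{ij}$, adds a rank-one gradient update to its matrix $A_i$ (starting from $\varepsilon I$), takes the step $-\frac{1}{\gamma}A_i^{-1}\nabla f_{i,t}$, and projects in the $A_i$-norm onto a hypothetical ball. The local losses $\frac{1}{2}\|H_i x - z_{i,t}\|^2$ with uniform noise are likewise illustrative.

```python
import numpy as np

def project_ball_A(y, A, radius, iters=100):
    """argmin over ||x|| <= radius of (x - y)^T A (x - y); bisection on the KKT multiplier."""
    if np.linalg.norm(y) <= radius:
        return y.copy()
    x_of = lambda lam: np.linalg.solve(A + lam * np.eye(len(y)), A @ y)
    lo, hi = 0.0, 1.0
    while np.linalg.norm(x_of(hi)) > radius:         # find a feasible upper bracket
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.linalg.norm(x_of(mid)) > radius else (lo, mid)
    return x_of(hi)                                  # hi always gives a feasible point

def dons_sketch(H, x_true, W, T, gamma=0.5, eps=1.0, radius=5.0, noise=0.5, seed=0):
    """Hypothetical DONS-style loop: consensus mixing, Newton-like step, A-norm projection.

    Assumed local loss f_{i,t}(x) = 0.5 * ||H[i] @ x - z_{i,t}||^2,
    with z_{i,t} = H[i] @ x_true + uniform noise.
    """
    rng = np.random.default_rng(seed)
    n, m, d = H.shape
    X = np.zeros((n, d))                               # decisions x_{i,t}, one row per agent
    A = np.stack([eps * np.eye(d) for _ in range(n)])  # per-agent matrices, A_{i,0} = eps * I
    for t in range(T):
        Z = np.einsum('imd,d->im', H, x_true) + rng.uniform(-noise, noise, size=(n, m))
        R = np.einsum('imd,id->im', H, X) - Z          # residuals H_i x_i - z_{i,t}
        G = np.einsum('imd,im->id', H, R)              # local gradients H_i^T r_i
        mixed = W @ X                                  # sum_j w_ij x_j for every agent i
        X_next = np.empty_like(X)
        for i in range(n):
            A[i] += np.outer(G[i], G[i])               # rank-one update, as in Step 7
            y = mixed[i] - np.linalg.solve(A[i], G[i]) / gamma
            X_next[i] = project_ball_A(y, A[i], radius)
        X = X_next
    return X

# Illustrative run: 4 agents on a complete graph with uniform weights.
rng = np.random.default_rng(42)
n, m, d = 4, 2, 3
H = rng.normal(size=(n, m, d))
x_true = np.array([1.0, -0.5, 0.25])
W = np.full((n, n), 1.0 / n)
X = dons_sketch(H, x_true, W, T=200)
print(np.linalg.norm(X - x_true, axis=1))              # per-agent distance to x_true
```

The linear solve with $A_i$ is used here for brevity; the recursive inverse update discussed in this section would replace it in a cost-conscious implementation.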
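The rank-one structure of the Step 7 update is what makes the inverse cheap. For a symmetric positive definite matrix updated by the outer product of a gradient, the Sherman-Morrison formula $(A + gg^{\mathsf T})^{-1} = A^{-1} - \frac{A^{-1}gg^{\mathsf T}A^{-1}}{1 + g^{\mathsf T}A^{-1}g}$ gives the new inverse from the old one in $O(d^2)$ operations. The sketch below maintains the inverse recursively and compares it with a direct inversion. The initialization $\varepsilon I$, the random gradients, and the helper name are assumptions, and reading the paper’s recursion as exactly this formula is our interpretation.

```python
import numpy as np

def sherman_morrison_update(A_inv, g):
    """Return (A + g g^T)^{-1} from A^{-1} for symmetric A, in O(d^2) operations."""
    u = A_inv @ g                                  # A^{-1} g (also equals (g^T A^{-1})^T)
    return A_inv - np.outer(u, u) / (1.0 + g @ u)

rng = np.random.default_rng(7)
d, eps = 5, 0.1
A = eps * np.eye(d)                                # assumed initialization A_0 = eps * I
A_inv = np.eye(d) / eps
for t in range(20):
    g = rng.normal(size=d)                         # stand-in for a local gradient
    A += np.outer(g, g)                            # A_t = A_{t-1} + g g^T
    A_inv = sherman_morrison_update(A_inv, g)      # recursive inverse, no O(d^3) inversion
print(np.allclose(A_inv, np.linalg.inv(A)))
```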
5. Convergence Analysis
Now, the main result of this paper is stated in the following.
Theorem 1. Given the sequences and generated by Algorithm 1, for all and , the regret with respect to the action of agent is bounded as follows:
where , , , , are constants, and .
From Theorem 1, the regret bound of Algorithm 1 is sublinear in the number of iterations , that is, . Note that the regret bound depends on the size of the network: as the network grows, the regret bound also increases. The regret bound is also influenced by the value of the parameter and the diameter of the convex set . The value of indirectly reflects the connectivity of the network: the smaller the value of , the smaller the regret bound of the algorithm.
To prove the conclusion of Theorem 1, we first give some lemmas and their proofs.
Lemma 2. For any fixed , let , then , and the following bound holds for any and where .
Proof. Since the function is strictly convex and continuously differentiable on the convex set , and , Lemma 1 yields the corresponding inequality for each round. Summing this inequality over gives the conclusion of Lemma 2.
By Lemma 2, an upper bound on the right-hand side of the inequality also yields an upper bound on the left-hand side. Therefore, we focus on bounding the right-hand side.
Lemma 3. Let ; then the following bound holds for any and any ,
Proof. According to Algorithm 1, we have the following equation:
so
then we can obtain the following equation:
Since is the projection of in the norm induced by , it is a well-known fact that (see [25], Section 3.5, Lemma 3.9)
This fact together with (15) gives
Summing both sides of (17) from to , we obtain the following equation:
that is,
According to Algorithm 1, , then
thus we obtain
Due to , then
Moreover, since is positive definite and , we have . Combining (21) and (22), we can state
Thus, the proof of Lemma 3 is completed.
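The projection fact cited from [25] is the generalized Pythagorean property: for any $x$ in the feasible set, $\|\Pi^{A}(y) - x\|_A \le \|y - x\|_A$. The sketch below checks it numerically in a special case chosen because the projection is then explicit: a diagonal positive definite $A$ and a box feasible set, where the objective $\sum_k a_k(x_k - y_k)^2$ separates and the $A$-norm projection reduces to coordinate-wise clipping. The box, the diagonal, and the helper names are hypothetical; for a full matrix $A$, clipping is no longer the correct projection.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
a = rng.uniform(0.5, 3.0, size=d)          # diagonal of A (positive, so A is positive definite)
lower, upper = -np.ones(d), np.ones(d)     # box K = [-1, 1]^d (illustrative)

def norm_A(v):
    """||v||_A = sqrt(v^T A v) for the diagonal A above."""
    return np.sqrt(np.sum(a * v * v))

def project_box_diagA(y):
    # With diagonal A the objective separates by coordinate,
    # so the A-norm projection onto a box is coordinate-wise clipping.
    return np.clip(y, lower, upper)

violations = 0
for _ in range(1000):
    y = rng.normal(scale=3.0, size=d)      # arbitrary, often infeasible, point
    x = rng.uniform(lower, upper)          # arbitrary feasible point
    if norm_A(project_box_diagA(y) - x) > norm_A(y - x) + 1e-12:
        violations += 1
print(violations)                          # 0
```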
Next, we consider the last term of (23).
Lemma 4. For any , the following bound holds:
Proof. Note that
where, for matrices $A$ and $B$ of the same size, $A \bullet B = \sum_{i,j} a_{ij} b_{ij}$ denotes the inner product of these matrices viewed as vectors. For real numbers $a, b > 0$, concavity of the logarithm gives the first-order Taylor bound $\log b \le \log a + a^{-1}(b - a)$, which implies $a^{-1}(a - b) \le \log(a/b)$. An analogous fact holds for positive definite matrices: $A^{-1} \bullet (A - B) \le \log(|A|/|B|)$ for $A \succeq B \succ 0$, where $|\cdot|$ denotes the determinant (see the detailed proof in [25]). This fact gives us (for convenience we denote )
Since and , the properties of matrices and determinants show that the largest eigenvalue of is at most . Hence and , so
Combining the above facts, we obtain
The proof of Lemma 4 is completed.
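Both steps of this proof can be checked numerically. Assuming the matrices start from $\varepsilon I$ and receive rank-one updates by gradients of norm at most $G$ (here $\varepsilon$, $G$, the dimension, and the random gradients are hypothetical values), the sketch verifies $\sum_t g_t^{\mathsf T}A_t^{-1}g_t \le \log(|A_T|/|A_0|) \le d\log\frac{TG^2+\varepsilon}{\varepsilon}$, where the second inequality uses that the largest eigenvalue of $A_T$ is at most $TG^2+\varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(4)
d, T, eps, G = 3, 500, 1.0, 2.0

A = eps * np.eye(d)                        # A_0 = eps * I (assumed)
lhs = 0.0
for t in range(T):
    g = rng.normal(size=d)
    g *= min(1.0, G / np.linalg.norm(g))   # enforce ||g_t|| <= G
    A = A + np.outer(g, g)                 # A_t = A_{t-1} + g_t g_t^T
    lhs += g @ np.linalg.solve(A, g)       # accumulate g_t^T A_t^{-1} g_t

logdet_ratio = np.linalg.slogdet(A)[1] - d * np.log(eps)   # log(|A_T| / |A_0|)
eig_bound = d * np.log((T * G**2 + eps) / eps)            # via lambda_max(A_T) <= T G^2 + eps
print(lhs <= logdet_ratio <= eig_bound)                   # True
```

Each term satisfies $g_t^{\mathsf T}A_t^{-1}g_t = A_t^{-1}\bullet(A_t - A_{t-1})$, so the first inequality is the matrix logarithm fact applied round by round and then telescoped.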
According to Algorithm 1, , where is the direction of iteration. Using results from matrix analysis, we obtain the following conclusion.
Lemma 5. For any ,
where , , , and is the th eigenvalue of .
This result shows that, as the number of iterations increases, converges to zero, which guarantees consensus among the agents. The detailed proof is given in Appendix A.
Now, we consider the norm of the vector and obtain the following inequality.
Lemma 6. For any , let , ; then
and
where is the size of the network and is the total number of iterations. The detailed proof is given in Appendix B.
Next, we bound the term . Combining properties of vectors and matrices, we obtain Lemma 7.
Lemma 7. For any , the following inequality holds:
where , , .
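In standard consensus analyses, the disagreement between agents is controlled by powers of the symmetric doubly stochastic communication matrix $W$: $\|W^k - \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}\|_2 = \sigma_2(W)^k$, where $\sigma_2$ is the second-largest eigenvalue modulus. Whether Lemma 5 is built on exactly this quantity is an assumption. The sketch simply verifies the identity for a hypothetical 6-node cycle with all weights equal to 1/3, whose eigenvalues are $(1 + 2\cos(2\pi k/6))/3$, so $\sigma_2 = 2/3$.

```python
import numpy as np

n = 6
# Cycle graph: each node keeps weight 1/3 for itself and 1/3 for each neighbor.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i + 1) % n] = W[i, (i - 1) % n] = 1.0 / 3.0

J = np.full((n, n), 1.0 / n)                      # averaging matrix (1/n) 1 1^T
eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
sigma2 = eigs[1]                                  # second-largest eigenvalue modulus

for k in (1, 5, 20):
    dev = np.linalg.norm(np.linalg.matrix_power(W, k) - J, 2)
    print(k, dev, sigma2 ** k)                    # equal up to rounding for symmetric W
```

The deviation from consensus therefore decays geometrically at rate $\sigma_2$, which is why better-connected networks (smaller $\sigma_2$) reach agreement faster.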
Proof. According to Algorithm 1, we can state
where is a positive constant and is the inverse of the matrix . We obtain the following equation:
Now, multiplying both sides of this equation by the vector , we obtain the following equation:
then the left-hand side of (32) can be written as follows:
The matrix is symmetric and positive definite, which means that ; therefore, we obtain the following equation:
Here, we apply the Cauchy–Schwarz inequality $|x^{*}Ay|^{2} \le (x^{*}Ax)(y^{*}Ay)$, where $A$ is a Hermitian positive semidefinite matrix.
Next, we consider the upper bounds of and . According to Step 7 of Algorithm 1, , so
Similarly, we have the following equation:
Combining the results of Lemmas 5 and 6, we have the following equation:
then
Thus, we complete the proof of Lemma 7.
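A quick numerical check of the Cauchy–Schwarz inequality in the form used here, restricted to the real symmetric case: for a positive semidefinite $A$, $|x^{\mathsf T}Ay| \le \sqrt{x^{\mathsf T}Ax}\,\sqrt{y^{\mathsf T}Ay}$, since $\langle x, y\rangle_A = x^{\mathsf T}Ay$ is a (semi-)inner product. The random matrix and vectors are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4
B = rng.normal(size=(d, d))
A = B @ B.T                                  # symmetric positive semidefinite

def cauchy_schwarz_holds(x, y):
    """Check |x^T A y| <= sqrt(x^T A x) * sqrt(y^T A y) up to rounding."""
    lhs = abs(x @ A @ y)
    rhs = np.sqrt(x @ A @ x) * np.sqrt(y @ A @ y)
    return lhs <= rhs + 1e-10

ok = all(cauchy_schwarz_holds(rng.normal(size=d), rng.normal(size=d)) for _ in range(1000))
print(ok)
```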
Putting all these together, Theorem 1 can be proved as follows.
Proof of Theorem 1. According to the assumptions, is strictly convex, and the function is concave in when the value of is sufficiently small. Setting and combining Lemmas 2–7, we obtain the regret bound
The parameters in this bound take the same values as in Theorem 1.
6. Numerical Experiment
To verify the performance of the proposed algorithm, we conducted a numerical experiment on online estimation over a distributed sensor network, as described in [9]. The distributed sensor network contains sensors (see Figure 1 in [9]). Each sensor is connected to one or more other sensors, and each sensor is assumed to be connected to a processing unit. The processing units are then integrated to obtain the best estimate of the environment. The specific model is as follows: given a closed convex set , the observation vector represents the measurement of the th sensor at time , which is uncertain and time-varying because the sensor is susceptible to unknown environmental factors such as jamming. The sensor is assumed (not necessarily accurately) to have a linear model of the form , where is the observation matrix of sensor and for all . The objective is to find the argument that minimizes the cost function , namely,
where the cost function associated with sensor is . Since the observed value changes with time , the local error of the th sensor at time can be obtained only after has been computed. In other words, because of modeling errors and environmental uncertainty, the local error functions are allowed to change over time.
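Assuming the local cost takes the squared-error form $f_{i,t}(x) = \frac{1}{2}\|z_{i,t} - H_i x\|^2$ (our reading of the linear sensor model, stated here as an assumption), its gradient is $-H_i^{\mathsf T}(z_{i,t} - H_i x)$. The sketch implements both with hypothetical names and random data and confirms the analytic gradient by central finite differences.

```python
import numpy as np

def sensor_loss(x, H_i, z_it):
    """Assumed local cost 0.5 * ||z_it - H_i x||^2 (squared observation error)."""
    r = z_it - H_i @ x
    return 0.5 * r @ r

def sensor_grad(x, H_i, z_it):
    """Gradient of sensor_loss with respect to x: -H_i^T (z_it - H_i x)."""
    return -H_i.T @ (z_it - H_i @ x)

rng = np.random.default_rng(6)
H_i = rng.normal(size=(2, 3))
z_it = rng.normal(size=2)
x = rng.normal(size=3)

# Central finite differences along each coordinate direction.
h = 1e-6
fd = np.array([(sensor_loss(x + h * e, H_i, z_it) - sensor_loss(x - h * e, H_i, z_it)) / (2 * h)
               for e in np.eye(3)])
print(np.allclose(fd, sensor_grad(x, H_i, z_it), atol=1e-6))   # True
```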
In an offline setting, each sensor has a noisy observation for all , where is generally assumed to be (independent) white noise at time . In this case, the centralized optimal estimate for problem (37) is
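Under the same squared-error assumption, the centralized estimate is the least-squares solution $x^{*} = \left(\sum_i H_i^{\mathsf T}H_i\right)^{-1}\sum_i H_i^{\mathsf T}\bar{z}_i$, provided $\sum_i H_i^{\mathsf T}H_i$ is invertible. This is a sketch of that standard computation, not a transcription of the paper’s formula. It cross-checks the result against `np.linalg.lstsq` on the stacked system; the data and the noise interval are hypothetical.

```python
import numpy as np

def centralized_estimate(H, z):
    """Least-squares minimizer of sum_i 0.5 * ||z[i] - H[i] @ x||^2.

    H has shape (n, m, d), z has shape (n, m); assumes sum_i H_i^T H_i is invertible.
    """
    gram = np.einsum('imd,ime->de', H, H)   # sum_i H_i^T H_i
    rhs = np.einsum('imd,im->d', H, z)      # sum_i H_i^T z_i
    return np.linalg.solve(gram, rhs)

rng = np.random.default_rng(8)
n, m, d = 5, 2, 3
H = rng.normal(size=(n, m, d))
x_true = np.array([0.5, -1.0, 2.0])
z = H @ x_true + rng.uniform(-0.1, 0.1, size=(n, m))   # hypothetical noise interval

x_hat = centralized_estimate(H, z)
x_ls = np.linalg.lstsq(H.reshape(n * m, d), z.reshape(n * m), rcond=None)[0]
print(np.allclose(x_hat, x_ls))
```

Solving the $d \times d$ normal equations requires all $H_i$ and $z_i$ in one place, which is exactly the centralization that the distributed algorithm avoids.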
In an online setting, by contrast, the white noise varies with time (see [9]). We assume (that is, the white noise is uniformly distributed on the interval ). In the proposed distributed online algorithm, each sensor calculates based on the local information available to it, and then an “oracle” reveals the cost at time step .
The performance of the proposed algorithm is discussed based on the following aspects:
6.1. The Analysis of the Algorithm Performance
The numerical experiments consist of two parts: the impact of network size on the performance of DONS (the distributed online Newton step algorithm) and the effect of network connectivity on the effectiveness of the algorithm iterations.
We carried out numerical experiments with n = 1, n = 2, and n = 100. Figure 1 depicts the convergence curves of the algorithm for the different network sizes. Figure 1 shows that the average regret decreases quickly and that the algorithm converges on networks of different sizes as the number of agents increases. In particular, when n = 1, the problem is equivalent to a centralized optimization problem, and our distributed algorithm achieves the same effect as the centralized algorithm.
According to Theorem 1, the effectiveness of the algorithm is directly affected by the connectivity of the network, so we test the algorithm under different network topologies: (i) complete graph, in which all agents are connected to each other; (ii) cycle graph, in which each agent is connected only to its two immediate neighbors; (iii) Watts–Strogatz graph, a random graph whose connectivity depends on the average degree and the connection probability; here, the average degree is 3 and the connection probability is 0.6. As shown in Figure 2, DONS converges significantly faster on the complete graph than on the cycle graph, and it shows similar convergence on the Watts–Strogatz graph. This experimental result is consistent with the theoretical analysis in this paper.
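A common scalar proxy for network connectivity is the second-largest eigenvalue modulus $\sigma_2$ of the communication matrix: the smaller it is, the faster information mixes. Treating $\sigma_2$ as the connectivity parameter of Theorem 1 is an assumption, as are the Metropolis-Hastings weights and the network size used below. The sketch compares a complete graph and a cycle graph; a Watts–Strogatz graph could be generated, for example, with NetworkX’s `watts_strogatz_graph` and passed through the same functions.

```python
import numpy as np

def metropolis_weights(adj):
    """Assumed Metropolis-Hastings weights: symmetric and doubly stochastic."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def sigma2(W):
    """Second-largest eigenvalue modulus of a symmetric doubly stochastic matrix."""
    return np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]

n = 10
complete = np.ones((n, n)) - np.eye(n)
cycle = np.zeros((n, n))
for i in range(n):
    cycle[i, (i + 1) % n] = cycle[(i + 1) % n, i] = 1.0

s_complete = sigma2(metropolis_weights(complete))
s_cycle = sigma2(metropolis_weights(cycle))
print("complete", s_complete, "cycle", s_cycle)
```

With these weights the complete graph gives $W = \frac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf T}$ and $\sigma_2 = 0$, while the 10-node cycle gives $\sigma_2 = (1 + 2\cos(2\pi/10))/3 \approx 0.87$, consistent with the qualitative ordering of the topologies described above.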
6.2. Performance Comparison of Algorithms
To verify the performance of the proposed algorithm, we compared it with DOGD [10], DODA [9], and the algorithm in [19]. The parameters of these algorithms are set according to their theoretical analyses. The network topology among agents is complete, and all algorithms use the same network size. As shown in Figure 3, DONS converges faster and achieves higher accuracy than DODA, DOGD, and the algorithm in [19].
7. Conclusion and Discussion
A distributed online optimization algorithm based on the Newton step is proposed for a multi-agent distributed online optimization problem in which the local objective functions are strictly convex and twice continuously differentiable. In each iteration, the gradient at the current iterate is used to construct a positive definite matrix, and the direction of the next iteration is constructed by substituting this positive definite matrix for the inverse of the Hessian matrix in Newton’s method. Theoretical analysis yields a regret bound for the algorithm that is sublinear in the number of iterations. Numerical examples demonstrate the feasibility and effectiveness of the proposed algorithm, and simulation results indicate a significant improvement in convergence rate over existing distributed online algorithms based on first-order methods.
Appendix
A. The Proof of Lemma 5
Proof. First, we consider the value of . Let be the eigenvalue of ; then is the eigenvalue of . Applying the Schweitzer inequality (see 2.11 in [33]), we obtain
where for . Clearly, is symmetric positive definite, which implies that is symmetric positive definite, and . By the definition of the vector norm, we have
For any , ,