Abstract

A decentralized randomized coordinate descent method is proposed to solve a large-scale, linearly constrained, separable resource optimization problem with selfish agents. The method has a cheap per-iteration computational cost and guarantees that the selected objective functions improve at each iteration without jeopardizing the others. The convergence rate is obtained using an alternative gap benchmark of the objective value. Numerical simulations suggest that the algorithm converges to a random point on the Pareto front.

1. Introduction and Motivation

Distributed and parallel optimization techniques have become a powerful tool for solving large-scale resource optimization problems [14]. Different from the consensus-based distributed optimization model ([2, 5]), in which all agents collectively share the same decision variables, the resource optimization problem usually has a separable structure, i.e., each agent has its own decision variable and its own objective function, but all the agents are weakly coupled by equality or inequality constraints [6]. The overall target is to minimize the sum of the agents' objective functions. One typical example of this kind of model is the optimal coordination problem of distributed energy resources [4, 5], in which each generator decides its power generation by minimizing its cost function while meeting the overall demand. Recently, Li et al. [6] extended such methods to models with a global inequality constraint.

This work is motivated by the observation that resource allocation problems in the digital age are often subject to new limitations unseen in traditional resource allocation problems. For example, in the area of content distribution or video streaming, many researchers have explored a scheme called parallel access [7] or multiple content distribution servers [8, 9] in a bid to optimize servers' workload and enhance users' quality of experience (QoE). Similar problems also arise in the so-called multihoming problem when users compete for resources [10, 11]. This scheme first divides the original content/video into fragments, stores replicas of these fragments on a number of (often geographically diverse) servers, and then allows the users to access/download fragments from multiple servers concurrently. The scheme is characterized by the following features:
(i) Large scale: the number of users can easily reach hundreds of thousands or even millions.
(ii) Bandwidth limit: each server has a total bandwidth limit, so each user is only given a quota of the total bandwidth.

However, there are some practical concerns about this scheme. Specifically, users' utilization of the bandwidth quotas of different servers may vary due to physical distance or the access quality of different Internet service providers. In this sense, there is a need to reallocate the servers' bandwidth quotas among users to improve user QoE. Unfortunately, three major difficulties underlie this optimization problem:
(1) The information necessary for quota optimization (gradients of objective functions, etc.) is scattered among users and may be only partially available at a given time (e.g., users are inactive or unresponsive).
(2) Even if all the information is readily available, given the large scale of the problem, an overall one-shot optimization could be computationally prohibitive and may lack timeliness, which is especially crucial for enhancing the video viewing experience.
(3) Considering (2), a distributed algorithm that adaptively and progressively optimizes the quota allocation among users is preferred; however, the crux here is that the users can in no way accept a deteriorating QoE during the optimization process, i.e., the users (agents) are selfish: no user would be happy to learn that his own QoE is compromised for the QoE of someone else or for the "greater good" of the overall system.

The goal of this paper is to develop an efficient distributed method for solving resource allocation problems with selfish agents. Since the complete first- and second-order information necessary for a one-shot optimization is usually unavailable in many real-world large-scale applications, we develop a randomized coordinate descent (RCD) algorithm that updates a pair of objective functions in each iteration (the RCD algorithm coincides with the distributed network optimization model, where only a subset of nodes can communicate with each other and be optimized in one iteration). Our algorithm can be loosely viewed as a multiobjective (MO) extension of the single-objective randomized coordinate descent algorithm [12]. Different from single-objective optimization, the MO problem aims to optimize multiple objectives simultaneously in the sense of Pareto optimality (see [13, 14]).

Our RCD method has the following prominent features. First, it optimizes the selected objective functions in each iteration without affecting the other objective functions, and the optimization at each iteration is required to be such that none of the selected objective functions deteriorates. As a result, the value of each objective function is non-deteriorating over the whole optimization process, and the solution of each iteration can be applied in real time to progressively improve the objective function of each agent. Second, compared to centralized MO methods such as [15, 16], our algorithm takes full advantage of the separability of the problem structure, and the update has a very cheap per-iteration cost and can be easily implemented in a parallel setting. Perhaps more significantly, this work also focuses on the convergence analysis of the multiobjective RCD algorithm. Although the convergence results of single-objective algorithms are well established ([17, 18]), similar discussions for their MO counterparts are unexpectedly scarce. It was not until very recently that a few works succeeded in obtaining convergence rates for certain unconstrained MO algorithms ([19–21]). In fact, as will be evident in our analysis, the sequence generated by our RCD algorithm converges to a random point on the Pareto front; as a result, the tools used in single-objective algorithm analysis are generally rendered inapplicable. To conquer this difficulty and complete the missing piece, we develop a framework of convergence analysis for the RCD method, which generalizes the existing analysis for scalar optimization problems [12]. Under a mild condition, we show that the RCD algorithm has a sublinear convergence rate. If the agents' cost functions are all strongly convex, then the RCD algorithm has a linear convergence rate.

The paper is organized as follows. In Section 2, we present the constrained MO optimization problem. In Section 3, we present the RCD algorithm. An analysis of the convergence rate of the RCD algorithm for different cases is given in Section 4. Some numerical examples are given in Section 5. We conclude the paper and discuss future research in Section 6. Throughout this paper, the following notations are used. Vectors are denoted by bold letters. We use the following notation for the componentwise order of vectors: given $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$, $\mathbf{a} \le \mathbf{b}$ means that $a_i \le b_i$ for all $i$, and $\mathbf{a} < \mathbf{b}$ means that $a_i < b_i$ for all $i$.

2. Problem Formulation

Consider the following standard multiobjective (MO) optimization problem with linear equality constraints:
$$\min_{\mathbf{x}} \; \big(f_1(x_1), f_2(x_2), \ldots, f_n(x_n)\big) \quad \text{s.t.} \quad \sum_{i=1}^{n} x_i = \mathbf{0},$$
where $\mathbf{x} = (x_1, \ldots, x_n)$ with $x_i \in \mathbb{R}^m$, and $f_i$ is a continuous function satisfying certain conditions for $i = 1, \ldots, n$ (notation $\mathbf{0}$ denotes the vector with all elements being zero). Clearly, the objective is separable, i.e., $x_i$ only affects $f_i$. However, these decision variables are weakly coupled by the global equality constraint. Note that although the right-hand side of the equality constraint is $\mathbf{0}$, there is no difficulty in generalizing the results to the nonzero case via an affine transformation. In the context of the above bandwidth allocation problem, $x_i$ can be viewed as the bandwidth quotas of the servers allocated to agent $i$; $f_i$ is a function that represents the negative of the utility (QoE) brought by the quotas $x_i$; and the equality constraint represents the total bandwidth limitation of each server. This model arises in various network optimization problems (see, e.g., [22–27]).

The problem aims to optimize the multiple objectives simultaneously in the sense of Pareto optimality (see [13, 14]). Indeed, a feasible point $\mathbf{x}^*$ is a Pareto optimal solution if there exists no other feasible solution $\mathbf{x}$ such that $f_i(x_i) \le f_i(x_i^*)$ for all $i$ and $f_j(x_j) < f_j(x_j^*)$ for at least one $j$. The traditional way to solve the problem is to aggregate the objective functions into a weighted-sum social welfare, say $W(\mathbf{x}) = \sum_{i=1}^{n} \lambda_i f_i(x_i)$ with $\lambda_i \ge 0$, and then solve the following single-objective optimization problem using classical methods (e.g., the gradient descent method, Newton's method, etc.):
$$\min_{\mathbf{x}} \; \sum_{i=1}^{n} \lambda_i f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{n} x_i = \mathbf{0}.$$

But this scheme suffers from two major drawbacks for the reasons stated previously. First, the large scale of the underlying problem (a very large $n$) may prevent this scheme from responding in a timely manner. Second, this scheme does not guarantee a progressively non-deteriorating allocation during the optimization process, so it is possible that the individual welfare of some agents is sacrificed for the "greater good," i.e., the maximization of the social welfare $W$, and hence those agents could become discontent and discard the service for good. To address these issues, we present the following randomized block coordinate descent method.

3. Randomized Block Coordinate Descent Method

3.1. The RCD Algorithm

Throughout this paper, we assume that the following condition holds true.

Assumption 1. Each $f_i$ has a Lipschitz continuous gradient with constant $L$, i.e., for all $x, y \in \mathbb{R}^m$, it holds that
$$\|\nabla f_i(x) - \nabla f_i(y)\| \le L \|x - y\|,$$
where $\|\cdot\|$ is the Euclidean norm (note that the Lipschitz coefficient can be agent-specific, e.g., $L_i$; in this case, we can let $L = \max_i L_i$; on the other hand, one can also modify our algorithm scheme accordingly with respect to these agent-specific Lipschitz coefficients and obtain an improved convergence rate).
Note that the Lipschitz property implies the following inequality:
$$f_i(y) \le f_i(x) + \langle \nabla f_i(x), y - x \rangle + \frac{L}{2}\|y - x\|^2$$
for any $x, y \in \mathbb{R}^m$. Given the current solution, a classical gradient-based method needs to compute the whole Jacobian to generate a feasible step that simultaneously decreases all objectives. Obviously, this step is expensive when the problem size is large. To overcome this difficulty, we propose the following RCD method. In each iteration, we randomly select a pair of objectives $(i, j)$ with probability $p_{ij}$ and update their objective values. To ensure convergence of the algorithm, we need the following condition on the sampling probabilities $p_{ij}$.

Assumption 2. For any pair of nodes $i \ne j$, there exists a path $i = i_0, i_1, \ldots, i_s = j$ such that the sampling probabilities $p_{i_0 i_1}, \ldots, p_{i_{s-1} i_s}$ are all strictly positive.
Graphically, one can think of a strictly positive $p_{ij}$ as an "edge" between nodes $i$ and $j$; if $p_{ij} = 0$, there is no edge between nodes $i$ and $j$. Assumption 2 means that the communication network among the nodes is connected, i.e., any two nodes in the network are either directly connected by an edge or indirectly connected by at least one path formed by several intermediate edges. If the network is complete, i.e., each node is directly connected to every other node with $p_{ij} > 0$ for all edges, the above condition is automatically satisfied. Other options are available, for example, the cyclic network (see Figure 1), in which the nodes are connected in a circular fashion. Another one is the so-called central coordinator network, in which one node is chosen as a central coordinator (see Figure 2), and the remaining nodes are connected to this central coordinator while sharing no direct connections among themselves. The implication of a connected network is that any local change can eventually ripple through the whole network instead of being contained.
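These sampling schemes can be sketched concretely. The following snippet (an illustration of ours, not the paper's code; all function names are our own) builds pair-sampling probability matrices for the cyclic and central-coordinator networks and verifies Assumption 2 by checking that the graph of strictly positive probabilities is connected:

```python
import numpy as np
from collections import deque

def cyclic_probs(n):
    """Cyclic network: the n edges 1-2, 2-3, ..., n-1, sampled uniformly."""
    P = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        P[i, j] = P[j, i] = 1.0 / n   # each undirected edge has probability 1/n
    return P

def star_probs(n, hub=0):
    """Central-coordinator network: every sampled pair contains the hub node."""
    P = np.zeros((n, n))
    for i in range(n):
        if i != hub:
            P[i, hub] = P[hub, i] = 1.0 / (n - 1)
    return P

def is_connected(P):
    """Assumption 2: the graph of strictly positive p_ij must be connected (BFS)."""
    n = P.shape[0]
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if P[u, v] > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n
```

Both structures satisfy Assumption 2, whereas, e.g., a network split into two disjoint groups would fail the connectivity check.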
Once a pair $(i, j)$ is chosen, the following convex pairwise subproblem is solved. It is not hard to see that, due to inequality (4), the solution of this subproblem provides a descent direction that improves the objective values of both $f_i$ and $f_j$. The key difference between our subproblem and its single-objective RCD counterpart is that a single-objective RCD algorithm would aggregate (5) and (6) and attempt to minimize the aggregate cost. Unlike our multiobjective method, this single-objective scheme does not necessarily generate a non-deteriorating solution for both agents. We use $d^*$ to denote the optimal solution of the subproblem, and we use $x_i^{k+1}$ and $x_j^{k+1}$ to denote the updated solution points for the $i$-th and $j$-th objective functions, respectively. Let $\mathbf{x}^{k+1}$ be the updated solution vector. The RCD algorithm updates the solution points $x_i^{k+1}$ and $x_j^{k+1}$ and keeps all the others unchanged, $x_l^{k+1} = x_l^k$ for $l \ne i, j$. One prominent feature of the method is that the subproblem admits an explicit solution.
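For concreteness, one plausible form of the pairwise subproblem is the following (our reconstruction, under the assumption that the selected pair exchanges a common amount $d$, i.e., $x_i \leftarrow x_i + d$ and $x_j \leftarrow x_j - d$, which preserves $\sum_i x_i = \mathbf{0}$):

```latex
\begin{aligned}
\min_{d \in \mathbb{R}^m} \quad
  & \langle \nabla f_i(x_i^k) - \nabla f_j(x_j^k),\, d \rangle + L\|d\|^2 \\
\text{s.t.} \quad
  & \langle \nabla f_i(x_i^k),\, d \rangle + \tfrac{L}{2}\|d\|^2 \le 0, \\
  & -\langle \nabla f_j(x_j^k),\, d \rangle + \tfrac{L}{2}\|d\|^2 \le 0.
\end{aligned}
```

Under this reading, the objective is the sum of the two Lipschitz surrogate bounds of inequality (4), and the constraints force each surrogate to be non-positive, so any feasible $d$ is non-deteriorating for both agents; $d = 0$ is always feasible, which is consistent with the Pareto-optimality characterization in Lemma 1.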

Lemma 1. Given the current solution and a pair $(i, j)$, the optimal solution of the pairwise subproblem admits an explicit form, in which the associated multiplier is also given explicitly.

Furthermore, a solution is Pareto optimal only if the optimal solution of the pairwise subproblem is $d^* = 0$ for all pairs $(i, j)$, and the reverse direction holds true if all $f_i$ are convex.

Proof. Checking the Karush–Kuhn–Tucker (KKT) conditions of the pairwise subproblem yields the stationarity condition, where $\mu$ is the Lagrange multiplier. Substituting back into the constraints determines $\mu$. It can be easily verified that the resulting function is strictly decreasing on one branch and strictly increasing on the other; after checking the boundary values, we obtain $\mu$ as in (10). Clearly, when each $f_i$ is convex and $d^* = 0$ for every pair $(i, j)$, the KKT conditions of the original problem are satisfied, which implies Pareto optimality. The other direction is straightforward.
The algorithm is summarized in the following pseudo-code.
Algorithm 1: A distributed algorithm for large-scale, linearly coupled resource allocation problems with selfish agents.
Since the problem is assumed to be large-scale, the computational cost of the methods in [15, 16] could become intimidating, because an overall optimization involving all objectives is required to compute an update direction. The methods of [15, 16] would also require all objective function information to be transmitted to a central coordinating agent, where the optimization is conducted; this could create excessive communication overhead at that agent. In contrast, our method takes advantage of the separability of the problem structure: the update has an analytical form, can be parallelized by optimizing multiple pairs concurrently, and the pairwise optimization can be easily implemented in a peer-to-peer manner. The cost per iteration is the sum of the maximum cost of computing the gradient of a single $f_i$ and the cost of updating one pair of blocks. Compared with the single-objective randomized coordinate descent method, the main difference is that the pairwise optimization guarantees $f_i(x_i^{k+1}) \le f_i(x_i^k)$ and $f_j(x_j^{k+1}) \le f_j(x_j^k)$ for the selected pair; as a result, the update is always non-deteriorating for each of the agents and hence can be applied in real time.

Initialization: set $k = 0$ and pick a feasible $\mathbf{x}^0$;
while True do
 Randomly select a pair $(i, j)$ with probability $p_{ij}$ according to the network structure;
 Solve the pairwise subproblem for the update direction $d^*$;
 Update: compute $x_i^{k+1}$ and $x_j^{k+1}$ from $d^*$, and keep $x_l^{k+1} = x_l^k$ for $l \ne i, j$;
 Set $k \leftarrow k + 1$;
end while
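A minimal, self-contained sketch of the loop above is given below. This is an illustrative toy, not the paper's MATLAB implementation: the objectives are simple convex quadratics, and the pairwise step uses an exchange direction with a backtracking acceptance test in place of the paper's closed-form subproblem solution; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3                                   # agents, servers (toy sizes)
A = [np.diag(rng.uniform(1.0, 3.0, m)) for _ in range(n)]
b = [rng.standard_normal(m) for _ in range(n)]

def f(i, xi):                                 # agent i's convex quadratic cost
    return 0.5 * xi @ A[i] @ xi + b[i] @ xi

def grad(i, xi):
    return A[i] @ xi + b[i]

x = rng.standard_normal((n, m))
x -= x.mean(axis=0)                           # feasible start: sum_i x_i = 0
f_start = [f(i, x[i]) for i in range(n)]

for _ in range(2000):
    i, j = rng.choice(n, size=2, replace=False)   # complete network, uniform pairs
    d = -(grad(i, x[i]) - grad(j, x[j]))          # exchange direction
    fi0, fj0, alpha = f(i, x[i]), f(j, x[j]), 0.5
    for _ in range(30):                           # accept only if neither agent worsens
        if f(i, x[i] + alpha * d) <= fi0 and f(j, x[j] - alpha * d) <= fj0:
            x[i] = x[i] + alpha * d
            x[j] = x[j] - alpha * d
            break
        alpha *= 0.5

f_end = [f(i, x[i]) for i in range(n)]
```

The exchange update keeps the coupling constraint satisfied exactly, and the acceptance test enforces the non-deterioration property for both selected agents at every iteration.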

Remark 1. When all information is available, one may also consider following the idea of [15], that is, simultaneously optimizing all agents, with the update directions given by solving the following problem. However, the cost per iteration of this problem grows with the full problem dimension. Also note that this problem generally does not admit an analytical solution; therefore, an additional quadratic programming algorithm is needed to find its solution.

4. Convergence Rate Analysis

4.1. Convergence Rate Analysis: Non-convex Case

We investigate the convergence rate of the RCD algorithm in this section. We first introduce the potential function $V(\mathbf{x}) = \sum_{i=1}^{n} f_i(x_i)$. For the optimal updating vector and the associated Lagrange multiplier, the Lipschitz continuity of the gradients implies the following chain of inequalities, where the second inequality follows from the feasibility of the update for the pairwise subproblem. Therefore, under the preceding updating rule, we can estimate the expected decrease of the potential function as

Now, we introduce the following criterion as a metric, defined below with suitably chosen weights. Using this criterion, the preceding inequality becomes

One can view this criterion as an improvement potential indicator. Specifically, we have the following lemma.

Lemma 2. The criterion vanishes only if the current solution is a Pareto optimal solution of the problem, and the reverse direction holds true if each $f_i$ is convex.

Proof. We first prove, by contraposition, the direction that a vanishing criterion implies Pareto optimality. Assume that a solution with vanishing criterion is not Pareto optimal; then, there exist a pair of indices and a feasible direction along which both objectives strictly improve, which implies that the criterion is positive, a contradiction. The other direction of this lemma is straightforward; we omit the details.
We now turn to the convergence rate of the RCD algorithm. The convergence rate of methods for a single-objective problem is usually measured with respect to the gap from its optimal value. However, different from single-objective optimization, the sequence generated by an MO algorithm may converge to different points on the Pareto optimal frontier, and there is no a priori way to determine which point the sequence converges to; this property is further substantiated by the numerical simulations. Therefore, one needs to construct an alternative benchmark to measure the convergence rate (the construction of such alternative gap functions for multiobjective optimization problems has also been discussed recently by Dutta et al. and Tanabe et al. [28–30]). We define the following two measures (functions), where the first provides a lower bound of the potential function and the second measures the gap between the current value of the potential function and this lower bound. These two measures have the following properties.

Lemma 3. For any , if , it has and . Furthermore, the solution is a Pareto optimal point only if , and the reverse direction holds true if each is convex.

Proof. The first half follows directly from the observation that implies . The rest of the proof is straightforward, and we omit the details.
Although these two measures are only tangential to the following convergence result for non-convex objective functions, they are pivotal for obtaining the convergence rates in the convex and strongly convex cases. We now give a preliminary convergence rate result for generally non-convex objective functions.

Theorem 1. If the level set is bounded, then the following holds:

Proof. Taking expectations of both sides of (17) and summing over iterations up to $k$, we arrive at the following bound, where the second inequality follows from Jensen's inequality. Since the level set is bounded, the potential sequence is also bounded, and hence (18) follows.
Note that in the non-convex case, the stationary point may be only a local minimum.

4.2. Convergence Rate: Convex Case

When each $f_i$ is convex, an accelerated convergence rate can be obtained. But first we need some preliminary results. In the same spirit as the dual norm, for a non-empty compact set, we define the associated dual function as displayed below. Then, the following lemma gives a counterpart of the Cauchy–Schwarz inequality with respect to the feasible set of the problem.

Lemma 4. Given and a non-empty compact set , for all , the following inequality holds true:

Furthermore, there exists a number such that for all .

Proof. Since the set is compact, the dual function is well defined. Depending on its value, we have two cases:
(1) In the first case, from (17), we know the point must be of the stated form for some indices. Suppose there exists a point for which the claim fails; this implies a particular representation. However, this leads to a contradiction with the fact established above. Therefore, the claim holds for all admissible points.
(2) In the second case, let the rescaled point be as defined. Clearly, it is feasible. Then, inequality (21) follows from the displayed chain, where the second equality follows from the stated identities.
As for the last part of the lemma, we can compute the lower bound value as displayed, where the set in question is compact.

Theorem 2. Given an initial solution, if the level set is bounded, then the solution generated by the $k$-th iteration of the RCD algorithm satisfies the following bound, where the constant is defined in Lemma 4 with respect to the corresponding compact set. Furthermore, this constant is bounded by

Proof. As the level set is bounded, the problem always admits a solution. On the other hand, the Lipschitz property in Assumption 1 implies the displayed bound, which further implies the boundedness of the set in question. Therefore, the solution satisfies the following inequality. Then, we obtain the displayed estimate, where the second inequality is from the convexity of the objectives, and the third inequality is from (24). Inequality (28) together with (17) gives the next estimate. Since the gap is non-negative, applying Lemma 3 to the preceding inequality gives a refined estimate. Taking expectations of both sides and applying Jensen's inequality, we obtain the stated recursion, from which the claim follows.

4.3. Convergence Rate: Strongly Convex Case

In this section, we investigate the convergence rate when each function $f_i$ is strongly convex.

Assumption 3. There exists a $\sigma > 0$ such that
$$f_i(y) \ge f_i(x) + \langle \nabla f_i(x), y - x \rangle + \frac{\sigma}{2}\|y - x\|^2$$
for all $x, y \in \mathbb{R}^m$ and $i = 1, \ldots, n$.
Summing over $i$ leads to the strong convexity of the Lyapunov function $V(\mathbf{x}) = \sum_{i=1}^{n} f_i(x_i)$, i.e.,
$$V(\mathbf{y}) \ge V(\mathbf{x}) + \langle \nabla V(\mathbf{x}), \mathbf{y} - \mathbf{x} \rangle + \frac{\sigma}{2}\|\mathbf{y} - \mathbf{x}\|^2.$$

Theorem 3. Given an initial feasible solution, if the level set is bounded, then the RCD algorithm satisfies the following linear rate, where the constant is defined in Lemma 4 with respect to the corresponding set.

Proof. Applying Lemma 4 to (34), we obtain the displayed bound. Let the auxiliary quantity be defined as stated. Then, one can easily verify the claimed relation by using the convexity of the objectives. Maximizing the left-hand side of (36) and the right-hand side with respect to their respective arguments yields the displayed inequality, where the second inequality follows from Lemma 4. Combining this with (17) yields the recursion. Taking expectations of both sides and applying the resulting inequality iteratively leads to (35).

4.4. Convergence Rate in Probability

Again with the aid of the gap benchmark, in this section we establish the convergence rate in probability for the RCD algorithm. We first introduce the following lemma from [31].

Lemma 5 (see [31]). Let $\epsilon > 0$ be a constant, consider a confidence level $\rho \in (0, 1)$, and let $\{\xi_k\}_{k \ge 0}$ be a non-negative, non-increasing sequence of random variables with one of the following properties:
(1) $\mathbb{E}[\xi_{k+1} \mid \xi_k] \le \xi_k (1 - \xi_k / c_1)$ for all $k$, where $c_1 > 0$ is a constant.
(2) $\mathbb{E}[\xi_{k+1} \mid \xi_k] \le (1 - 1/c_2)\, \xi_k$ for all $k$ such that $\xi_k \ge \epsilon$, where $c_2 > 1$ is a constant.

Then, given the confidence level $\rho$, if property (1) holds, we can choose the iteration count $K$ as
$$K \ge \frac{c_1}{\epsilon}\Big(1 + \log \frac{1}{\rho}\Big) + 2 - \frac{c_1}{\xi_0}.$$

If property (2) holds, we choose
$$K \ge c_2 \log \frac{\xi_0}{\epsilon \rho}.$$

Then, we have
$$\Pr(\xi_K \le \epsilon) \ge 1 - \rho.$$

Proof (see [31], Theorem 1). The proof follows by applying Markov's inequality.
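The core of the argument can be sketched as follows, with $\xi_k$ the non-negative, non-increasing gap sequence, $\epsilon$ the target accuracy, $\rho$ the confidence level, and a generic geometric decay rate $\theta$ (our notation). Since $\xi_K \ge 0$, Markov's inequality gives

```latex
\Pr(\xi_K > \epsilon) \;\le\; \frac{\mathbb{E}[\xi_K]}{\epsilon}.
```

Hence it suffices to drive $\mathbb{E}[\xi_K]$ below $\epsilon\rho$. Under geometric decay $\mathbb{E}[\xi_{k+1}] \le (1-\theta)\,\mathbb{E}[\xi_k]$ with $\theta \in (0, 1)$, we get $\mathbb{E}[\xi_K] \le (1-\theta)^K \xi_0 \le e^{-\theta K} \xi_0$, so $K \ge \theta^{-1}\log\big(\xi_0/(\epsilon\rho)\big)$ iterations suffice; the sublinear case requires the slightly more careful recursion analysis of [31].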
Then, we have the following theorem, which quantifies the confidence of reducing the gap of the potential function to no more than $\epsilon$.

Theorem 4. Suppose the level set is bounded. If each $f_i$ is convex, we can choose the number of iterations as displayed, where the constant is the corresponding bounding coefficient; if each $f_i$ is $\sigma$-strongly convex, then we may choose

In either case, we have

Proof. The result follows directly from the observation that the gap sequence generated by the RCD algorithm satisfies the properties of Lemma 5, according to (32) and (39).

5. Numerical Simulation

In this section, we use simulations to demonstrate our theoretical results. Consider the following example with 10 quadratic objective functions, each taking the form of (46), for $i = 1, \ldots, 10$. The coefficients are randomly generated in each trial. The 10-th objective (agent) is selected as the central coordinator, i.e., at each iteration, one of the first 9 objectives is randomly picked with equal probability to communicate with the 10-th objective. For a given initial point, we run the algorithm 3 times and record the corresponding potential function value trajectories. The results are summarized in Figure 3.

As is clearly seen, although each of the three runs converges, their limiting potential function values differ, indicating that they converge to different points on the Pareto front.

To further highlight this point, we run two extra sets of simulations. With the same setting as the preceding simulation (10 quadratic objectives, randomly generated coefficients), we again fix an initial point. For the first set of simulations, the communication network is complete, i.e., at each iterate, any pair can be chosen with equal probability. For the second test, the communication network is the same as in the preceding simulation, i.e., the 10-th objective acts as the central coordinator, and one of the remaining objectives is randomly selected with equal probability to communicate with the 10-th objective at each iterate. For each test, we run the RCD algorithm 10 times, and the resulting value trajectories of the 10-th objective function are recorded in the following figures.

As is clearly indicated in Figure 4, the limiting levels of the 10 runs differ dramatically from one another. For the central coordinator network (see Figure 5), the limiting level of the coordinator is relatively more stable, at around 0.38. What is interesting is that, compared to the complete network with random pairing, the central coordinator somehow manages to achieve a significantly better objective value: the lowest level of the 10-th objective in the complete network barely touches 0.47. This is probably because the central coordinator is involved in every single iterate; as a result, its corresponding objective is more heavily optimized.
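The qualitative behavior described above can be reproduced with a small self-contained sketch (illustrative only: our own toy quadratics, scalar quotas, and the backtracking variant of the pairwise step, with the 10-th agent as the hub):

```python
import numpy as np

n, hub = 10, 9                           # 10 agents, agent 10 as coordinator
master = np.random.default_rng(42)
a = master.uniform(1.0, 3.0, n)          # f_i(x) = 0.5*a_i*x^2 + b_i*x
b = master.standard_normal(n)
x0 = master.standard_normal(n)
x0 -= x0.mean()                          # feasible start: sum_i x_i = 0

def run(seed, iters=5000):
    """One RCD run under central-coordinator pairing from the common start x0."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        i = rng.integers(0, n - 1)       # pick a non-hub agent uniformly
        gi, gj = a[i] * x[i] + b[i], a[hub] * x[hub] + b[hub]
        d, alpha = -(gi - gj), 0.5
        fi0 = 0.5 * a[i] * x[i] ** 2 + b[i] * x[i]
        fj0 = 0.5 * a[hub] * x[hub] ** 2 + b[hub] * x[hub]
        for _ in range(30):              # accept only non-deteriorating steps
            xi, xj = x[i] + alpha * d, x[hub] - alpha * d
            if (0.5 * a[i] * xi ** 2 + b[i] * xi <= fi0
                    and 0.5 * a[hub] * xj ** 2 + b[hub] * xj <= fj0):
                x[i], x[hub] = xi, xj
                break
            alpha *= 0.5
    return x

limits = [run(s) for s in (1, 2, 3)]
potentials = [float(np.sum(0.5 * a * x ** 2 + b * x)) for x in limits]
```

Each run keeps the coupling constraint satisfied and never increases any agent's cost, yet the limiting points, and hence the limiting potential values, generally differ across runs, matching the behavior seen in Figures 3 to 5.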

6. Discussion and Concluding Remarks

We proposed a randomized coordinate descent algorithm to solve the large-scale, linearly coupled resource allocation problem with selfish agents. The method has a low computational cost per iteration and guarantees convergence to a Pareto optimal solution under mild conditions, and we derived the convergence rate of the algorithm. As the sampling probabilities are fixed exogenously in the current framework, one potential extension is to choose these sampling probabilities with respect to the problem's parameters, which may further enhance the efficiency of the algorithm. In particular, it is known in the literature that the selfish/greedy behavior of individual agents generally leads to an efficiency loss from a systemic perspective [27, 32]; hence, the design of sampling probabilities that narrow this efficiency gap would be an interesting direction for future research.

Data Availability

No data were used to support this study. The MATLAB code used in this article is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was partially supported by the National Natural Science Foundation of China under grant nos. 71971132 and 61573244.