Stochastic Block-Coordinate Gradient Projection Algorithms for Submodular Maximization

Li, Zhigang; Zhang, Mingchuan; Zhu, Junlong; Zheng, Ruijuan; Zhang, Qikun; Wu, Qingtao

doi:https://doi.org/10.1155/2018/2609471

Complexity


Research Article | Open Access

Volume 2018 | Article ID 2609471 | https://doi.org/10.1155/2018/2609471


Academic Editor: Mahardhika Pratama

Received 25 May 2018

Accepted 26 Nov 2018

Published 05 Dec 2018

Abstract

We consider a huge-scale stochastic continuous submodular optimization problem, which arises naturally in many applications such as machine learning. When the data are high-dimensional, computing the whole gradient vector can become prohibitively expensive. To reduce the complexity and memory requirements, we propose a stochastic block-coordinate gradient projection algorithm for maximizing continuous submodular functions, which chooses a random subset of the gradient vector and updates the estimates along the positive gradient direction. We prove that the estimates of all nodes generated by the algorithm converge to some stationary points with probability 1. Moreover, we show that the proposed algorithm achieves the tight $((p_{\min}/2)F^{*} - \epsilon)$ approximation guarantee after $O(1/\epsilon^{2})$ iterations for DR-submodular functions by choosing appropriate step sizes. Furthermore, we show that the algorithm achieves the tight $((\gamma^{2}/(1+\gamma^{2}))p_{\min}F^{*} - \epsilon)$ approximation guarantee after $O(1/\epsilon^{2})$ iterations for weakly DR-submodular functions with parameter $\gamma$ by choosing diminishing step sizes.

1. Introduction

In this paper, we focus on submodular function maximization, which has recently attracted significant attention in academia because submodularity is a crucial concept in combinatorial optimization. Submodular functions arise in a variety of areas, such as the social sciences, algorithmic game theory, signal processing, machine learning, and computer vision. They have also found many applications in applied mathematics and computer science, such as probabilistic models [1, 2], crowd teaching [3, 4], representation learning [5], data summarization [6], document summarization [7], recommender systems [8], product recommendation [9, 10], sensor placement [11], network monitoring [12, 13], the design of structured norms [14], clustering [15], dictionary learning [16], active learning [17], and utility maximization in sensor networks [18].

For submodular optimization problems, there exist many polynomial-time algorithms for exactly minimizing submodular functions, such as combinatorial algorithms [19–21]. There also exist many polynomial-time algorithms for approximately maximizing submodular functions with approximation guarantees, such as local search and greedy algorithms [22–25]. Despite this progress, these methods rely on combinatorial techniques, which have some limitations [26]. For this reason, a new approach based on the multilinear relaxation was proposed [27], which lifts submodular optimization problems into the continuous domain. Continuous optimization techniques can then be used to minimize exactly, or maximize approximately, submodular functions in polynomial time. Recently, much of the literature has been devoted to continuous submodular optimization [28–31]. The algorithms cited above need to compute all the (sub)gradients.

However, computing all the (sub)gradients can become prohibitively expensive when dealing with huge-scale optimization problems, where the decision vectors are high-dimensional. For this reason, the coordinate descent method and its variants have been proposed for efficiently solving convex optimization problems [32]. At each iteration, coordinate descent methods choose only one block of variables with which to update the decision vector. Thus, they reduce the memory and complexity requirements at each iteration when dealing with high-dimensional data. Coordinate descent methods have been applied to support vector machines [33], large-scale optimization problems [34–37], protein loop closure [38], regression [39], compressed sensing [40], etc. In coordinate descent methods, the main search strategies are cyclic coordinate search [41–43] and random coordinate search [44–46]. In addition, asynchronous coordinate descent methods have also been proposed in recent years [47, 48].

Despite this progress, stochastic block-coordinate gradient projection methods for maximizing submodular functions have barely been investigated. To fill this gap, we propose a stochastic block-coordinate gradient projection algorithm to solve the stochastic continuous submodular optimization problems introduced in [30]. To reduce the complexity and memory requirements at each iteration, the proposed algorithm incorporates block-coordinate decomposition into stochastic gradient projection. The main contributions of this paper are as follows:
(i) We propose a stochastic block-coordinate gradient projection algorithm for maximizing continuous submodular functions. In the proposed algorithm, each node chooses a random subset of the whole approximate gradient vector and updates its decision vector along the gradient ascent direction.
(ii) We show that each node asymptotically converges to some stationary point under the stochastic block-coordinate gradient projection algorithm; i.e., the estimates of all nodes converge to some stationary points with probability 1.
(iii) We investigate the convergence rate of the stochastic block-coordinate gradient projection algorithm with an approximation guarantee. When the submodular functions are DR-submodular, we prove that a convergence rate of $O(1/\sqrt{T})$ is achieved with a $p_{\min}/2$ approximation guarantee. More generally, we show that a convergence rate of $O(1/\sqrt{T})$ is achieved with a $\gamma^{2}p_{\min}/(1+\gamma^{2})$ approximation guarantee for weakly DR-submodular functions with parameter $\gamma$.

The remainder of this paper is organized as follows. We describe mathematical background in Section 2. We formulate the problem of our interest and propose a stochastic block-coordinate gradient projection algorithm in Section 3. In Section 4, the main results of this paper are stated. The detailed proofs of the main results of the paper are provided in Section 5. The conclusion of the paper is presented in Section 6.

2. Mathematical Background

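Let $V$ be a ground set consisting of $n$ elements. A set function $f: 2^{V} \to \mathbb{R}$ is called submodular if

$$f(A) + f(B) \ge f(A \cup B) + f(A \cap B) \tag{1}$$

for all subsets $A, B \subseteq V$. The notion of submodularity is mostly used in the discrete domain, but it can be extended to the continuous domain [49]. Consider a subset of $\mathbb{R}^{n}_{+}$, $\mathcal{X} = \prod_{i=1}^{n} \mathcal{X}_{i}$, where each set $\mathcal{X}_{i}$ is a compact subset of $\mathbb{R}_{+}$. A continuous function $F: \mathcal{X} \to \mathbb{R}_{+}$ is called a continuous submodular function if, for all $x, y \in \mathcal{X}$, the inequality

$$F(x) + F(y) \ge F(x \vee y) + F(x \wedge y) \tag{2}$$

holds, where $x \vee y := \max(x, y)$ (coordinate-wise) and $x \wedge y := \min(x, y)$ (coordinate-wise). Moreover, if $F(x) \le F(y)$ for all $x, y \in \mathcal{X}$ with $x \le y$, then the continuous submodular function is called monotone on $\mathcal{X}$. Furthermore, a differentiable continuous submodular function $F$ is called DR-submodular if, for all $x, y \in \mathcal{X}$ such that $x \le y$, we have $\nabla F(x) \ge \nabla F(y)$; i.e., $\nabla F$ is an antitone mapping [29]. When $F$ is twice differentiable, $F$ is submodular if and only if all off-diagonal components of its Hessian matrix are nonpositive [28]; i.e., for all $x \in \mathcal{X}$,

$$\frac{\partial^{2} F(x)}{\partial x_{i}\, \partial x_{j}} \le 0, \quad \forall i \ne j. \tag{3}$$

Furthermore, if the submodular function $F$ is DR-submodular, then all second derivatives are nonpositive [29]; i.e., for all $x \in \mathcal{X}$,

$$\frac{\partial^{2} F(x)}{\partial x_{i}\, \partial x_{j}} \le 0, \quad \forall i, j. \tag{4}$$

In addition, twice differentiability implies that the submodular function is smooth [50]. We say that a submodular function $F$ is $L$-smooth if, for all $x, y \in \mathcal{X}$, we have

$$F(y) \ge F(x) + \langle \nabla F(x), y - x \rangle - \frac{L}{2}\|y - x\|^{2}. \tag{5}$$

Note that the above definition is equivalent to

$$\|\nabla F(x) - \nabla F(y)\| \le L\|x - y\|. \tag{6}$$

Furthermore, a function $F$ is called weakly DR-submodular with parameter $\gamma$ if

$$\gamma = \inf_{x, y \in \mathcal{X},\, x \le y}\ \inf_{i \in \{1, \dots, n\}} \frac{[\nabla F(x)]_{i}}{[\nabla F(y)]_{i}}. \tag{7}$$

More details about weakly DR-submodular functions are available in [29].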
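As a concrete illustration of the definitions above, a quadratic function whose Hessian has only nonpositive entries is a standard example of a DR-submodular function on a box. The following sketch is not part of the original paper: it builds such a quadratic and checks on random samples that its gradient is antitone. The helper names `make_quadratic` and `is_antitone_on_samples`, the box $[0, 1]^n$, and the particular choice of the linear term are illustrative assumptions.

```python
import numpy as np


def make_quadratic(n, seed=0):
    """Build F(x) = h^T x + 0.5 x^T H x with H symmetric and entrywise <= 0."""
    rng = np.random.default_rng(seed)
    A = -rng.uniform(0.0, 1.0, size=(n, n))
    H = (A + A.T) / 2.0                      # symmetric, all entries nonpositive
    # Assumption for illustration: pick h so that the gradient h + H x
    # equals -H (1 - x) >= 0 on [0, 1]^n, which makes F monotone on the box.
    h = -H @ np.ones(n)

    def F(x):
        return h @ x + 0.5 * x @ H @ x

    def grad(x):
        return h + H @ x

    return F, grad, H


def is_antitone_on_samples(grad, n, trials=1000, seed=1):
    """Check grad(x) >= grad(y) coordinate-wise for sampled pairs x <= y."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.uniform(0.0, 1.0, size=n)
        y = x + rng.uniform(0.0, 1.0, size=n) * (1.0 - x)   # x <= y <= 1
        if np.any(grad(x) < grad(y) - 1e-12):
            return False
    return True


F, grad, H = make_quadratic(5)
print(is_antitone_on_samples(grad, n=5))
```

The check passes for this family because the difference of gradients equals $H(x - y)$, a product of a nonpositive matrix with a nonpositive vector.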

3. Problem Formulation and Algorithm Design

In this section, we first describe the problem of our interest, and then we design an algorithm to efficiently solve the problem.

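In this paper, we focus on the following constrained optimization problem:

$$\max_{x \in \mathcal{K}} F(x) := \mathbb{E}_{\theta \sim \mathcal{D}}\left[F_{\theta}(x)\right], \tag{8}$$

where $\mathcal{K} \subseteq \mathcal{X}$ denotes the constraint set, $\mathcal{D}$ denotes an unknown distribution, and $F_{\theta}: \mathcal{X} \to \mathbb{R}_{+}$ is a continuous submodular function for all $\theta \in \mathcal{D}$. Moreover, we assume that the constraint set, $\mathcal{K} = \prod_{i=1}^{n} \mathcal{K}_{i}$, is convex, where each $\mathcal{K}_{i}$ is a convex and closed set for all $i = 1, \dots, n$. The problem has recently been introduced in [30]. In addition, we use the notation $F^{*}$ to denote the optimal value of $F(x)$ over $x \in \mathcal{K}$, i.e., $F^{*} = \max_{x \in \mathcal{K}} F(x)$. Furthermore, the function $F$ is submodular because each function $F_{\theta}$ is a continuous submodular function for all $\theta \in \mathcal{D}$ [28].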

To solve problem (8), projected stochastic gradient methods are a class of efficient algorithms [31]. However, in this work we focus on the case in which the decision vectors are high-dimensional; i.e., the dimensionality of the vectors is large. Full gradient computations are then prohibitively expensive and become the computational bottleneck. Therefore, we propose a stochastic block-coordinate gradient method that combines the advantages of block-coordinate and stochastic gradient methods. We assume that the components of the decision variables are arbitrarily chosen but fixed for each processor. Furthermore, at each iteration, each processor randomly chooses a subset of the (stochastic) gradients, rather than all the (stochastic) gradients. The detailed description of the proposed algorithm is as follows. Starting from an initial value $x(0) \in \mathcal{K}$, for $t \ge 0$, each processor $i$ updates its decision variable as

$$x_{i}(t+1) = P_{\mathcal{K}_{i}}\left[x_{i}(t) + \alpha_{t}\, q_{i}(t)\, g_{i}(t)\right], \tag{9}$$

where $\alpha_{t}$ is the step-size, $P_{\mathcal{K}_{i}}[\cdot]$ denotes the Euclidean projection onto the set $\mathcal{K}_{i}$, $q_{i}(t)$ are independent and identically distributed Bernoulli random variables with $\Pr(q_{i}(t) = 1) = p_{i}$ for all $i$ and $t$, and $g_{i}(t)$ denotes the unbiased estimate of the gradient $\nabla_{i} F(x(t))$, which denotes the $i$-th coordinate of $\nabla F(x(t))$.

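We introduce the following matrix:

$$Q(t) = \mathrm{diag}\left(q_{1}(t), \dots, q_{n}(t)\right). \tag{10}$$

Therefore, we can write relation (9) more compactly as

$$x(t+1) = P_{\mathcal{K}}\left[x(t) + \alpha_{t}\, Q(t)\, g(t)\right], \tag{11}$$

where $x(t) = [x_{1}(t), \dots, x_{n}(t)]^{\top}$ and $g(t) = [g_{1}(t), \dots, g_{n}(t)]^{\top}$. Note that the $i$-th coordinate of $g(t)$ is missing when $q_{i}(t) = 0$ at iteration $t$, and then the $i$-th coordinate of $x(t)$ is not updated. Therefore, a random subset of the coordinates of $x(t)$ is updated at each iteration $t$. In addition, we use the notation $P$ to denote a diagonal matrix of size $n \times n$; i.e., $P = \mathrm{diag}(p_{1}, \dots, p_{n})$, where $p_{i} = \Pr(q_{i}(t) = 1)$.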
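The update described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes a box constraint, so that the Euclidean projection reduces to coordinate-wise clipping, and it uses a noisy quadratic objective as a stand-in stochastic oracle. The function name `sbcgp`, the test problem, the selection probabilities, and the step-size schedule are all hypothetical choices.

```python
import numpy as np


def sbcgp(stoch_grad, x0, lower, upper, p, step, num_iters, seed=0):
    """Stochastic block-coordinate gradient projection (ascent) on a box.

    stoch_grad(x, rng) returns a noisy gradient estimate; p[i] is the
    probability that coordinate i is selected; step(t) is the step size.
    """
    rng = np.random.default_rng(seed)
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    iterates = [x.copy()]
    for t in range(num_iters):
        # For clarity the full estimate is formed and then masked; a
        # cost-saving implementation would evaluate only selected coordinates.
        g = stoch_grad(x, rng)
        q = rng.random(x.shape[0]) < p            # Bernoulli selection mask
        # Ascent step on selected coordinates, then projection onto the box.
        x = np.clip(x + step(t) * q * g, lower, upper)
        iterates.append(x.copy())
    return iterates


# Hypothetical test problem: monotone DR-submodular quadratic on [0, 1]^n.
n = 5
H = -0.1 * np.ones((n, n))
h = -H @ np.ones(n)


def objective(x):
    return h @ x + 0.5 * x @ H @ x


def noisy_grad(x, rng):
    return h + H @ x + 0.1 * rng.standard_normal(x.shape[0])


p = np.full(n, 0.5)
iterates = sbcgp(noisy_grad, np.zeros(n), 0.0, 1.0, p,
                 step=lambda t: 0.5 / np.sqrt(t + 1), num_iters=200)
print(objective(iterates[0]), objective(iterates[-1]))
```

Coordinates whose mask entry is zero receive a zero increment and are already feasible, so clipping leaves them unchanged, which matches the remark that unselected coordinates are not updated.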

Let $\mathcal{F}_{t}$ denote the history information of all random variables generated by the proposed algorithm (11) up to time $t$. In this paper, we adopt the following assumption on the random variables $q_{i}(t)$.

Assumption 1. For all $i$, $j$, $t$, and $s$ with $(i, t) \ne (j, s)$, the random variables $q_{i}(t)$ and $q_{j}(s)$ are independent of each other. Furthermore, the random variables $q_{i}(t)$ are independent of $\mathcal{F}_{t-1}$ and $g(t)$ for any decision variables $x \in \mathcal{K}$.

In addition, we assume that the function $F$ and the sets $\mathcal{K}_{i}$ satisfy the following conditions.

Assumption 2. Assume that the following properties hold:
(a) The constraint set $\mathcal{K}$ is convex, and each set $\mathcal{K}_{i}$ is convex and closed for all $i = 1, \dots, n$.
(b) The function $F$ is monotone and weakly DR-submodular with parameter $\gamma$ over $\mathcal{X}$.
(c) The function $F$ is differentiable and $L$-smooth with respect to the norm $\|\cdot\|$.

Next, we make the following assumption about the stochastic oracle $g(t)$.

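Assumption 3. Assume that the stochastic oracle $g(t)$ satisfies the following conditions:

$$\mathbb{E}\left[g(t) \mid \mathcal{F}_{t-1}\right] = \nabla F(x(t)) \tag{12}$$

and

$$\mathbb{E}\left[\|g(t) - \nabla F(x(t))\|^{2} \mid \mathcal{F}_{t-1}\right] \le \sigma^{2}. \tag{13}$$

The above assumption implies that the stochastic oracle $g(t)$ is an unbiased estimate of $\nabla F(x(t))$.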
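One consequence of the independence and unbiasedness assumptions is worth making explicit: because the Bernoulli selection variables are independent of the gradient estimate, each coordinate of the masked estimate has expectation equal to its selection probability times the corresponding coordinate of the true gradient. The Monte Carlo sketch below illustrates this. The gradient vector, probabilities, noise level, and sample count are arbitrary illustrative values, and `masked_gradient_mean` is a hypothetical helper name.

```python
import numpy as np


def masked_gradient_mean(true_grad, p, noise_std, num_samples, seed=0):
    """Empirical mean of q * g, with q ~ Bernoulli(p) independent of g."""
    rng = np.random.default_rng(seed)
    n = true_grad.shape[0]
    # Unbiased oracle: true gradient plus zero-mean Gaussian noise.
    g = true_grad + noise_std * rng.standard_normal((num_samples, n))
    # Independent coordinate-selection masks.
    q = rng.random((num_samples, n)) < p
    return (q * g).mean(axis=0)


true_grad = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.5, 0.9])
estimate = masked_gradient_mean(true_grad, p, noise_std=0.5,
                                num_samples=200_000)
print(estimate)          # should be close to p * true_grad
print(p * true_grad)
```

This scaling by the selection probabilities is why the smallest probability enters the analysis of a block-coordinate scheme.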

In this section, we first formulate an optimization problem, and then design an optimization method to solve it. Moreover, we also give some standard assumptions to analyze the performance of the proposed method.

4. Main Results

In this section, we first establish the convergence of the proposed algorithm. To this end, we introduce the definition of a stationary point, following [31].

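Definition 4. For a vector $x \in \mathcal{K}$ and a function $F: \mathcal{X} \to \mathbb{R}_{+}$, if $\max_{y \in \mathcal{K}} \langle \nabla F(x), y - x \rangle \le 0$, then $x$ is a stationary point of $F$ over $\mathcal{K}$.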
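For a box constraint, stationarity can be checked in closed form. Assuming the usual first-order condition from [31], namely that $x \in \mathcal{K}$ is stationary when $\max_{y \in \mathcal{K}} \langle \nabla F(x), y - x \rangle \le 0$, the maximizing $y$ sits at the upper bound wherever the gradient is positive and at the lower bound elsewhere. The sketch below computes this quantity; `stationarity_gap` and the quadratic example are hypothetical illustrations, not part of the paper.

```python
import numpy as np


def stationarity_gap(grad_x, x, lower, upper):
    """Return max over y in [lower, upper]^n of <grad_x, y - x>.

    The value is nonnegative for feasible x (take y = x) and equals zero
    exactly when x satisfies the first-order stationarity condition.
    """
    y = np.where(grad_x > 0, upper, lower)   # maximizer of a linear function on a box
    return float(grad_x @ (y - x))


# Hypothetical example: monotone DR-submodular quadratic on [0, 1]^n.
n = 5
H = -0.1 * np.ones((n, n))
h = -H @ np.ones(n)


def grad(x):
    return h + H @ x


gap_at_zero = stationarity_gap(grad(np.zeros(n)), np.zeros(n), 0.0, 1.0)
gap_at_one = stationarity_gap(grad(np.ones(n)), np.ones(n), 0.0, 1.0)
print(gap_at_zero, gap_at_one)
```

In this example the all-zeros point has a strictly positive gap, so it is not stationary, while the gradient vanishes at the all-ones point, so the gap there is zero.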
From Definition 4, the convergence of our proposed algorithm is given in the following theorem.

Theorem 5. Let Assumptions 1–3 hold. Assume that the set of stationary points is nonempty and . Moreover, the sequence is generated by the stochastic block-coordinate gradient projection algorithm (11). Then, the iterative sequence converges to some stationary point with probability 1.
The proof can be found in the next section. The above result shows that the iterations converge to some local maximum with probability 1.
Furthermore, when the function is differentiable and DR-submodular, we have the following result.

Theorem 6. Let Assumptions 1–3 hold. Moreover, assume in (7) and . The sequence is generated by the stochastic block-coordinate gradient projection algorithm (11). Furthermore, the random decision variable is picked by choosing , with probability and the other variables with probability . Then, for any random variable for , we havewhere , .

The proof can be found in the next section. From the above result, we can see that an objective value in expectation can be obtained after iterations of the stochastic block-coordinate gradient projection algorithm (11) for any initial value. Moreover, the objective value is at least for any DR-submodular function.

In addition, when the function is weakly DR-submodular function with parameter , we also yield the following result.

Theorem 7. Let Assumptions 1–3 hold. The sequence is generated by the stochastic block-coordinate gradient projection algorithm (11) with . Furthermore, the random decision variable is picked by choosing in with probability . Then, for any for , we havewhere , .

The proof can be found in the next section. Note that the stochastic block-coordinate gradient projection algorithm yields an objective value after iterations from any initial value. Furthermore, the expectation of the objective value is in at least for any weakly DR-submodular function.

5. Performance Analysis

In this section, the detailed proofs of main results are provided. We first analyze the convergence performance of the stochastic block-coordinate gradient projection algorithm.

Proof of Theorem 5. By the Projection Theorem [32], we havefor all . Therefore, let and in inequality (16); we obtainwhere we have used relation (11). By simple algebraic manipulations, we yieldFurthermore, when for any , at each iteration . Therefore, we haveFrom the above relation, we also obtainPlugging relation (20) into inequality (18), we haveTaking conditional expectation in (21), we havewhere we have used in the last inequality. In addition, since the function is -smooth, we haveTaking conditional expectation on in (23) and using relation (22), we obtainfor step-size . For brevity, let . Inequality (24) implies thatFrom the definition of , we have for ; i.e., the sequence of random variables is nonnegative for all . Therefore, according to the Supermartingale Convergence Theorem [51], we can see that the sequence is convergent with probability 1. Furthermore, we also havewith probability 1. From relation (11), inequality (26) implies thatwith probability 1, where . Therefore, we obtain thatwith probability 1. Thus, there exists a subsequence , which converges to . Then, we haveSince the gradient projection operation is continuous, we havewith probability 1. The above relation implies thatwith probability 1. Then, relation (31) implies that . Therefore, is a stationary point of over with probability 1. The statement of the theorem is completely proved.
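The Projection Theorem invoked at the start of the proof states that, for a closed convex set $\mathcal{K}$, the projection $z = P_{\mathcal{K}}[x]$ satisfies $\langle y - z, x - z \rangle \le 0$ for every $y \in \mathcal{K}$. A small numerical check on a box, where the projection is coordinate-wise clipping, is sketched below. The choice of set, the dimension, the sampling distributions, and the helper name `max_projection_violation` are illustrative assumptions.

```python
import numpy as np


def max_projection_violation(n=4, trials=2000, seed=0):
    """Largest observed value of <y - z, x - z> with z = P_K[x], K = [0, 1]^n."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(trials):
        x = rng.normal(scale=3.0, size=n)     # arbitrary point, usually outside K
        z = np.clip(x, 0.0, 1.0)              # Euclidean projection onto the box
        y = rng.uniform(0.0, 1.0, size=n)     # arbitrary feasible point
        worst = max(worst, float((y - z) @ (x - z)))
    return worst


print(max_projection_violation())   # never positive if the inequality holds
```

Each coordinate contributes a nonpositive term: where $x_i$ exceeds the upper bound, $x_i - z_i > 0$ and $y_i - z_i \le 0$; where it falls below the lower bound, the signs are reversed; otherwise the term is zero.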

To prove Theorem 6, we first present the following lemmas. The first lemma follows from [52], which is stated as follows.

Lemma 8. For all , we havefor any diagonal matrix .

The next lemma is due to [53], which is stated as follows.

Lemma 9. Assume that a function is submodular and monotone. Then, we havefor any points .

In addition, we also have the following lemma.

Lemma 10. Let Assumptions 1–3 hold. The iterative sequence is generated by the stochastic block-coordinate gradient projection algorithm (11) with step-size , where for . Then, we havewhere .

Proof. In inequality (21), we let , where denotes the diagonal matrix with the -th entry equal to 1 and the other entries equal to 0. Then, we haveFurthermore, the above relation implies thatTherefore, for any , we obtainwhere in the last inequality we have used . Since for all , setting and following from relation (23), we havewhere the last inequality is obtained by using inequality (37), Young’s inequality, and the fact that when for all . Moreover, for all . Rearranging the terms in (38), the lemma is obtained completely.

With Lemmas 8 and 10 in place, we have the following result.

Lemma 11. Let Assumptions 1–3 hold. The iterative sequence is generated by the stochastic block-coordinate gradient projection algorithm (11) with step-size . Then, for all , we havewhere .

Proof. From the result in Lemma 10, we haveIn addition, following on from Lemma 8, we also obtainwhich implies thatCombining inequalities (40) and (42), we yieldwhere the last inequality is due to . Taking conditional expectation of the above inequality on , we yieldThus, by some algebraic manipulations, inequality (39) is obtained.

Next, we start to prove Theorem 6.

Proof of Theorem 6. Setting in Lemma 11, where is the globally optimal solution for problem (8), i.e., , we haveSincetaking conditional expectation of (46) with respect to , we havewhich implies thatSetting and in Lemma 9 and taking condition expectation on , we obtainThus, plugging inequality (49) into relation (48), we getTaking expectation in (50) and using some algebraic manipulations, we havewhere we have used the relation to obtain the first inequality. Summing both sides of (51) for , we obtainwhere in the last inequality we have used the fact that and for all . On the other hand, we also havewhere and the last inequality is due to (52). Since , we havePlugging the above inequality into (53) and dividing both sides by ,where we have used the fact that in the last inequality. Furthermore, the above inequality implies thatIn addition, the sample is obtained for by choosing , with probability and the other decision vectors with probability ; we haveTherefore, the theorem is completely proved.

We now start to prove Theorem 7.

Proof of Theorem 7. From the definition of weakly DR-submodular function, for any , we obtainfor all . Recall that the following relation is from [31]; i.e., for any ,Setting and in the above inequality, then, following from (48) and taking conditional expectation, we obtainTaking expectation in (60) with respect to , we getwhere the last inequality is due to Assumption 3.
Adding the above inequalities for , we havewhere the last inequality follows from and for all . Moreover, since and , we obtainwhere we have used inequality (54) to obtain the last inequality. On the other hand, we havePlugging inequality (63) into equality (64), we getDividing both sides by in (65), we haveIn addition, we obtain the sample by choosing with probability . Then, for any , we haveTherefore, the theorem is obtained completely.

In this section, we proved the main results of the paper in detail. The conclusion of this paper is provided in the next section.

6. Conclusion

In this paper, we have considered a stochastic optimization problem of continuous submodular functions, which is an important problem in many areas such as machine learning and social science. Since the data are high-dimensional, usual algorithms based on computing the whole approximate gradient vector, such as stochastic gradient methods, are prohibitively expensive. For this reason, we proposed the stochastic block-coordinate gradient projection algorithm for maximizing submodular functions, which randomly chooses a subset of the approximate gradient vector. Moreover, we studied the convergence performance of the proposed algorithm. We proved that the iterates converge to some stationary points with probability 1 when suitable step sizes are used. Furthermore, we showed that the algorithm achieves a tight $((p_{\min}/2)F^{*} - \epsilon)$ approximation guarantee after $O(1/\epsilon^{2})$ iterations when the submodular functions are DR-submodular and suitable step sizes are used. More generally, we also showed that the algorithm achieves the tight $((\gamma^{2}/(1+\gamma^{2}))p_{\min}F^{*} - \epsilon)$ approximation guarantee after $O(1/\epsilon^{2})$ iterations when the submodular functions are weakly DR-submodular with parameter $\gamma$ and appropriate step sizes are used.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants no. U1604155, no. 61602155, no. 61871430, no. 61772477, and no. 61572445; in part by Henan Science and Technology Innovation Project under Grant no. 174100510010; in part by the basic research projects at the University of Henan Province under Grant no. 19zx010; in part by the Ph.D. Research Fund of the Zhengzhou University of Light Industry; and in part by the Natural Science Foundation of Henan Province under Grant no. 162300410322.

References

[1] J. Djolonga and A. Krause, “From MAP to marginals: Variational inference in Bayesian submodular models,” in Proceedings of the 28th Annual Conference on Neural Information Processing Systems, NIPS 2014, pp. 244–252, Canada, December 2014.
[2] R. Iyer and J. Bilmes, “Submodular point processes with applications to machine learning,” in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, pp. 388–397, 2015.
[3] A. Singla, I. Bogunovic, A. Karbasi, and A. Krause, “Near-optimally teaching the crowd to classify,” in Proceedings of the 31st International Conference on Machine Learning, pp. 154–162, 2014.
[4] B. Kim, R. Khanna, and O. Koyejo, “Examples are not enough, learn to criticize! Criticism for interpretability,” in Proceedings of the 30th Annual Conference on Neural Information Processing Systems, NIPS 2016, pp. 2288–2296, Spain, December 2016.
[5] V. Cevher and A. Krause, “Greedy dictionary selection for sparse representation,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, pp. 979–988, 2011.
[6] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause, “Distributed submodular maximization: Identifying representative elements in massive data,” in Proceedings of the 27th Annual Conference on Neural Information Processing Systems, NIPS 2013, USA, December 2013.
[7] H. Lin and J. Bilmes, “A class of submodular functions for document summarization,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011, pp. 510–520, USA, June 2011.
[8] Y. Yue and C. Guestrin, “Linear submodular bandits and their application to diversified retrieval,” in Advances in Neural Information Processing Systems, pp. 2483–2491, 2011.
[9] K. El-Arini, G. Veda, D. Shahaf, and C. Guestrin, “Turning down the noise in the blogosphere,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pp. 289–297, France, July 2009.
[10] B. Mirzasoleiman, A. Badanidiyuru, and A. Karbasi, “Fast constrained submodular maximization: Personalized data summarization,” in Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, pp. 2042–2054, USA, June 2016.
[11] C. Guestrin, A. Krause, and A. P. Singh, “Near-optimal sensor placements in Gaussian processes,” in Proceedings of the 22nd International Conference on Machine Learning, ICML 2005, pp. 265–272, August 2005.
[12] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. Vanbriesen, and N. Glance, “Cost-effective outbreak detection in networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 420–429, New York, NY, USA, August 2007.
[13] M. Gomez-Rodriguez, J. Leskovec, and A. Krause, “Inferring networks of diffusion and influence,” ACM Transactions on Knowledge Discovery from Data, vol. 5, no. 4, pp. 1–37, 2012.
[14] F. Bach, “Structured sparsity-inducing norms through submodular functions,” in Advances in Neural Information Processing Systems, pp. 118–126, 2010.
[15] M. Narasimhan, N. Jojic, and J. Bilmes, “Q-clustering,” in Proceedings of the 2005 Annual Conference on Neural Information Processing Systems, NIPS 2005, pp. 979–986, Canada, December 2005.
[16] A. Das and D. Kempe, “Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection,” in Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 1057–1064, USA, July 2011.
[17] D. Golovin and A. Krause, “Adaptive submodularity: theory and applications in active learning and stochastic optimization,” Journal of Artificial Intelligence Research, vol. 42, pp. 427–486, 2011.
[18] Z. Zheng and N. B. Shroff, “Submodular utility maximization for deadline constrained data collection in sensor networks,” IEEE Transactions on Automatic Control, vol. 59, no. 9, pp. 2400–2412, 2014.
[19] S. Iwata, L. Fleischer, and S. Fujishige, “A combinatorial strongly polynomial algorithm for minimizing submodular functions,” Journal of the ACM, vol. 48, no. 4, pp. 761–777, 2001.
[20] J. B. Orlin, “A faster strongly polynomial time algorithm for submodular function minimization,” Mathematical Programming, vol. 118, no. 2, Ser. A, pp. 237–251, 2009.
[21] A. Schrijver, “A combinatorial algorithm minimizing submodular functions in strongly polynomial time,” Journal of Combinatorial Theory, Series B, vol. 80, no. 2, pp. 346–355, 2000.
[22] G. Cornuejols, M. L. Fisher, and G. L. Nemhauser, “Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms,” Management Science, vol. 23, no. 8, pp. 789–810, 1977.
[23] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions-I,” Mathematical Programming, vol. 14, no. 1, pp. 265–294, 1978.
[24] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, “An analysis of approximations for maximizing submodular set functions. II,” Mathematical Programming, no. 8, pp. 73–87, 1978.
[25] U. Feige, V. S. Mirrokni, and J. Vondrák, “Maximizing non-monotone submodular functions,” SIAM Journal on Computing, vol. 40, no. 4, pp. 1133–1153, 2011.
[26] C. Chekuri, J. Vondrák, and R. Zenklusen, “Submodular function maximization via the multilinear relaxation and contention resolution schemes,” SIAM Journal on Computing, vol. 43, no. 6, pp. 1831–1879, 2014.
[27] G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák, “Maximizing a monotone submodular function subject to a matroid constraint,” SIAM Journal on Computing, vol. 40, no. 6, pp. 1740–1766, 2011.
[28] F. Bach, “Submodular functions: from discrete to continuous domains,” https://arxiv.org/abs/1511.00394v2.
[29] A. Bian, B. Mirzasoleiman, J. Buhmann, and A. Krause, “Guaranteed non-convex optimization: Submodular maximization over continuous domains,” https://arxiv.org/abs/1606.05615.
[30] M. Karimi, M. Lucic, H. Hassani, and A. Krause, “Stochastic submodular maximization: The case of coverage functions,” in Advances in Neural Information Processing Systems, pp. 6853–6863, 2017.
[31] H. Hassani, M. Soltanolkotabi, and A. Karbasi, “Gradient methods for submodular maximization,” https://arxiv.org/abs/1708.03949.
[32] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, Mass, USA, 2nd edition, 1999.
[33] K.-W. Chang, C.-J. Hsieh, and C.-J. Lin, “Coordinate descent method for large-scale L2-loss linear support vector machines,” Journal of Machine Learning Research (JMLR), vol. 9, pp. 1369–1398, 2008.
[34] S. J. Wright, “Accelerated block-coordinate relaxation for regularized optimization,” SIAM Journal on Optimization, vol. 22, no. 1, pp. 159–186, 2012.
[35] P. Tseng and S. Yun, “A coordinate gradient descent method for nonsmooth separable minimization,” Mathematical Programming, Series B, vol. 117, no. 1, pp. 387–423, 2009.
[36] P. Tseng and S. Yun, “Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization,” Journal of Optimization Theory and Applications, vol. 140, no. 3, pp. 513–535, 2009.
[37] S. Yun and K.-C. Toh, “A coordinate gradient descent method for l₁-regularized convex minimization,” Computational Optimization and Applications, vol. 48, no. 2, pp. 273–307, 2011.
[38] A. A. Canutescu and R. L. Dunbrack Jr., “Cyclic coordinate descent: A robotics algorithm for protein loop closure,” Protein Science, vol. 12, no. 5, pp. 963–972, 2003.
[39] T. T. Wu and K. Lange, “Coordinate descent algorithms for lasso penalized regression,” The Annals of Applied Statistics, vol. 2, no. 1, pp. 224–244, 2008.
[40] Y. Li and S. Osher, “Coordinate descent optimization for l¹ minimization with application to compressed sensing; a greedy algorithm,” Inverse Problems and Imaging, vol. 3, no. 3, pp. 487–503, 2009.
[41] Z. Q. Luo and P. Tseng, “On the convergence of the coordinate descent method for convex differentiable minimization,” Journal of Optimization Theory and Applications, vol. 72, no. 1, pp. 7–35, 1992.
[42] P. Tseng, “Convergence of a block coordinate descent method for nondifferentiable minimization,” Journal of Optimization Theory and Applications, vol. 109, no. 3, pp. 475–494, 2001.
[43] P.-W. Wang and C.-J. Lin, “Iteration complexity of feasible descent methods for convex optimization,” Journal of Machine Learning Research (JMLR), vol. 15, pp. 1523–1548, 2014.
[44] Y. Nesterov, “Efficiency of coordinate descent methods on huge-scale optimization problems,” SIAM Journal on Optimization, vol. 22, no. 2, pp. 341–362, 2012.
[45] P. Richtárik and M. Takáč, “Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function,” Mathematical Programming, vol. 144, no. 1-2, Ser. A, pp. 1–38, 2014.
[46] Z. Lu and L. Xiao, “On the complexity analysis of randomized block-coordinate descent methods,” Mathematical Programming, vol. 152, no. 1-2, Ser. A, pp. 615–642, 2015.
[47] J. Liu, S. J. Wright, C. Ré, V. Bittorf, and S. Sridhar, “An asynchronous parallel stochastic coordinate descent algorithm,” Journal of Machine Learning Research (JMLR), vol. 16, pp. 285–322, 2015.
[48] J. Liu and S. J. Wright, “Asynchronous stochastic coordinate descent: parallelism and convergence properties,” SIAM Journal on Optimization, vol. 25, no. 1, pp. 351–376, 2015.
[49] S. Fujishige, Submodular Functions and Optimization, Annals of Discrete Mathematics, North-Holland Publishing, Amsterdam, Netherlands, 2nd edition, 2005.
[50] C. Chekuri, T. S. Jayram, and J. Vondrák, “On multiplicative weight updates for concave and submodular function maximization,” in Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pp. 201–210, ACM, 2015.
[51] B. T. Polyak, Introduction to Optimization, Optimization Software, New York, NY, USA, 1987.
[52] C. Singh, A. Nedic, and R. Srikant, “Random block-coordinate gradient projection algorithms,” in Proceedings of the 2014 53rd IEEE Annual Conference on Decision and Control, CDC 2014, pp. 185–190, USA, December 2014.
[53] C. Chekuri, J. Vondrák, and R. Zenklusen, “Submodular function maximization via the multilinear relaxation and contention resolution schemes,” in Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, pp. 783–792, ACM, 2011.

Copyright

Copyright © 2018 Zhigang Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
