Journal of Control Science and Engineering
Volume 2015, Article ID 264953, 7 pages
http://dx.doi.org/10.1155/2015/264953
Research Article

Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality

Department of Computer Science and Engineering, Sogang University, Seoul 121-742, Republic of Korea

Received 13 January 2015; Accepted 11 March 2015

Academic Editor: Shengwei Mei

Copyright © 2015 Hyeong Soo Chang and Sanghee Choe. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
