Computational Intelligence and Neuroscience
Volume 2016, Article ID 4824072, 15 pages
http://dx.doi.org/10.1155/2016/4824072
Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Shan Zhong,1,2 Quan Liu,1,3,4 and QiMing Fu5

1School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000, China
2School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China
3Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
4Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
5College of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China

Received 29 May 2016; Revised 28 July 2016; Accepted 16 August 2016

Academic Editor: Leonardo Franco

Copyright © 2016 Shan Zhong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, Mass, USA, 1998.
  2. M. L. Littman, “Reinforcement learning improves behaviour from evaluative feedback,” Nature, vol. 521, no. 7553, pp. 445–451, 2015.
  3. D. Silver, R. S. Sutton, and M. Müller, “Temporal-difference search in computer Go,” Machine Learning, vol. 87, no. 2, pp. 183–219, 2012.
  4. J. Kober and J. Peters, “Reinforcement learning in robotics: a survey,” in Reinforcement Learning, pp. 579–610, Springer, Berlin, Germany, 2012.
  5. D.-R. Liu, H.-L. Li, and D. Wang, “Feature selection and feature learning for high-dimensional batch reinforcement learning: a survey,” International Journal of Automation and Computing, vol. 12, no. 3, pp. 229–242, 2015.
  6. R. E. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, USA, 1957.
  7. D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Belmont, Mass, USA, 1995.
  8. F. L. Lewis and D. R. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, John Wiley & Sons, Hoboken, NJ, USA, 2012.
  9. P. J. Werbos, “Advanced forecasting methods for global crisis warning and models of intelligence,” General Systems, vol. 22, pp. 25–38, 1977.
  10. D. R. Liu, H. L. Li, and D. Wang, “Data-based self-learning optimal control: research progress and prospects,” Acta Automatica Sinica, vol. 39, no. 11, pp. 1858–1870, 2013.
  11. H. G. Zhang, D. R. Liu, Y. H. Luo, and D. Wang, Adaptive Dynamic Programming for Control: Algorithms and Stability, Communications and Control Engineering Series, Springer, London, UK, 2013.
  12. A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 943–949, 2008.
  13. T. Dierks, B. T. Thumati, and S. Jagannathan, “Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence,” Neural Networks, vol. 22, no. 5-6, pp. 851–860, 2009.
  14. F.-Y. Wang, N. Jin, D. Liu, and Q. Wei, “Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound,” IEEE Transactions on Neural Networks, vol. 22, no. 1, pp. 24–36, 2011.
  15. K. Doya, “Reinforcement learning in continuous time and space,” Neural Computation, vol. 12, no. 1, pp. 219–245, 2000.
  16. J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, “Adaptive dynamic programming,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 32, no. 2, pp. 140–153, 2002.
  17. T. Hanselmann, L. Noakes, and A. Zaknich, “Continuous-time adaptive critics,” IEEE Transactions on Neural Networks, vol. 18, no. 3, pp. 631–647, 2007.
  18. S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, and W. E. Dixon, “A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems,” Automatica, vol. 49, no. 1, pp. 82–92, 2013.
  19. L. Buşoniu, R. Babuška, B. De Schutter et al., Reinforcement Learning and Dynamic Programming Using Function Approximators, CRC Press, New York, NY, USA, 2010.
  20. A. G. Barto, R. S. Sutton, and C. W. Anderson, “Neuronlike adaptive elements that can solve difficult learning control problems,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, no. 5, pp. 834–846, 1983.
  21. H. Van Hasselt and M. A. Wiering, “Reinforcement learning in continuous action spaces,” in Proceedings of the IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL '07), pp. 272–279, Honolulu, Hawaii, USA, April 2007.
  22. S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, “Natural actor-critic algorithms,” Automatica, vol. 45, no. 11, pp. 2471–2482, 2009.
  23. I. Grondman, L. Buşoniu, G. A. D. Lopes, and R. Babuška, “A survey of actor-critic reinforcement learning: standard and natural policy gradients,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 6, pp. 1291–1307, 2012.
  24. I. Grondman, L. Buşoniu, and R. Babuška, “Model learning actor-critic algorithms: performance evaluation in a motion control task,” in Proceedings of the 51st IEEE Conference on Decision and Control (CDC '12), pp. 5272–5277, Maui, Hawaii, USA, December 2012.
  25. I. Grondman, M. Vaandrager, L. Buşoniu, R. Babuška, and E. Schuitema, “Efficient model learning methods for actor-critic control,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 3, pp. 591–602, 2012.
  26. B. Costa, W. Caarls, and D. S. Menasche, “Dyna-MLAC: trading computational and sample complexities in actor-critic reinforcement learning,” in Proceedings of the Brazilian Conference on Intelligent Systems (BRACIS '15), pp. 37–42, Natal, Brazil, November 2015.
  27. O. Andersson, F. Heintz, and P. Doherty, “Model-based reinforcement learning in continuous environments using real-time constrained optimization,” in Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI '15), pp. 644–650, Austin, Texas, USA, January 2015.
  28. R. S. Sutton, C. Szepesvári, A. Geramifard, and M. Bowling, “Dyna-style planning with linear function approximation and prioritized sweeping,” in Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI '08), pp. 528–536, July 2008.
  29. R. Faulkner and D. Precup, “Dyna planning using a feature based generative model,” in Proceedings of the Advances in Neural Information Processing Systems (NIPS '10), pp. 1–9, Vancouver, Canada, December 2010.
  30. H. Yao and C. Szepesvári, “Approximate policy iteration with linear action models,” in Proceedings of the 26th National Conference on Artificial Intelligence (AAAI '12), Toronto, Canada, July 2012.
  31. H. R. Berenji and P. Khedkar, “Learning and tuning fuzzy logic controllers through reinforcements,” IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 724–740, 1992.
  32. R. S. Sutton, “Generalization in reinforcement learning: successful examples using sparse coarse coding,” in Advances in Neural Information Processing Systems, pp. 1038–1044, MIT Press, Cambridge, Mass, USA, 1996.
  33. D. Silver, A. Huang, C. J. Maddison et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
  34. V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.