Computational Intelligence and Neuroscience
Volume 2016 (2016), Article ID 4824072, 15 pages
Research Article

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

Shan Zhong,1,2 Quan Liu,1,3,4 and QiMing Fu5

1School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000, China
2School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, Jiangsu 215500, China
3Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
4Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
5College of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China

Received 29 May 2016; Revised 28 July 2016; Accepted 16 August 2016

Academic Editor: Leonardo Franco

Copyright © 2016 Shan Zhong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


To improve the convergence rate and sample efficiency, two learning methods, AC-HMLP and RAC-HMLP (AC-HMLP with regularization), are proposed by combining the actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consist of a local model and a global model, which are learned simultaneously with the value function and the policy and are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both models generate samples for planning: the local model is used at each time step, but only when its state-prediction error does not exceed a threshold, while the global model is used at the end of each episode. Employing both models improves sample efficiency and accelerates the convergence of the whole algorithm by fully exploiting both local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two reinforcement learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency.
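The two model classes described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the nearest-neighbor count `k`, the learning rate, and the use of raw state-action vectors as features are all assumptions made for the example. The global model is a simple gradient-updated linear map, and the local model fits a linear regression over the k nearest stored transitions on demand.

```python
import numpy as np


class GlobalLFAModel:
    """Global model sketch: linear function approximation, s' ~ W @ phi(s, a).

    Here phi is assumed to be the raw state-action feature vector; the
    learning rate is an illustrative choice.
    """

    def __init__(self, feat_dim, state_dim, lr=0.1):
        self.W = np.zeros((state_dim, feat_dim))
        self.lr = lr

    def predict(self, feats):
        return self.W @ feats

    def update(self, feats, next_state):
        # Gradient step on the squared prediction error.
        err = next_state - self.predict(feats)
        self.W += self.lr * np.outer(err, feats)


class LocalLLRModel:
    """Local model sketch: local linear regression (LLR) over stored samples."""

    def __init__(self, k=3):
        self.k = k
        self.X, self.Y = [], []  # stored (state, action) inputs and next states

    def add(self, sa, next_state):
        self.X.append(sa)
        self.Y.append(next_state)

    def predict(self, sa):
        # Fit a linear map (with bias) to the k nearest stored transitions.
        X, Y = np.asarray(self.X), np.asarray(self.Y)
        idx = np.argsort(np.linalg.norm(X - sa, axis=1))[: self.k]
        Xk = np.hstack([X[idx], np.ones((len(idx), 1))])  # append bias column
        beta, *_ = np.linalg.lstsq(Xk, Y[idx], rcond=None)
        return np.append(sa, 1.0) @ beta
```

In the abstract's scheme, planning with the local model at a time step would be gated by a check such as `np.linalg.norm(local.predict(sa) - next_state) <= threshold`, with the global model reserved for planning at the end of each episode; the threshold value itself is a tunable parameter not specified here.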