Journal of Robotics
Volume 2010 (2010), Article ID 437654, 9 pages
Research Article

Emergence of Prediction by Reinforcement Learning Using a Recurrent Neural Network

Kenta Goto and Katsunari Shibata

Department of Electrical and Electronic Engineering, Oita University, 700 Dannoharu, Oita 870-1192, Japan

Received 6 November 2009; Revised 1 March 2010; Accepted 17 May 2010

Academic Editor: Noriyasu Homma

Copyright © 2010 Kenta Goto and Katsunari Shibata. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
