Abstract and Applied Analysis
Volume 2009, Article ID 103723, 17 pages
http://dx.doi.org/10.1155/2009/103723
Research Article

Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces

1Department of Mathematics, Ningbo University, Ningbo 315211, China
2Department of Mathematics, Honghe University, Mengzi 661100, China
3The College of Mathematics and Computing Science, Changsha University of Science and Technology, Changsha 410076, China

Received 24 June 2009; Accepted 9 December 2009

Academic Editor: Nikolaos Papageorgiou

Copyright © 2009 Quanxin Zhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References

  1. R. A. Howard, Dynamic Programming and Markov Processes, The Technology Press of M.I.T., Cambridge, Mass, USA, 1960.
  2. R. Dekker, “Counter examples for compact action Markov decision chains with average reward criteria,” Communications in Statistics, vol. 3, no. 3, pp. 357–368, 1987.
  3. M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, NY, USA, 1994.
  4. P. J. Schweitzer, “On undiscounted Markovian decision processes with compact action spaces,” RAIRO—Operations Research, vol. 19, no. 1, pp. 71–86, 1985.
  5. E. V. Denardo and B. L. Fox, “Multichain Markov renewal programs,” SIAM Journal on Applied Mathematics, vol. 16, pp. 468–487, 1968.
  6. X. P. Guo and O. Hernández-Lerma, “Drift and monotonicity conditions for continuous-time controlled Markov chains with an average criterion,” IEEE Transactions on Automatic Control, vol. 48, no. 2, pp. 236–245, 2003.
  7. X. P. Guo and X. R. Cao, “Optimal control of ergodic continuous-time Markov chains with average sample-path rewards,” SIAM Journal on Control and Optimization, vol. 44, no. 1, pp. 29–48, 2005.
  8. O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, vol. 42 of Applications of Mathematics, Springer, New York, NY, USA, 1999.
  9. O. Hernández-Lerma and J. B. Lasserre, “Policy iteration for average cost Markov control processes on Borel spaces,” Acta Applicandae Mathematicae, vol. 47, no. 2, pp. 125–154, 1997.
  10. A. Hordijk and M. L. Puterman, “On the convergence of policy iteration in finite state undiscounted Markov decision processes: the unichain case,” Mathematics of Operations Research, vol. 12, no. 1, pp. 163–176, 1987.
  11. J. B. Lasserre, “A new policy iteration scheme for Markov decision processes using Schweitzer's formula,” Journal of Applied Probability, vol. 31, no. 1, pp. 268–273, 1994.
  12. S. P. Meyn, “The policy iteration algorithm for average reward Markov decision processes with general state space,” IEEE Transactions on Automatic Control, vol. 42, no. 12, pp. 1663–1680, 1997.
  13. M. S. Santos and J. Rust, “Convergence properties of policy iteration,” SIAM Journal on Control and Optimization, vol. 42, no. 6, pp. 2094–2115, 2004.
  14. Q. X. Zhu, “Average optimality for continuous-time Markov decision processes with a policy iteration approach,” Journal of Mathematical Analysis and Applications, vol. 339, no. 1, pp. 691–704, 2008.
  15. A. Y. Golubin, “A note on the convergence of policy iteration in Markov decision processes with compact action spaces,” Mathematics of Operations Research, vol. 28, no. 1, pp. 194–200, 2003.
  16. X. P. Guo and U. Rieder, “Average optimality for continuous-time Markov decision processes in Polish spaces,” The Annals of Applied Probability, vol. 16, no. 2, pp. 730–756, 2006.
  17. Q. X. Zhu, “Average optimality inequality for continuous-time Markov decision processes in Polish spaces,” Mathematical Methods of Operations Research, vol. 66, no. 2, pp. 299–313, 2007.
  18. Q. X. Zhu and T. Prieto-Rumeau, “Bias and overtaking optimality for continuous-time jump Markov decision processes in Polish spaces,” Journal of Applied Probability, vol. 45, no. 2, pp. 417–429, 2008.
  19. R. B. Lund, S. P. Meyn, and R. L. Tweedie, “Computable exponential convergence rates for stochastically ordered Markov processes,” The Annals of Applied Probability, vol. 6, no. 1, pp. 218–237, 1996.
  20. I. I. Gīhman and A. V. Skorohod, Controlled Stochastic Processes, Springer, New York, NY, USA, 1979.
  21. Q. X. Zhu and X. P. Guo, “Markov decision processes with variance minimization: a new condition and approach,” Stochastic Analysis and Applications, vol. 25, no. 3, pp. 577–592, 2007.
  22. Q. X. Zhu and X. P. Guo, “Another set of conditions for Markov decision processes with average sample-path costs,” Journal of Mathematical Analysis and Applications, vol. 322, no. 2, pp. 1199–1214, 2006.
  23. Q. X. Zhu and X. P. Guo, “Another set of conditions for strong n (n = −1, 0) discount optimality in Markov decision processes,” Stochastic Analysis and Applications, vol. 23, no. 5, pp. 953–974, 2005.
  24. M. Schäl, “Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal,” Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol. 32, no. 3, pp. 179–196, 1975.