A semimartingale characterization of average optimal stationary policies for Markov decision processes

Zhu, Quanxin; Guo, Xianping

doi:https://doi.org/10.1155/JAMSA/2006/81593

International Journal of Stochastic Analysis

Open Access

Volume 2006 | Article ID 081593 | https://doi.org/10.1155/JAMSA/2006/81593

Quanxin Zhu and Xianping Guo

Received 30 Nov 2004

Revised 10 Jun 2005

Accepted 22 Jun 2005

Published 13 Apr 2006

Abstract

This paper deals with discrete-time Markov decision processes with Borel state and action spaces. The criterion to be minimized is the expected average cost, and the cost function may be unbounded both from above and from below. In our earlier paper (to appear in Journal of Applied Probability), weaker conditions were proposed to ensure the existence of average optimal stationary policies. In this paper, we further study properties of optimal policies. Under these weaker conditions, we not only obtain two necessary and sufficient conditions for optimal policies, but also give a “semimartingale characterization” of an average optimal stationary policy.

References

A. Arapostathis, V. S. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, and S. I. Marcus, “Discrete-time controlled Markov processes with average cost criterion: a survey,” SIAM Journal on Control and Optimization, vol. 31, no. 2, pp. 282–344, 1993.
C. Derman, Finite State Markovian Decision Processes, vol. 67 of Mathematics in Science and Engineering, Academic Press, New York, 1970.
E. B. Dynkin and A. A. Yushkevich, Controlled Markov Processes, vol. 235 of Fundamental Principles of Mathematical Sciences, Springer, Berlin, 1979.
E. Gordienko and O. Hernández-Lerma, “Average cost Markov control processes with weighted norms: existence of canonical policies,” Applicationes Mathematicae, vol. 23, no. 2, pp. 199–218, 1995.
X. P. Guo, J. Y. Liu, and K. Liu, “Nonhomogeneous Markov decision processes with Borel state space—the average criterion with nonuniformly bounded rewards,” Mathematics of Operations Research, vol. 25, no. 4, pp. 667–678, 2000.
X. P. Guo and P. Shi, “Limiting average criteria for nonstationary Markov decision processes,” SIAM Journal on Optimization, vol. 11, no. 4, pp. 1037–1053, 2001.
X. P. Guo and Q. X. Zhu, “Average optimality for Markov decision processes in Borel spaces: a new condition and approach,” to appear in Journal of Applied Probability.
O. Hernández-Lerma, Adaptive Markov Control Processes, vol. 79 of Applied Mathematical Sciences, Springer, New York, 1989.
O. Hernández-Lerma and J. B. Lasserre, Discrete-Time Markov Control Processes. Basic Optimality Criteria, vol. 30 of Applications of Mathematics (New York), Springer, New York, 1996.
O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, vol. 42 of Applications of Mathematics (New York), Springer, New York, 1999.
R. A. Howard, Dynamic Programming and Markov Processes, John Wiley & Sons, New York, 1960.
M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, 1994.
L. I. Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems, Wiley Series in Probability and Statistics: Applied Probability and Statistics, John Wiley & Sons, New York, 1999.
L. I. Sennott, “Average reward optimization theory for denumerable state spaces,” in Handbook of Markov Decision Processes, E. A. Feinberg and A. Shwartz, Eds., vol. 40 of Internat. Ser. Oper. Res. Management Sci., pp. 153–172, Kluwer Academic, Massachusetts, 2002.
Q. X. Zhu and X. P. Guo, “Another set of condition for strong $n$ ($n = -1, 0$) discount optimality in Markov decision processes,” Stochastic Analysis and Applications, vol. 23, no. 5, pp. 953–974, 2005.
Q. X. Zhu and X. P. Guo, “Unbounded cost Markov decision processes with limsup and liminf average criteria: new conditions,” Mathematical Methods of Operations Research, vol. 61, no. 3, pp. 469–482, 2005.

Copyright

Copyright © 2006 Quanxin Zhu and Xianping Guo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
