A Consumption and Investment Problem via a Markov Decision Processes Approach with Random Horizon
This work is devoted to a consumption and investment problem in which an investor with a given initial wealth decides, at each of a series of successive times, how much of that wealth will be consumed and how much will be invested. The key issue is to find a wealth allocation rule that maximizes the performance criterion; this problem is solved by the dynamic programming technique for Markov decision processes with a random horizon.
Markov decision processes (MDPs) provide a very useful framework for modeling decision-making processes whose results are partially random. MDPs are stochastic processes suited to addressing a wide range of optimization problems of continuous or discrete nature (only the discrete framework is considered in this paper). Throughout, at each step, the process is in some state and the decision maker may choose any action that is available for that state. The process responds at the next stage by randomly moving to a new state and giving a reward to the decision maker. The central problem of MDPs is to find an “optimal policy”, i.e., a function that specifies some mechanism for selecting actions optimally at each stage.
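The interaction just described (observe the state, choose an admissible action, collect a reward, move randomly to a new state) can be sketched in a few lines. The following is only a minimal illustration under assumed names: `simulate`, `consume_half`, `toy_transition`, and `toy_reward` are hypothetical, and the two-point return is an illustrative choice, not the model studied in this paper.

```python
import random


def simulate(policy, transition, reward, x0, n_stages, rng):
    """Simulate one trajectory; return a list of (state, action, reward)."""
    history = []
    x = x0
    for n in range(n_stages):
        a = policy(n, x)                      # decision rule applied to the current state
        history.append((x, a, reward(n, x, a)))
        x = transition(x, a, rng)             # random move to the next state
    return history


# Hypothetical toy model: the state is wealth, the action is the amount consumed.
def consume_half(n, x):
    return 0.5 * x


def toy_transition(x, a, rng):
    gross_return = rng.choice([0.9, 1.1])     # illustrative two-point return
    return (x - a) * gross_return


def toy_reward(n, x, a):
    return a                                  # reward equals the consumed amount


trajectory = simulate(consume_half, toy_transition, toy_reward,
                      100.0, 5, random.Random(0))
```

A policy is here simply a function of the stage and the current state; an optimal policy would be the one maximizing the expected sum of the collected rewards.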
MDPs can be solved by dynamic programming. For example, in , a comprehensive and theoretical treatment of the mathematical foundations of optimal stochastic control of discrete-time systems is given; meanwhile, in , interest is mostly limited to MDPs with a Borel state space and possibly unbounded costs. In , it is explained that the theory of the stochastic dynamic programming method is easily applicable to many practical problems, even for nonstationary models.
However, there exist other methods that may be considered for solving stochastic optimization problems. For example, in , an emended minimax method is developed, based on the semi-autonomized multiobjective optimization algorithm, by amending the classical minimax method; it leads to desirable optimal values in the certitude state and finds another Pareto optimal solution under fuzziness in the incertitude state. In , the author focuses on a general fractile criterion iterative-interactive optimization process in order to obtain the preferable Pareto optimal solution, subject to a specified main objective function, for multiobjective stochastic linear programming problems in a fuzzy environment. In addition, in , a real-life-based cost-effective and customer-centric closed-loop supply chain management model is considered together with the T set, which represents the inherent impreciseness of the objective functions; the optimal values found there are superior to the stipulated goals for both objective functions in the T environment. In , the effects of setup cost reduction and quality improvement in a two-echelon supply chain model with deterioration are studied. The objective is to minimize the total cost of the entire supply chain model by simultaneously optimizing setup cost, process quality, number of deliveries, and lot size. In , a set of very interesting situations coming from mobile and wireless networks, connection management, and the Internet is considered, in which optimal decisions are required and an overview of control problems and the theory behind them is needed.
In this paper, the possibility that external factors may force the process to end earlier than planned is taken into account. It is therefore necessary to consider the horizon as a random variable, which may be independent of the state-action space . Such an idea has been explored, for example, in , where the optimal selection strategy for the Armed Bandit paradigm with a random horizon and possibly random discount factors is found.
Hence, an investor with a certain initial capital is considered who, at each of a random number of times, may reinvest in risky assets, consume, or invest in a risk-free bond. The goal is to devise a consumption and investment strategy that maximizes the expected sum of utilities coming exclusively from the capital spent at each stage. Thus, in this paper, via the theory of MDPs with a finite random horizon, an optimal consumption and investment policy is established for the case in which the utility function that evaluates consumption is of the exponential type. Although this kind of utility function is rather classical, it is also useful, since such functions exhibit constant absolute risk aversion and are the only risk-averse increasing utility functions whose risk premium is invariant with respect to wealth [11, 12]. The term risk aversion refers to the preference for stochastic realizations with limited deviation from the expected value. In risk-averse optimal control, one may prefer a policy with a higher cost in expectation but lower deviations to one with a lower cost but possibly higher deviations .
This work is organized as follows. Section 2 presents the fundamental ideas of MDPs with a random horizon, together with an equivalence between the performance criterion of an MDP with a finite random horizon and the one associated with an MDP with a deterministic horizon. Section 3 is devoted to a consumption and investment scenario with a random horizon, together with a numerical experiment. This section also contains the main contribution of this text: the optimal policy for the consumption and investment problem with a finite random horizon and an exponential utility function. Finally, two appendices are included. Appendix A deals with basic definitions of MDP theory, together with some assumptions that are useful for solving the consumption and investment problem by means of the dynamic programming technique. The concept of a financial market is discussed in Appendix B.
2. Markov Decision Process with a Random Horizon
In the literature, one may find references where discrete-time control problems with a random horizon are discussed, for example, [9, 10]. Let be a discrete random variable associated with some probability space . Suppose that the mass function of is known and given by , with , where is a natural number or . Consider now a Markov decision model and define the following performance criterion: where , , and denotes the expected value with respect to the joint distribution of the process and . In order to introduce the corresponding optimal control problem, we define the optimal value function as follows:
In this way, the optimal control problem with a random horizon consists of finding a policy such that , . The following assumption will be considered for simplifying the performance criteria under a discrete random horizon .
Assumption 1. For each and induced process is independent of .
Thus, the optimal control problem with a random horizon is equivalent to the optimal control problem with planning horizon , a nonhomogeneous reward function, and an identically zero terminal reward. Hence, Theorem A.6 may be applied under the conditions of Assumption A.5. An alternative approach discussed in  considers a different set of assumptions on the reward function (which remains fixed at each stage) and the transition kernel .
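The equivalence can be checked numerically on a small case. The sketch below assumes the convention that rewards are collected at stages 0 up to and including the random horizon, and that the horizon is independent of the controlled process (Assumption 1); under these assumptions, the expected total reward equals a deterministic-horizon sum in which the stage-n reward is weighted by the probability that the horizon is at least n. The function names and the sample numbers are hypothetical.

```python
def survival_weights(pmf):
    """Return [P(horizon >= n) for n = 0..T], given pmf[n] = P(horizon = n)."""
    weights = [0.0] * len(pmf)
    tail = 0.0
    for n in range(len(pmf) - 1, -1, -1):
        tail += pmf[n]
        weights[n] = tail
    return weights


def expected_total_by_conditioning(pmf, rewards):
    """E[sum of rewards over stages 0..horizon], conditioning on the horizon."""
    return sum(p * sum(rewards[: k + 1]) for k, p in enumerate(pmf))


def expected_total_weighted(pmf, rewards):
    """Same quantity written as a fixed-horizon sum with stage-dependent weights."""
    return sum(w * r for w, r in zip(survival_weights(pmf), rewards))


# Hypothetical data: horizon supported on {0, 1, 2} and one reward per stage.
pmf = [0.2, 0.5, 0.3]
rewards = [1.0, 2.0, 4.0]
lhs = expected_total_by_conditioning(pmf, rewards)
rhs = expected_total_weighted(pmf, rewards)
```

The weighted form is what makes the reward function nonhomogeneous in time: the one-stage reward at stage n is multiplied by a factor that depends on n only through the distribution of the horizon.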
3. Consumption and Investment Problem with a Random Horizon
An investor has an initial wealth of and, at the beginning of each of at most periods (implicitly, a random horizon with support on is contemplated), he/she can determine which part of his/her wealth will be consumed and which part will be invested in the financial market given in Appendix B. Amount denotes the amount consumed at time and is evaluated by a utility function . The remaining money is invested in risky assets and in a risk-free bond. Terminal wealth is evaluated via another utility function . The main problem is to design a strategy of sequential consumption and investment decisions that maximizes the sum of his/her expected utilities.
Throughout, Assumption B.9 is supposed to hold and no arbitrage opportunities are available. In addition, it is supposed that the domain of both utility functions and is . In this context, the dynamics of the wealth are as follows: where is a consumption and investment strategy, i.e., and are adapted, and .
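A one-period wealth update can be sketched as follows. The form used here is an assumption: it is the usual update for a market with one risk-free bond and several risky assets described by their relative risks, in which the wealth left after consumption, plus the gain of the risky positions, is compounded at the interest rate of the period. The function name and the numbers are hypothetical.

```python
def wealth_step(wealth, consumption, risky_amounts, relative_risks, interest_rate):
    """One-period update (assumed form):
    next_wealth = (1 + i) * (wealth - consumption + sum_k phi_k * R_k),
    where phi_k is the amount held in risky asset k and R_k its relative risk.
    """
    risky_gain = sum(phi * r for phi, r in zip(risky_amounts, relative_risks))
    return (1.0 + interest_rate) * (wealth - consumption + risky_gain)


# Hypothetical numbers: two risky assets, 2% interest for the period.
next_wealth = wealth_step(100.0, 10.0, [20.0, 30.0], [0.10, -0.05], 0.02)

# With no consumption and no risky position, wealth simply earns the interest.
bond_only = wealth_step(100.0, 0.0, [0.0, 0.0], [0.10, -0.05], 0.05)
```

Written this way, the next state depends on the current wealth only through the amount left after consumption, which is the structure the dynamic programming argument exploits.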
The consumption and investment problem previously described may be associated with a Markov decision model with the following components: , , for , where denotes the relative risk, is the transition function, is the distribution of , is the reward function, is the terminal reward function, is the random horizon with support on .
In this framework, value function is defined as follows:where the supremum is taken over all policies with .
Sufficient conditions are now given for solving the consumption and investment problem with a random horizon with finite support. Under Assumption 1, the proof of the following result can be obtained by using Theorem A.6 and the wealth dynamics given in (4). Its conclusion allows one to associate an MDP with a random horizon with support on with another MDP with a nonhomogeneous reward function, deterministic horizon , and an identically zero terminal reward.
Theorem 1. In the multiperiodic consumption and investment problem, we define functions on by the following:, there exist maximizers of and the strategy is optimal for the consumption and investment problem.
3.1. Exponential Utility Function
This section deals with a version of the consumption and investment problem with exponential utility functions and a finite random horizon. In this setting, the process that describes the evolution of the investor’s capital may end before some fixed horizon due to external causes. However, Assumption 1 prevents the decision maker from ending this process because of bad investments, which might drive the wealth process below zero.
Utility functions arise naturally in economics and finance, for example:
In the mean-variance approach of Merton and Samuelson, it has already been found that a quadratic utility provides a closed-form solution for portfolio selection under very general conditions; however, in the case of the power and exponential utility functions, it is not possible to find closed-form solutions without information on the distribution of the return process . In addition, in , by assuming that a portfolio’s returns follow an approximate log-normal distribution, closed-form expressions of the optimal portfolio weights were obtained for both power and logarithmic utility functions.
In portfolio optimization, in order to maximize the widespread logarithmic utility of some investor, assets whose prices depend on their past values in a non-Markovian way are taken into account . On the same topic , Chapter 9 provides a very interesting contribution on the treatment of utility functions, in particular the risk aversion is deeply addressed.
On a similar matter, in , it is possible to review a self-contained survey of utility functions (exponential and power utilities of the first and second kind) together with some of their applications in finance. This reference also discusses the Pareto optimal risk exchanges and presents very illustrative examples dealing with earlier mentioned utility functions.
Exponential utility functions are widely employed because they exhibit constant absolute risk aversion [20–22], and they are the only risk-averse increasing utility functions whose risk premium is invariant with respect to wealth [11, 12]. The term risk aversion refers to the preference for stochastic realizations with limited deviation from the expected value. In risk-averse optimal control, one may prefer a policy with a higher cost in expectation but lower deviations to one with a lower cost but possibly higher deviations . In addition, from the technical point of view, if both utility functions and are of the following form: then they are bounded above, and hence, Assumption B.9 is directly satisfied.
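The invariance of the risk premium with respect to wealth can be illustrated numerically. The sketch below assumes one common parametrization of the exponential utility, U(x) = -exp(-γx)/γ with γ > 0, which may differ from the one used in this paper by constants; the function names and the gamble are hypothetical. The risk premium of a gamble added to a wealth level is the expected final wealth minus its certainty equivalent.

```python
import math


def exp_utility(x, gamma):
    """Assumed parametrization: U(x) = -exp(-gamma * x) / gamma, gamma > 0."""
    return -math.exp(-gamma * x) / gamma


def inverse_exp_utility(u, gamma):
    """Inverse of exp_utility on its range (u < 0)."""
    return -math.log(-gamma * u) / gamma


def risk_premium(wealth, outcomes, probs, gamma):
    """E[wealth + Z] minus the certainty equivalent of wealth + Z."""
    mean_wealth = sum(p * (wealth + z) for z, p in zip(outcomes, probs))
    expected_utility = sum(p * exp_utility(wealth + z, gamma)
                           for z, p in zip(outcomes, probs))
    return mean_wealth - inverse_exp_utility(expected_utility, gamma)


# Hypothetical fair gamble: win or lose 5 units with equal probability.
gamma = 0.1
premiums = [risk_premium(w, [-5.0, 5.0], [0.5, 0.5], gamma)
            for w in (0.0, 10.0, 50.0)]
```

The three premiums coincide up to rounding, and every utility value is negative, which reflects the boundedness from above mentioned in the text.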
The following assumption is needed to ensure that optimal consumptions do not exceed the available capital.
Assumption 2. For each and , suppose thatwherewith , , , andwhich exists, thanks to Theorem 4.1.1 of .
Then, it is possible to deduce the following result for the consumption and investment problem with a (finite) random horizon and a risk-averse increasing utility function whose risk premium is invariant with respect to wealth (exponential utility function). This theorem provides a mechanism for acting optimally with respect to consumption and risky investments at each stage. These optimal decisions come from optimizing the equations stated in Theorem 1, which are relatively simple in this case thanks to the definitions of coefficients and .
Theorem 2. Suppose that both and are exponential utility functions of the form (5) with . Then, it holds that:, where , , and are given by equations (9)–(11), respectively. In addition, optimal consumption at stage is , whereand the optimal investment at stage is , where is the solution of (11), which may be found via where is the minimum point of
Remark 1. The preceding theorem supplies the sought optimal consumption and investment policy, which is such that with and as in Theorem 2.
Proof 1. First of all, consider the following sets of functions in order to test Assumption A.5
(i) and , and both belong to .
(ii) If , then it is straightforward to see that . Assume now that, for . In consequence, Consider the transformation: and , hence: where is given in (11).
(iii) The existence of a maximizer in the set will be proven. For this, we examine the real function stated as It is possible to discover its maximum through standard optimization techniques. Hence, from Assumption 2, it is observed that the only critical point of in is as follows: which is a relative maximum via the criterion of the second derivative. By substituting the value of in , it is found that: Therefore, the maximizer for is of the form and
(iv) Expressions for and their corresponding maximizers by utilizing Theorem 1 will be attained. For each , By an inductive process and following essentially the same lines as those in and , it may be found for that: where and are expressed in (9) and (10). Additionally, optimal consumption is given by with: and the optimal investment is , where is the solution of
Remark 2. The preceding theorem states that, under its assumptions, the optimal strategy can be found explicitly; hence, it is not necessary to use numerical methods to solve the dynamic programming equation at each stage. If this were not the case, there exist several papers dealing with the complexity of solution algorithms for MDPs with finite state and action spaces. In particular, we refer the reader to the contribution of Chow and Tsitsiklis , where tight lower bounds are established on the computational complexity of dynamic programming for the case where the state space is continuous and the problem is to be solved approximately, within a specified accuracy. Along the same lines, Section 12.5 of  is also relevant for the framework studied in this article.
3.2. Numerical Example
In this section, results of Section 3.1 will be illustrated. For this, consider (two risky assets) and that distribution of relative risk random vectors may be approximated by a bivariate normal distribution with parameters and . In this case, it may be found that, for each
For the sake of simplicity, we consider that random vectors are independent and identically distributed with parameters and . Additionally, set so that the utility function is not too flat (bigger values of lead to utility functions closer to the x-axis). In addition, a constant interest rate is considered: . Finally, the random horizon is assumed to follow a binomial distribution with parameters 10 and 0.5.
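Under a normal approximation, expectations of exponentials of portfolio gains have a closed form, which is what makes the exponential case tractable. The sketch below relies on a standard fact: if R is normal with mean vector μ and covariance matrix Σ, then log E[exp(-θ a·R)] = -θ a·μ + (θ²/2) a'Σa, which is minimized at a = Σ⁻¹μ/θ. How θ relates to the utility parameter and the interest rate in this paper is not assumed here, and the parameter values and function names are hypothetical.

```python
def log_expected_exponential(a, mu, cov, theta):
    """log E[exp(-theta * a.R)] for R ~ N(mu, cov) in dimension two."""
    linear = a[0] * mu[0] + a[1] * mu[1]
    quadratic = (a[0] * a[0] * cov[0][0]
                 + 2.0 * a[0] * a[1] * cov[0][1]
                 + a[1] * a[1] * cov[1][1])
    return -theta * linear + 0.5 * theta * theta * quadratic


def gaussian_minimizer(mu, cov, theta):
    """Minimizer a* = inverse(cov) @ mu / theta, via the 2x2 inverse formula."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    s0 = (cov[1][1] * mu[0] - cov[0][1] * mu[1]) / det
    s1 = (-cov[1][0] * mu[0] + cov[0][0] * mu[1]) / det
    return (s0 / theta, s1 / theta)


# Hypothetical parameters (not the ones used in the numerical example).
mu = (0.05, 0.08)
cov = ((0.04, 0.01), (0.01, 0.09))
theta = 0.5
a_star = gaussian_minimizer(mu, cov, theta)
best = log_expected_exponential(a_star, mu, cov, theta)
```

Because the exponent is a strictly convex quadratic in a whenever Σ is positive definite, the minimizer is unique and does not depend on the current wealth.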
Now, we split the simulation example into two stages:
3.2.1. Stage I: Before Implementing Dynamics of the Wealth Process
At this stage, it is possible to find the corresponding values of and . Given the simplifications considered above, it is possible to find constant values of
and the values of and (which help in constructing the optimal consumptions) shown in Table 1.
This is possible since these parameters do not depend on the initial capital or the wealth process.
3.2.2. Stage II: Performing the Wealth Process
At this stage, the initial capital becomes relevant; hence, Tables 2–4 show the evolution of relative and absolute optimal consumptions as well as a trajectory of with an initial capital equal to , and . A decreasing behavior is observed in the wealth process, accompanied by an increasing behavior of relative consumption, which leads to an “almost” constant absolute consumption. In the same fashion, Figures 1–3 illustrate the same observations but allow one to compare the trajectories of , and .
In Figure 4, the dynamics of relative consumption are shown for different values of with the remaining parameters fixed as before. It may be noticed that practically no effect is observed.
In Figure 5, a sensitivity analysis of the optimal relative consumption with respect to the interest rate is shown. The relative consumption increases as the constant interest rate increases; nevertheless, Figure 6 exhibits a shape for the absolute optimal consumptions similar to that in Figure 3. In addition, (27) implies a decreasing behavior of optimal investments.
Furthermore, Figure 7 shows a concave behavior of the wealth dynamics as the interest rate increases; naturally, wealth is larger for larger values of .
In order to analyze the execution time of the strategy given in Theorem 2, an initial capital of with a fixed interest rate of 0.05 and a horizon with a discrete uniform distribution is considered. In Table 5, the support of the random horizon is increased, as well as the value of parameter . From this table, together with Figure 8, it is observed that the value of has no influence on execution time and that execution time grows slowly (below the identity function) as the maximum value of the random horizon grows.
In the same fashion, a discrete uniform horizon is considered again with an initial capital of , but now and the interest rate varies from 0.01 to 0.75. In this setting, Table 6 and Figure 9 show that the interest rate has no effect on execution time and that execution time rises moderately as the support of the random horizon becomes larger.
In summary, neither the value of parameter of the exponential utility function nor the interest rate has an effect on execution time, and execution time grows more slowly than the maximum value of the horizon.
4. Conclusions
In this paper, a consumption and investment problem was studied through a Markov decision process with a random horizon of finite support. In this framework, the optimal consumption and investment policy was obtained via a dynamic programming approach by evaluating consumptions via an exponential utility function.
A. Markov Decision Processes with Fixed Horizon
First of all, the main tool of this paper is defined: Markov decision processes, which are used to solve the consumption and investment problem described in Section 3. As a starting point, the horizon is considered to be a fixed natural number. In this context, the following definition  is provided:
Definition A.1. A Markov decision model (MDM) with fixed horizon consists of the set with , where
 is a Borel space, called the state space, endowed with the -algebra .
 is a Borel space, called the action space, equipped with the -algebra .
 is a measurable subset of which denotes the set of possible state-action combinations. It is assumed that contains the graph of a measurable function . For , the set is called the set of admissible actions.
For each , is a measurable function, which gives the one-stage reward of the system at stage .
 is a measurable function, such that provides the terminal reward if the final state is .
 is a stochastic transition kernel from to . Quantity gives the probability that the next state is in if the current state is and action is taken.
Remark A.2. Usually, the definition of a Markov decision model considers all its components as time invariant; however, in view of the purposes of this paper, the reward function depends on time. In fact, this condition arises naturally when a random horizon is considered. It is also possible to define an MDM in a more general setting by allowing the set of state-action pairs, the transition functions, and the transition kernels to depend on time. Such a setting is beyond the scope of this paper; the corresponding ideas may be found, for example, in [14, 25].
In Section 3, the transition kernel is characterized by random variables defined on some measurable space called the disturbance space. It is assumed that such random variables have a common distribution which may depend on and that there exists a measurable function known as the transition function. Here, provides the next state of the system when the current state is , action is taken, and disturbance occurs. Hence, the corresponding transition kernel is defined as follows:
In the context of MDM, decisions are modeled via measurable functions from to as can be observed in the following definition.
Definition A.3.
(a) A measurable function , such that for any , is called a decision rule. Let denote the set of all decision rules.
(b) A sequence of decision rules with is called a policy or strategy. The set of this class of policies is denoted by .
More general approaches to policies can be found; a very important reference in that direction is . However, the last definition was adjusted to the purposes of Section 3. Formalizing Markov decision models on a probability space allows one to associate them with a probability measure and, consequently, to define the corresponding mathematical expectation.
For this, we contemplate a Markov decision model in stages, an initial state , a fixed policy, and the canonical probability space guaranteed by the Ionescu-Tulcea Theorem [14, 25], usually denoted by , where and is the corresponding product -algebra. In addition, if , the state of the system at time is modeled via a random variable ; for by
On this probability space, is called the Markov decision process. Given that the optimization problems treated in this article concern the optimization of expected values of aggregated rewards, the following assumption  is considered:
Assumption A.4. For , .
Throughout, it will be supposed that Assumption A.4 holds for a Markov decision process with horizon .
The performance criterion considered for policy when the initial state is is the so-called total expected reward:
Then, the value function for is defined by the following:
Functions and are well defined since
The following assumption provides sufficient conditions for the existence of optimal policies .
Assumption A.5. There exist sets and , for , such that:(i).(ii)If , then is well defined and . (Here , ).(iii)For all there exists a maximizer of ; i.e., .
Theorem A.6. Let be functions on defined by the following: and for Suppose that Assumption A.5 holds. Then there exist maximizers of ; in addition, the deterministic Markov policy is optimal, and the value function equals , i.e.,
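For finite state and action sets, the backward recursion behind Theorem A.6 can be sketched directly: start from the terminal reward and, going backwards in time, maximize the one-stage reward plus the expected value of the next stage. The sketch below is only an illustration under assumed names; the theorem itself covers Borel spaces, and the toy model (integer wealth, deterministic leftover, square-root reward with stage weights) is hypothetical.

```python
import math


def backward_induction(states, actions, reward, terminal, kernel, horizon):
    """Finite-horizon dynamic programming with stage-dependent rewards.

    actions(x) -> iterable of admissible actions in state x
    reward(n, x, a) -> one-stage reward at stage n
    terminal(x) -> terminal reward
    kernel(x, a) -> dict mapping next states to probabilities
    Returns the stage-0 value function and the list of decision rules.
    """
    value = {x: terminal(x) for x in states}
    policy = []
    for n in range(horizon - 1, -1, -1):
        new_value, rule = {}, {}
        for x in states:
            best_action, best_q = None, float("-inf")
            for a in actions(x):
                q = reward(n, x, a) + sum(p * value[y]
                                          for y, p in kernel(x, a).items())
                if q > best_q:
                    best_action, best_q = a, q
            new_value[x], rule[x] = best_q, best_action
        value = new_value
        policy.insert(0, rule)        # decision rule for stage n goes in front
    return value, policy


# Hypothetical toy model: wealth in {0,...,4}, consume a units, keep the rest.
stage_weights = [1.0, 0.5]            # plays the role of a nonhomogeneous reward
states = range(5)
value0, rules = backward_induction(
    states,
    actions=lambda x: range(x + 1),
    reward=lambda n, x, a: stage_weights[n] * math.sqrt(a),
    terminal=lambda x: 0.0,
    kernel=lambda x, a: {x - a: 1.0},
    horizon=2,
)
```

The stage weights show how a time-dependent reward enters the recursion without changing its structure; a random transition is handled by returning several next states with their probabilities from `kernel`.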
B. Financial Markets
Financial markets allow an efficient allocation of resources within the economy. Through organized and regulated exchanges, these markets give participants a certain guarantee that they will be treated fairly and honestly. In short, a financial market is a platform that allows traders to easily buy and sell financial instruments and securities, for example, stocks, bonds, commercial paper, bills of exchange, debentures, and more. The importance of financial markets lies in the fact that they act as an intermediary between savers and investors, or they help savers to become investors [26, 27].
A financial market of -periods with risky assets and a risk-free bond is considered, following the considerations treated in . It is assumed that random variables are defined on a probability space together with a filtration with . The financial market is given by the following: A risk-free bond with and where denotes the deterministic interest rate for period . There are risky assets and the price process of the -th asset is given by known and where -a.s. for all and . is the relative price change on interval for the -th risky asset and process is assumed to be adapted to for any .
Positive random variable defines the relative price change . Relative risk process is defined by and .
Consider now the following notation, , and . As is adapted it holds that: for . It is assumed that is the filtration generated by stock prices, that is . The following definition introduces the main mathematical object needed for investing in the financial market described above.
Definition B.7. A portfolio or trading portfolio is a stochastic process -adapted, where and for . Random variable denotes the amount of money invested in -th asset during .
Therefore, the wealth process evolves as follows:
In order to solve the consumption and investment problem of Section 3, a utility function for evaluating consumptions will be needed.
Definition B.8. A function, is called a utility function, if is strictly increasing, strictly concave and continuous on its domain.
The following assumption corresponds to the proper version of Assumption A.4 in the financial market context.
Assumption B.9. . Where , for and .
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete-Time Case, Athena Scientific, Belmont, MA, USA, 1978.
O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, Berlin, Germany, 1999.
K. Hinderer, Foundations of Non-stationary Dynamic Programming with Discrete-Time Parameter, Springer-Verlag, Berlin, Germany, 1970.
G. Arindam and S. Biswajit, Economically Independent Reverse Logistics of Customer-Centric Closed-Loop Supply Chain for Herbal Medicines and Biofuel, Elsevier, Amsterdam, Netherlands, vol. 334, Article ID 129977, 2022.
G. Arindam, “Fractile criterion iterative-interactive optimisation process for multi-objective stochastic linear programming problems in fuzzy environment,” International Journal of Mathematics in Operational Research, vol. 18, no. 3, pp. 289–309, 2021.
G. Arindam and K. R. Tapan, “Multi-objective optimization of cost-effective and customer-centric closed-loop supply chain management model in T-environment,” Soft Computing, vol. 24, 2020.
S. Biswajit, M. Arunava, S. Mitali, K. D. Bikash, and R. Gargi, “Two-echelon supply chain model with manufacturing quality improvement and setup cost reduction,” Journal of Industrial and Management Optimization, vol. 13, no. 2, pp. 1085–1104, 2017.
K. Pentikousis, O. Blume, R. A. Calvo, and S. Papavassiliou, “Mobile networks and management,” Social-Informatics and Telecommunications Engineering, Springer, Berlin, Germany, 2009, Lecture Notes of the Institute for Computer Sciences.
H. Cruz-Suárez, R. Ilhuicatzi-Roldán, and R. Montes-de-Oca, “Markov decision processes on Borel spaces with total cost and random horizon,” Journal of Optimization Theory and Applications, vol. 162, no. 1, pp. 329–346, 2012.
K. Yoshinobu, K. Masami, and Y. Masami, “Discounted Markov decision processes with utility constraints,” Computers and Mathematics with Applications, Elsevier, Amsterdam, Netherlands, vol. 51, pp. 279–284, 2006.
S. Carpin, Y.-L. Chow, and M. Pavone, “Risk aversion in finite Markov Decision Processes using total cost criteria and average value at risk,” in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 335–342, Stockholm, Sweden, May 2016.
N. Bäuerle and U. Rieder, Markov Decision Processes with Applications to Finance, Springer-Verlag, Berlin, Germany, 2001.
T. Bodnar, D. Ivasiuk, N. Parolya, and W. Schmid, “Mean-variance efficiency of optimal power and logarithmic utility portfolios,” Math Finance Economics, vol. 14, pp. 675–698, 2020.
R. A. Jarrow, Continuous-Time Asset Pricing Theory, Springer Finance, New York, NY, USA, 1st edition, 2018.
N. Bäuerle and U. Rieder, “More risk-sensitive Markov decision processes,” Mathematics of Operations Research, vol. 39, no. 1, pp. 105–120, 2014.
A. L. Almudevar, Approximate Iterative Algorithms, CRC Press, Florida, FL, USA, 1st edition, 2014.
O. Hernández-Lerma and J. B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, Berlin, Germany, 1996.
R. L. Heilbroner and W. Milberg, The Making of Economic Society, Prentice-Hall, New Jersey, NJ, USA, 13th edition, 2011.
J. Madura, Financial Markets & Institutions, Cengage Learning Inc., USA, 13th edition, 2020.