Computation of Invariant Measures and Stationary Expectations for Markov Chains with Block-Band Transition Matrix
This paper deals with the computation of invariant measures and stationary expectations for discrete-time Markov chains governed by a block-structured one-step transition probability matrix. The method generalizes in some respects Neuts’ matrix-geometric approach to vector-state Markov chains. It reveals a strong relationship between Markov chains and matrix continued fractions, which can provide valuable information for mastering the growing complexity of real-world applications such as large-scale grid systems and multidimensional level-dependent Markov models. The results obtained are extended to continuous-time Markov chains.
This paper is concerned with the computation of invariant measures of discrete-time Markov chains having a block-partitioned one-step transition matrix of the form where all elements are nonnegative matrices of suitable dimension, that is, for . The states of the Markov chain are denoted by , and , and are ordered in lexicographic order, that is, , …, , , …, , , …. Following Neuts , the set of states will be called the level. We refer to the first component of (the level) as , and by , we denote the second component of . Throughout this paper, it is assumed that and that the Markov chain is irreducible and recurrent. Note that irreducibility implies for all . From the general theory of Markov chains, we then know that the system has a unique solution (up to a constant factor) which is strictly positive (see Karlin and Taylor (, chapter 11)). For the solution vector , we use the notation and Every constant positive multiple of is called an invariant measure of . The class of irreducible recurrent chains can be further subdivided into positive and null-recurrent chains according to whether or , with being a vector with all its components equal to one.
Vector-state Markov chains of the form (1) generalize the theory of matrix-geometric processes which was pioneered by M. Neuts . As he points out in his monograph, the matrix-geometric approach is applicable to a wide class of stochastic models. The characteristic feature of these models is that the blocks within the transition probability matrix (or the generator matrix in the case of continuous-time models) depend only on their relative position to the main diagonal, that is, only depends on . Otherwise, Neuts’ method fails.
A general strategy for solving systems of the form (1) is the following:
(i) Truncate the system at level , that is, consider the northwest corner truncation of with blocks.
(ii) Derive a stochastic matrix by appropriately augmenting .
(iii) Solve .
Then might converge to the desired solution under some appropriate conditions (a natural condition is some normalization, e.g., ).
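As a purely illustrative sketch (not the algorithm proposed in this paper), the truncate-augment-solve strategy can be tried on a scalar birth-death chain with made-up transition probabilities, for which the exact invariant distribution is geometric:

```python
# Illustration of the truncate/augment/solve strategy on a scalar birth-death
# chain with made-up probabilities (assumptions chosen only for demonstration):
# p(i, i+1) = 0.3, p(i, i-1) = 0.6 for i >= 1, remaining mass on the diagonal.
# The exact stationary distribution is geometric with ratio 0.3/0.6 = 0.5.

def truncated_matrix(N, up=0.3, down=0.6):
    """Northwest corner truncation, augmented on the diagonal of the last row."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        if i > 0:
            P[i][i - 1] = down
        if i < N:
            P[i][i + 1] = up
        P[i][i] = 1.0 - sum(P[i])      # augmentation keeps the rows stochastic
    return P

def stationary(P, iterations=5000):
    """Power iteration for pi = pi * P (adequate for this small sketch)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

pi = stationary(truncated_matrix(20))
# Away from the truncation boundary, the ratios pi[i+1]/pi[i] reproduce
# the exact geometric ratio 0.5.
```

Since the augmented chain is itself a reversible birth-death chain, detailed balance forces the ratio 0.5 exactly for all interior levels; the truncation only distorts the last level.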
In the case of (1) with and only depending on , level-independent quasi-birth-death processes arise. For and arbitrary , the quasi-birth-death process is level-dependent. Several publications have dealt with the problem of computing the invariant measures for this class of stochastic processes numerically: In  and in , algorithms based on taboo probabilities were developed. In , a method relying on matrix continued fractions was discussed. In principle, these three algorithms differ with respect to the question of how to compensate for the truncation of the infinite state space, which is inevitable for algorithmic procedures, that is, how to augment . The major part of the algorithmic procedure is similar in all three algorithms, and as pointed out in , for large truncation levels, it can be used without any augmentation. Since is then no longer stochastic, the last step of the above general method has to be interpreted appropriately: shall solve up to the first scalar equation. This simplified matrix-analytic algorithm becomes even more powerful when only stationary expectations have to be computed since then the memory-efficient variant suggested in  can be applied.
Recently, Masuyama published a technical report  in which Markov chains (with a focus on continuous-time Markov chains) with a general block structure are discussed. Under somewhat restrictive conditions, it is shown that the above method (truncation, augmentation, finding a solution of the truncated and augmented system) converges to the desired invariant measure if the truncation level tends to . Since Markov chains with general block structure are discussed, transition matrices of the form (1) or its continuous analogue ((20) below) can be interpreted as special cases. However, the focus in  is the discussion of augmentation procedures.
We will treat the problem of finding an invariant measure numerically from a different perspective: After the truncation, we perform no augmentation but find in a similar manner as in the method for quasi-birth-death processes described in . In some situations, not using an augmentation might require a higher truncation level than an appropriate augmentation would (although this is not clear). However, we discuss not only a basic algorithm for computing invariant measures but also the direct computation of stationary characteristics, for which we introduce a memory-efficient implementation. Due to its memory efficiency, our method can be applied with a very high truncation level.
In essence, we combine the ideas of Neuts’ original matrix-geometric approach with considerations for level-dependent quasi-birth-death processes (although the resulting transition structures are not as general as in ) and with memory-efficient algorithms as developed in  for quasi-birth-death processes.
The rest of the paper is organized as follows: We begin with a derivation of our results and methods based on taboo probabilities in Section 2 and extend the results to continuous-time Markov chains in Section 3. The resulting algorithm for computing invariant measures is presented in Section 4, where we also derive a memory-efficient implementation for computing stationary expectations. Section 5 is dedicated to an alternative approach. This derivation is based on generalized continued fractions or matrix continued fractions, respectively. Although these considerations do not result in new computational methods, they give us better insight into the algebraic and numerical aspects of the approach. Finally, in Section 6, we demonstrate the applicability of our methods by briefly discussing a queueing model with batch service and server breakdowns.
2. A Derivation Based on Taboo Probabilities
According to [2, 9, 10], an invariant measure of an irreducible recurrent Markov chain with state space and transition probability matrix is given by and for , where is some fixed state and denotes the probability of event . The summands on the right-hand side are taboo probabilities. Irreducibility of guarantees that the series converges for all . can be interpreted as the expected number of visits to state before the Markov chain returns to its initial state . Note that is the unique invariant measure of with .
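This taboo-probability representation can be checked numerically on a small example. In the following sketch, the 3×3 transition matrix and the choice of the reference state 0 are assumptions made purely for illustration; the vector of expected visit counts before returning to 0 is computed via the fundamental matrix of the taboo set and verified to be invariant:

```python
# Numerical check of the taboo-probability representation on a small, made-up
# irreducible transition matrix P (all numbers are assumptions).
# mu[j] = expected number of visits to j before the chain returns to state 0,
# starting in 0; the resulting vector is invariant with mu[0] = 1.

P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.4, 0.2, 0.4]]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

# T: P restricted to the taboo set {1, 2}; N = (I - T)^{-1} is the fundamental
# matrix whose entries are expected visit counts before hitting state 0.
T = [[P[1][1], P[1][2]], [P[2][1], P[2][2]]]
N = inv2([[1 - T[0][0], -T[0][1]], [-T[1][0], 1 - T[1][1]]])

a = [P[0][1], P[0][2]]                     # first step out of state 0
mu = [1.0,
      a[0] * N[0][0] + a[1] * N[1][0],
      a[0] * N[0][1] + a[1] * N[1][1]]

# mu is an invariant measure: mu * P == mu componentwise.
residual = max(abs(sum(mu[i] * P[i][j] for i in range(3)) - mu[j])
               for j in range(3))
```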
In the present case, we choose and obtain for . By , we denote the first hitting time of of some level .
denotes the northwest corner truncation of at level , is an identity matrix of appropriate size, and is the vector with unity in the first position and zeroes elsewhere. The solution is unique, and with , we have and for with . Obviously, increases monotonically in , and by definition, we have .
Furthermore, we set to be the first positive hitting time of some level (note that due to irreducibility and recurrence, almost surely), and we define the matrices
Again, the entries of can be interpreted as expectations. In this case, conditioned on the initial state , the entry at position is the expected number of visits in before returning to some level . Similarly, the entries of are the expected numbers of visits in before returning to some level or hitting some level . Obviously, for . Again, componentwise convergence of is a direct consequence of the monotone convergence theorem. We are now ready to find the representation of the invariant measures which will be exploited numerically later on.
Theorem 1. With the structure (1) of and the notations introduced above, we have . In (12) and (13), the inverse matrices exist. In (11), the solution of the system of linear equations is unique up to constant multiples, and additionally, holds up to the first scalar equation.
Proof. The interpretations of are very similar to the level-independent case (see ) and to quasi-birth-death processes (see [4, 5]). Therefore, we do not state all proofs in detail.
(i) For , we have . This proves (9) for . For , we have an additional summand for , but on the other hand, for , the sum regarding starts with . It turns out that (9) still holds. (10) is proven analogously.
(ii) Using the structure (1) of and (9) repeatedly, it becomes clear that satisfies equation (11), which can be rewritten as , where . By a calculation similar to that for proving (9), we see that the entry of at position is given by the probability that the Markov chain returns to level before hitting a level and that this return is to . In particular, the entries of represent the probabilities that the Markov chain eventually returns to level and that this return occurs in . Irreducibility and recurrence of imply stochasticity and irreducibility (and, due to being finite, recurrence) of . Hence, is an invariant measure for which is uniquely defined up to a constant factor.
(iii) Alternatively, (11) can be shown by arguments similar to those leading to (9). These arguments also confirm that satisfies the corresponding equation up to the first scalar equation.
(iv) Additionally, define as the matrix of the expectations of the number of visits in before hitting some level . Then considerations similar to those yielding (9) lead to , and hence, for and for .
The fact that converges monotonically increasing to entails that the are the minimal nonnegative solutions of the equations, which corresponds to similar results in . However, we will not make use of this fact in Algorithms 1 and 2, which we suggest for computing invariant measures and stationary expectations.
3. Continuous-Time Markov Chains
For continuous-time Markov chains with block-structured generator matrix invariant measures satisfy . For the construction of a continuous-time Markov chain from its infinitesimal generator as well as for precise definitions of properties such as regularity, recurrence, and ergodicity, the reader is referred to [11, 12]. Assuming regularity, irreducibility, and recurrence, the solution to is unique up to constant multiples and strictly positive. Furthermore, these assumptions allow constructing the embedded jump chain which is a discrete-time Markov chain with transition probability matrix with structure (1) and where is the diagonal matrix with entries at position . Irreducibility of implies for all , and hence, is indeed invertible for all . Furthermore, irreducibility and recurrence of imply irreducibility and recurrence of and vice versa (note that this is not true for ergodicity). Finally, if is invariant for , an invariant measure for can be constructed by setting .
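The relation between a generator, its embedded jump chain, and their invariant measures can be illustrated on a small generator with made-up rates; the sketch builds the jump chain, finds its invariant distribution by power iteration, and rescales by the sojourn rates to obtain a solution of the continuous-time balance equations:

```python
# Sketch of the embedded-jump-chain construction for a small, made-up generator
# matrix Q (all rates are assumptions chosen for illustration).  The jump chain
# has P[i][j] = Q[i][j]/q_i for i != j and P[i][i] = 0, with q_i = -Q[i][i].
# If nu is invariant for P, then pi_i = nu_i / q_i solves pi * Q = 0.

Q = [[-3.0, 2.0, 1.0],
     [1.0, -2.0, 1.0],
     [2.0, 2.0, -4.0]]

q = [-Q[i][i] for i in range(3)]
P = [[0.0 if i == j else Q[i][j] / q[i] for j in range(3)] for i in range(3)]

nu = [1.0 / 3] * 3                       # power iteration for nu = nu * P
for _ in range(2000):
    nu = [sum(nu[i] * P[i][j] for i in range(3)) for j in range(3)]

pi = [nu[i] / q[i] for i in range(3)]    # invariant measure of the generator Q
residual = max(abs(sum(pi[i] * Q[i][j] for i in range(3))) for j in range(3))
```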
Taking into account that satisfies (12), we arrive at
Since the th diagonal entry of can be interpreted as the mean sojourn time in , the entries of are the expected time intervals spent in state before returning to some level , conditioned on starting in . Therefore, the entries of represent the proportion of the expected times spent in before returning to some level , and the time spent in the initial state .
Summarizing, we get
Theorem 2. With the above assumptions on , we have , where the solution of the system of linear equations in (26) is unique up to constant multiples. The ’s satisfy and can be uniquely constructed by for , where for and satisfies up to the first scalar equation.
Next, we demonstrate how to use the results numerically. We focus on continuous-time Markov chains since the methods for discrete-time Markov chains follow an analogous scheme.
Basically, we have to use (28) for computing approximations to , then we determine an approximation for by solving (26) up to the first scalar equation, and finally, we use (24) for computing approximations to . In order to save operations, we suggest the following algorithm for computing an approximation to the invariant distribution.
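The following pure-Python sketch illustrates the overall backward/forward pattern of such a scheme in the simplest special case, a finite level-dependent quasi-birth-death chain (bandwidth 1) with two phases per level. All block entries are made-up rates, and the sketch is not a verbatim transcription of Algorithm 1; it only shows the typical structure: a backward sweep producing coupling matrices, a singular boundary system, and a forward substitution through the levels:

```python
# Backward/forward computation scheme, specialized to the quasi-birth-death
# case (bandwidth b1 = b2 = 1), truncation level N = 3, two phases per level.
# All block entries are made-up rates chosen only for illustration.

def mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def neg_inv(A):
    """Inverse of -A for a 2x2 block."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[-A[1][1] / det, A[0][1] / det],
            [A[1][0] / det, -A[0][0] / det]]

N = 3
U = [[1.0, 0.0], [0.0, 1.0]]          # upward blocks (rate 1, phase kept)
D = [[0.0, 2.0], [2.0, 0.0]]          # downward blocks (rate 2, phase switched)
L = []                                # local blocks with zero row sums overall
for n in range(N + 1):
    out = 0.5 + (1.0 if n < N else 0.0) + (2.0 if n > 0 else 0.0)
    L.append([[-out, 0.5], [0.5, -out]])

# Backward sweep: coupling matrices with x_n = x_{n-1} * R[n].
R = [None] * (N + 1)
R[N] = mul(U, neg_inv(L[N]))
for n in range(N - 1, 0, -1):
    RD = mul(R[n + 1], D)
    R[n] = mul(U, neg_inv([[L[n][i][j] + RD[i][j] for j in range(2)]
                           for i in range(2)]))

# Boundary equation x_0 (L[0] + R[1] D) = 0; the 2x2 matrix B is singular,
# so a strictly positive left null vector can be written down explicitly.
RD = mul(R[1], D)
B = [[L[0][i][j] + RD[i][j] for j in range(2)] for i in range(2)]
x = [[B[1][0], -B[0][0]]]
for n in range(1, N + 1):             # forward substitution through the levels
    x.append([x[n - 1][0] * R[n][0][0] + x[n - 1][1] * R[n][1][0],
              x[n - 1][0] * R[n][0][1] + x[n - 1][1] * R[n][1][1]])

# Check: the stacked vector solves x * Q = 0 for the full block generator Q.
Q = [[0.0] * (2 * N + 2) for _ in range(2 * N + 2)]
for n in range(N + 1):
    for i in range(2):
        for j in range(2):
            Q[2 * n + i][2 * n + j] = L[n][i][j]
            if n < N:
                Q[2 * n + i][2 * (n + 1) + j] = U[i][j]
            if n > 0:
                Q[2 * n + i][2 * (n - 1) + j] = D[i][j]
flat = [v for row in x for v in row]
residual = max(abs(sum(flat[i] * Q[i][j] for i in range(2 * N + 2)))
               for j in range(2 * N + 2))
```

With upward rate 1 and downward rate 2, the resulting measure decays geometrically by a factor 0.5 per level, which serves as an additional sanity check.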
In case of positive recurrence, an invariant distribution can be obtained afterwards by renormalizing . We give some remarks:
(i) Algorithm 3 in  is quite similar. While our algorithm relies on the nonstochastic matrix or the nonconservative matrix , respectively, in , an augmentation is performed first in order to obtain a stochastic or conservative matrix. Note that Theorem 1 guarantees that for , we have convergence of the approximations or to the desired invariant measure without any augmentation. The only benefit of augmentations is that the truncation error might be decreased somewhat for small values of the truncation level. Instead of considering an augmentation, we suggest choosing very large. If such a choice results in computational problems, we suggest using the memory-efficient algorithm for computing stationary expectations (see below). Furthermore, note that the results in  were derived under different conditions: While is allowed for all (lower block Hessenberg matrices), some restrictive conditions (Assumptions 2.1 and 2.2 in ) are assumed.
(ii) A major problem is the choice of the truncation level . There are some approaches concerning a priori estimates of the truncation error (e.g., ), but the problem is still not completely solved. Instead of using such error estimates, alternatively, as in , we might increase the truncation level iteratively, that is, we start with some small and increase by in each step until for some fixed norm and some fixed small . Note that for each new value of , we have to restart Algorithm 1. Furthermore, note that a small difference does not imply that is small, where is the true invariant measure.
(iii) In many applications, the state space is finite with a transition structure corresponding to the northwest corner of (1) or (20), respectively. Then Algorithm 1 is exact up to numerical deviations.
(iv) Instead of computing the inverse matrix first and then performing the matrix multiplication for determining , it is recommended to solve the system directly.
(v) The matrices can be generated when they are needed for the computation. Hence, there is no need to store these matrices.
(vi) We have to store about matrices at the same time. If the dimensions of the matrices are constant, say , the memory requirement is approximately real numbers.
(vii) Concerning the computational effort, we have to solve approximately linear systems of equations of the form , where the coefficients and and the solution are matrices of dimension (for constant ). In addition, we have to perform matrix multiplications. The computational effort is approximately with some constants and (the exponent and the constants depend on the method of choice; we have for conventional matrix multiplication and for Strassen’s algorithm, see ).
(viii) For , Algorithm 1 more or less coincides with the methods proposed in [3–5]. One might expect that, in general too, a discrete-time Markov chain with transition probability matrix (1) and bandwidth or a continuous-time Markov chain with generator matrix (20) and bandwidth , respectively, may be reduced to the block-tridiagonal case by appropriately rearranging the states, i.e., by defining a new level as the union of the original levels and a new level as the union of the original levels , …. But the increased block size heavily impacts the performance of the algorithm. If we apply the same truncation procedure, the truncation level for the modified matrix structure is , and for , all matrices involved have dimension . This results in a computational effort of approximately . Since , the computational effort for the transformed system becomes significantly larger than that for the original system. Similarly, the memory requirement of is replaced by . For these reasons, one should discard this supposedly simple “trick” (compare (, pp. 73ff)).
In applications, often only stationary expectations have to be considered: Consider the case of positive recurrence, and denote by the invariant distribution. Then the ergodic theorem for discrete-time Markov chains guarantees that for any function for which converges absolutely. Again, by , we refer to a vector of appropriate dimension consisting of s. Analogously, for continuous-time Markov chains, we have if converges absolutely. Now, let be a vector-valued function with for all . Then for continuous-time Markov chains, we have for . Hence, computing allows us to determine the stationary characteristics, which coincide with the long-run costs or rewards measured by .
When computing , we can apply a Horner-type scheme which allows us to avoid storing the matrices and therefore saves a considerable amount of memory. This idea is very similar to that in . With we write where
Obviously, we have and
The matrices are computed by a backward scheme, and (35) is a backward scheme, too. Since we are only interested in computing , there is no need to store the matrices or . We suggest
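As a generic illustration of the underlying memory-saving principle (not of Algorithm 2 itself, whose schemes run backward), the following sketch accumulates a stationary expectation along a level recursion x_n = x_{n-1} R_n while keeping only the current level vector. The coupling matrix R and the reward function f are hypothetical placeholders:

```python
# Memory-saving accumulation of a stationary expectation sum_n x_n * f(n)
# along a level recursion x_n = x_{n-1} * R(n).  Only the running level vector
# is stored, so the memory requirement does not grow with the truncation level.
# R and f below are made-up placeholders (assumptions for illustration).

def R(n):
    """Hypothetical level-coupling matrix (row sums 0.5, so levels decay)."""
    return [[0.4, 0.1], [0.1, 0.4]]

def f(n):
    """Hypothetical reward vector: pays the level index n in every phase."""
    return [float(n), float(n)]

x = [1.0, 1.0]                  # boundary vector x_0 (made-up)
total = sum(xi * fi for xi, fi in zip(x, f(0)))
for n in range(1, 200):         # one sweep over the levels, O(m) memory
    x = [x[0] * R(n)[0][0] + x[1] * R(n)[1][0],
         x[0] * R(n)[0][1] + x[1] * R(n)[1][1]]
    total += sum(xi * fi for xi, fi in zip(x, f(n)))
```

With these placeholder values, x_n = (0.5^n, 0.5^n), so the accumulated sum converges to 2 · Σ n (0.5)^n = 4, which makes the sketch easy to verify.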
Again, we give some comments:
(i) The problem of finding an appropriate truncation level to keep the error within prescribed limits is still to be solved. In , a generalization of the idea from  is discussed, and and are compared. Unfortunately, the method is quite cumbersome to apply, especially because it would be more desirable to compare with . Again, an alternative consists in iteratively increasing until becomes small. Note that again, this method is somewhat problematic since the value of this difference does not tell us anything about the true error.
(ii) For finite state spaces with the transition structure under consideration, Algorithm 2 is exact up to numerical deviations.
(iii) Of course, it is still recommended to solve the system directly, and the matrices should still be generated when they are needed.
(iv) In total, we have to store the matrices and . Since we need matrices at the same time, the total memory requirement is if the matrix dimensions are constant. In particular, the memory requirement does not depend directly on the truncation level .
(v) As pointed out above, in (, section 3.2), lower block Hessenberg matrices (block -type matrices) are considered, that is, for , but is allowed for all . All our theoretical considerations carry over (we only have to replace by or by at some points). Unfortunately, when applying Algorithm 2 to such block matrices, we still have to store matrices, and hence, it is no longer memory-efficient. Since the memory-efficient implementation makes the theoretical considerations applicable to a wide range of stochastic models, we have restricted ourselves to band-block-structured matrices.
(vi) The considerations concerning the computational effort are similar to those for Algorithm 1. In particular, we do not recommend reducing the structure to the case .
(vii) Note that often, “natural” state spaces are multidimensional. However, if there is a partition into finite levels such that transitions with positive probability/transition rate increase the level by at most and decrease it by at most , an appropriate renumbering of the states results in a structure given by (1) or (20). We refer to the end of Section 6 for an example and to [7, 13, 17] for the corresponding discussion concerning quasi-birth-death processes (that is, ).
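As a minimal illustration of such a band structure (ignoring phases, i.e., with scalar blocks), consider a queue in which customers arrive singly and are served in batches of size at most B, so that the level increases by at most 1 and decreases by at most B. The rates below are assumptions, and the server-breakdown phases of the Section 6 model are omitted:

```python
# Sketch: a batch-service queue yields a banded generator of the type
# considered in (20).  Customers arrive singly (level +1, rate lam) and are
# served in batches of up to B customers (level down by <= B, rate mu).
# lam, mu, B, and the truncation level N are made-up assumptions.

B, N = 3, 12
lam, mu = 1.0, 1.5
Q = [[0.0] * (N + 1) for _ in range(N + 1)]
for n in range(N + 1):
    if n < N:
        Q[n][n + 1] = lam            # arrival: level up by one
    if n > 0:
        Q[n][n - min(B, n)] += mu    # batch departure: level down by <= B
    Q[n][n] -= sum(Q[n])             # diagonal fixed so row sums are zero

# The generator is banded: one block above and B blocks below the diagonal.
off_band = [(n, k) for n in range(N + 1) for k in range(N + 1)
            if (k > n + 1 or k < n - B) and Q[n][k] != 0.0]
```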
5. Relationship to Matrix Continued Fractions
This section is devoted to the interdependence between Markov chains and matrix continued fractions. The results obtained are of importance not only from a purely mathematical but also from a numerical point of view. Matrix continued fractions are generalizations of ordinary continued fractions and generalized continued fractions (or -fractions, see [18, 19]). We will show that with each transition matrix of the form (1), one can associate a matrix continued fraction (or a matrix-driven Jacobi-Perron algorithm) which converges if is irreducible and recurrent. In addition, it is proven that its approximants and its limits coincide with the coefficients of the recurrence relations (10) and (9), respectively.
A milestone in pure mathematics was the arithmetic characterization of quadratic irrationalities discovered by J. L. Lagrange (see ). Lagrange’s theorem states that the expansion of a real number as a simple continued fraction is periodic if and only if it is the real root of a quadratic equation in one variable with rational coefficients. Since then, mathematicians like C. G. J. Jacobi , O. Perron , P. Bachmann , L. Bernstein , and F. Schweiger  have tried to extend Lagrange’s approach to higher-degree irrationalities. C. G. J. Jacobi and O. Perron developed a tool for approximating real algebraic irrationalities of degree . In honor of Jacobi and Perron, this algorithm has been named the Jacobi-Perron algorithm. Although the Jacobi-Perron algorithm does not lead to the desired results, it has been successfully applied to a lot of other unsolved problems in approximation theory and numerical analysis. In particular, in connection with modern computer technology, the Jacobi-Perron algorithm turns out to be an efficient tool for identifying and calculating subdominant solutions of higher-order linear vector recursions.
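A scalar toy example with a made-up recurrence shows how a backward continued-fraction iteration singles out the subdominant (minimal) solution, whereas forward iteration would be dominated by the larger characteristic root:

```python
# Scalar illustration of how a backward continued fraction picks out the
# subdominant solution of a linear recursion.  For the made-up recurrence
# u_{n+1} = 3 u_n - 2 u_{n-1} (characteristic roots 2 and 1), the ratios
# t_n = u_n / u_{n-1} satisfy t_n = 2 / (3 - t_{n+1}), so the continued
# fraction 2/(3 - 2/(3 - ...)) converges to the minimal root 1, the ratio of
# the subdominant solution u_n = 1.

def minimal_ratio(depth):
    """Approximant of the continued fraction after `depth` backward steps."""
    t = 0.0                      # arbitrary tail value
    for _ in range(depth):
        t = 2.0 / (3.0 - t)
    return t

approx = minimal_ratio(60)       # converges to 1, not to the dominant root 2
```

The fixed points of t → 2/(3 − t) are exactly the characteristic roots 1 and 2; the backward map is contracting at 1 and expanding at 2, which is why the backward iteration is insensitive to the (arbitrary) tail value.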
In the sequel, we will show that the schemes (10) and (13) are equivalent to a specific matrix continued fraction or a matrix-driven Jacobi-Perron algorithm associated with the transition probability matrix (1). For this purpose, we assume that for all and that is invertible for all . Define where denotes the unit matrix. Then (36) and (2) yield the th-order homogeneous linear vector recursion subject to the boundary condition
In this perspective, the problem at hand is to find the “right” initial values, say sufficiently large, such that for all . In order to find a solution to this problem based on generalized continued fractions, we generalize and combine the results from [5, 26].
Remember that where solves the truncated system (6). (36) and (6) lead to the finite difference scheme coupled to the boundary conditions: where is a vector with unity in the first position and zeroes elsewhere. For simplification, we transform the system (40) into a first-order scheme of the form by putting
Now, let be fixed. By repeated substitutions, we obtain from (43) with
For computing the -matrices , the following result reduces the computational effort.
Theorem 3. Let , be defined as in (36), and let, for convenience, . Then , where
Proof. From (46), it is clear that the matrices can be recursively computed by with initial condition . Rewriting (52) in expanded form, we get . By repeated substitutions, it is readily seen that the matrices satisfy the recurrence relation (50) at least for . To proceed in this manner, we assume, for the moment, that the recurrence relation (50) may be continued for , without affecting the initial condition (53) and the scheme (54). Since the matrices do not appear in formula (46), they can be chosen arbitrarily. By putting and , we get , which explains (49) and (51). Since , (55) becomes compatible with (52) and (53), which proves our result for .