Journal of Applied Mathematics

Journal of Applied Mathematics / 2020 / Article

Research Article | Open Access

Volume 2020 |Article ID 2678374 | https://doi.org/10.1155/2020/2678374

Hendrik Baumann, "Finite-State-Space Truncations for Infinite Quasi-Birth-Death Processes", Journal of Applied Mathematics, vol. 2020, Article ID 2678374, 23 pages, 2020. https://doi.org/10.1155/2020/2678374

Finite-State-Space Truncations for Infinite Quasi-Birth-Death Processes

Academic Editor: Elisa Francomano
Received03 Mar 2020
Revised14 Jun 2020
Accepted23 Jun 2020
Published07 Aug 2020

Abstract

For dealing numerically with the infinite-state-space Markov chains, a truncation of the state space is inevitable, that is, an approximation by a finite-state-space Markov chain has to be performed. In this paper, we consider level-dependent quasi-birth-death processes, and we focus on the computation of stationary expectations. In previous literature, efficient methods for computing approximations to these characteristics have been suggested and established. These methods rely on truncating the process at some level , and for , convergence of the approximation to the desired characteristic is guaranteed. This paper’s main goal is to quantify the speed of convergence. Under the assumption of an -modulated drift condition, we derive terms for a lower bound and an upper bound on stationary expectations which converge quickly to the same value and which can be efficiently computed.

1. Introduction

In many applications, discrete-time or continuous-time Markov chains are used to model real-life problems. For these models, characteristics of interest can be determined, enabling us to understand relationships between parameters and characteristics, solve optimization problems, and so on. When Markov models are used, the characterization of the dynamics by its probability transition matrix or generator matrix , respectively, is quite simple due to all theoretical problems being solved. However, in applications, the determination of interesting characteristics of the process is very important. In many models, there is no chance of finding explicit analytical representations of these characteristics, and hence, we have to use numerical calculations or simulation methods. In most practical situations, interesting characteristics refer to stationary probabilities or stationary expectations which can be derived from the invariant distribution. In queueing theory, elementary examples are given by moments of the stationary number of customers in a queueing system or queueing network, or by the stationary probability that the number of customers exceeds some threshold. Similarly, such characteristics are interesting in Markovian population models, epidemic models, etc. Formally, stationary expectations are given by , where is the state space of the Markov chain, is the invariant distribution, and is some cost or reward function.

The invariant distribution for an irreducible and positive recurrent Markov chain is characterized by the unique probability vector which solves a certain homogeneous system of linear equations. The number of states coincides with the number of equations and the number of variables. Unfortunately, in realistic models, we have a very large number of states or even infinitely many states. In these situations, numerically computing stationary probabilities or stationary expectations requires a truncation of the system of equations (which means a truncation of the state space), and this truncation results in inevitable errors.

Hence, numerical calculations only compute some approximation to the desired characteristic , and from both a mathematical and a practical point of views, bounds on the truncation error are interesting. In this paper, we will derive lower and upper bounds on which is equivalent to computing an approximation to and bounds on . Since all values are computed numerically, our results and methods provide a posteriori error estimates. Competing methods for finding a posteriori error estimates were given in [13], and an approach for finding a priori estimates can be found in [4, 5]. In Section 8, we will briefly compare these methods to the method suggested in this paper.

Although our theoretical results will be quite general, we will only discuss the efficient implementation of the computation of the bounds for level-dependent quasi-birth-death processes (LDQBDs). These processes are characterized by the block-tridiagonal structure of the transition probability matrix (discrete-time) or generator matrix (continuous time). This structure is very typical for a large class of queueing models, population and epidemic models, etc.

The rest of the paper is organized as follows: After introducing some notation in Section 2, we will present our theoretical results on lower and upper bounds on stationary expectations in Section 3. In Section 4, we will derive a method for computing these bounds efficiently for LDQBDs. Afterwards, we will apply our method to the elementary queue (Section 5), a variant of the queue (Section 6), and a popular retrial queueing model (Section 7). Whereas we have an explicit analytical representation of the invariant distribution and of stationary characteristics in the first example, numerical computations are the only chance to find stationary expectations in the latter two examples. Finally, we give a comparison of the suggested method to competing ones (Section 8).

2. Preliminaries

2.1. Notations and an Auxiliary Result

In order to introduce our goals formally, we review some well-known facts on the limit behaviour of Markov chains. In this context, and in the whole paper, we will use the following notations: (i) is a finite or infinite identity matrix. The dimension will become clear from the context(ii)1 denotes a finite or infinite (column) vector with all entries being 1. Alternatively, 1 refers to a function with constant value 1. The meaning will be clear from the context(iii) denotes the indicator function of set , that is, if , and otherwise. With slight abuse of notation, we use for Boolean expressions . If is true, is 1, and otherwise, is 0(iv)id is the identity function(v)We use the notation and .

Furthermore, we will use a probabilistic proof of invertibility frequently throughout the paper: If is finite and transient, is invertible. We give some details: A finite substochastic matrix is referred to as transient if (component-wise). This is true if and only if contains no stochastic submatrix for some . Hence, is transient if and only if, for all , there are with for , and .

Due to being finite, entails for all eigenvalues of P, and hence, converges, and by a standard argument, we obtain which means that is invertible.

2.2. Results on the Limit Behaviour of the Markov Chains

We briefly summarize some results on the limit behaviour of Markov chains since these results are the reason why it is important to be able to compute (approximations to) stationary expectations .

We consider Markov chains in discrete time or Markov chains in continuous time with countable state space . Throughout the paper, we assume that is irreducible and recurrent with transition probability matrix or that is nonexplosive, irreducible and recurrent with generator matrix . Then, there is an invariant measure , that is, a positive vector with or , respectively. In both cases, is unique up to constant multiples. For , we introduce the notation . If and converge absolutely with for functions , a fundamental result on the limit behaviour of Markov chains is in the sense of almost sure convergence (see [6], pp. 85-86, 203-209). If we have the stronger assumption of positive recurrence, the sum of the entries of is finite. Hence, by multiplying with an appropriate constant, we obtain the uniquely determined invariant distribution which still is an invariant measure, but additionally satisfies . Then, we can set and constant, and obtain almost surely if converges absolutely. If measures costs or rewards that are associated with the time spent in state , is the long-run average of costs or rewards per time unit. In particular, for , we can specify and is the long-run average of the proportion of time spent in set . For , is the long-run average of the proportion of time spent in state . Note that there is another interpretation of the entries of the invariant distribution : For irreducible and positive recurrent discrete-time Markov chains which are additionally aperiodic, converges to some random variable in distribution, and for nonexplosive, irreducible, and positive recurrent continuous-time Markov chains, converges to some random variable in distribution. Then, is the distribution of or (see any textbook on Markov chains, e.g., [6, 7]), that is, and . Hence, if converges absolutely for some function , we have or , respectively.

Due to these results on the limit behaviour, in many applications, all characteristics of interest can be deduced from for functions , where we will set in case of positive recurrence. Furthermore, by splitting functions into positive and negative part, we can assume for all without restriction. We remark that is vector valued, .

2.3. Notations for Block-Structured Markov Chains

Throughout this paper, let the state space be partitioned into finite and pairwise disjoint sets which will be referred to as levels. We introduce some notation: (i)We fix . will be chosen such that the -modulated drift condition (see below) holds on . Furthermore, throughout this paper, is some arbitrary, but fixed state within level (ii) denotes the invariant measure with . In case of positive recurrence, is the invariant distribution(iii)For , or , respectively, shall denote the transition probabilities or transition rates, respectively, for transitions from states to states .(iv)For , we write and for functions .

Later on, multidimensional functions will be useful for computing approximations to several stationary expectations at the same time. With this notation, we have , and we are interested in computing approximations to and use them for approximating , or we want to find an approximation to directly by other means. As in other approaches for finding error bounds on the approximation of stationary characteristics, we assume that an -modulated drift condition is satisfied, that is,

In scalar notation, (6) reads as for . -modulated drift conditions are a popular tool to prove positive recurrence and convergence of or . In our examples in Sections 57, we will use this tool, too. For details, we refer to Theorem 1 and Theorem 2 in [8]. Note that the idea of making use of -modulated drift conditions for this purpose is much older (see, e.g., [913], but some of these classical references formally require , whereas the results in [8] only require .

Finally, throughout the paper, we will require that

We will comment this structural requirement in Section 8.

3. Upper and Lower Bounds on : Theoretical Results

Our goal is to find a lower bound and an upper bound on such that (i)Both bounds converge to as the truncation level tends to (ii)Both bounds can be computed numerically in an efficient manner.

In this section, we focus on the “mathematical” criteria, that is, we derive exact terms for the lower and upper bounds in which both converge to . The latter goal of computational efficiency will be considered in Section 4 for the special case of QBDs.

First, we focus on the discrete-time setting. We use the notation for block-structured Markov chains as introduced above, and we begin with a representation of invariant measures and their approximations. For this purpose, we introduce the hitting times , , and .

Lemma 1. Let be an irreducible recurrent discrete-time Markov chain with block-structured transition probability matrix as in the preliminaries, let with and . Let and . (a) is well defined, and we have and all increase monotonically in . (b) exists for all with and exists with For the limits, we have for all and . (c) is stochastic and irreducible. Let be the invariant measure for subject to , and let for . Then is the unique invariant measure for subject to .(d)We have where .

Proof. We refer to the representations of , , , and as “probabilistic interpretations.” (a)The definition of implies Since is finite and transient, is invertible, and hence, is well defined for . Afterwards, the definition of for is explicit. We omit the proof of the probabilistic interpretation since such considerations are standard in the context of finding probabilistic interpretations for invariant measures (see, e.g., [7, 14]), and since these considerations are similar to those in the proof of Lemma 3.2b) or Lemma 4 below. Note that the probabilistic interpretation directly implies the monotonicity since increases monotonically. (b)More precisely, converges to almost surely. By monotone convergence, the probabilistic interpretation of and follows. Note that these expectations are finite due to recurrence. for and can be proved by means of the probabilistic interpretations. Again, we omit all further details.(c)The probabilistic interpretation of can be rewritten as Due to recurrence, we have almost surely, and stochasticity and irreducibility follow easily. Since is finite, it admits an invariant measure which becomes unique by requiring . Multiplying by immediately leads to for , and analogously, we find . Hence, is invariant for . Note that we can find the probabilistic interpretation by standard considerations (again, we omit details). This is the standard representation of invariant measures with (see, e.g., [7, 14]). (d)The statement is a direct consequence of and the representation of found in (c).Both and depend on for all . Our goal is to find bounds both on and which can be computed from , since these matrices only depend on the finite matrix . We begin with bounds on .

Lemma 2. Let all requirements of Lemma 1 hold and choose such that is an invertible diagonal matrix. (a)Such a matrix exists, and for two such matrices we have with an invertible diagonal matrix D.(b)For any such choice of , we have (c)Let and be defined as in (b). Then increases monotonically in N, decreases monotonically in , and both converge to for all .

Proof. We adapt ideas for proving classical bounds on quotients of entries of the invariant measure (see, e.g., [15]). (a)The probabilistic interpretation of implies that is transient. Hence, is invertible and we could choose . If both and are invertible diagonal matrices, so is .(b)First, we note that and do not depend on the choice of since a left-hand multiplication by some diagonal matrix has no impact on the quotients under consideration. Now, we will directly construct a matrix with for all . With and , this works as follows: We set Additionally, we set . Then for all, is the standard construction (see, e.g., [7, 14]) of the minimal subinvariant measure for the substochastic matrix subject to . Indeed, for . Hence, is a diagonal matrix. Analogously, the rows of are subinvariant measures for . This latter matrix is finite and stochastic, and hence, every subinvariant measure is invariant. In particular, every row of is a constant multiple of , that is, for all . Due to , we have by a trivial induction, and thus, . From , we obtain for all . By setting , we obtain (due to ), and by setting , we obtain .(c)We have , implying , and by monotone convergence, we obtain By setting or , respectively, the statement follows.
Lemma 2 gives a lower and an upper bound on all entries of ψK. It remains to find bounds on . A lower bound is a direct consequence of Lemma 1.

Lemma 3. Let all requirements of Lemma 1 hold, let , and let (a)We have .(b) increases monotonically in with limit .

As pointed out above, for finding an upper bound, we have to use some information on with or . This is done by using the -modulated drift condition (5). Note that the function is given, and the function solving the drift condition can often be found by “educationally guessing” even if there is no chance to find explicit analytical terms for or .

Lemma 4. Let all requirements of Lemma 1 hold, let the -modulated drift condition (5) hold, and let the structural requirement (7) be fulfilled. Furthermore, define . (a)For , we have .(b)Additionally, let . Then converges to .

Proof. (a) First, remember where for implying For , we have and together, we obtain Analogously, we find Hence, it remains to show for all
. For this purpose, we first write Let . By means of the tower property of conditional expectations, we find Together, we obtain for the entries of . Below, we will show for , and due to and , the desired statement follows.
For proving , we define for . Then for , and by induction, we obtain for all and all , since transitions from have probability . Due to recurrence, is finite almost surely, implying almost surely, and by monotone convergence, , implying for all as desired.
(b) Since converges to , it suffices to prove that converges to . For this purpose, we write Again, let . For , we have (due to the respective probabilistic interpretation, see proof of Lemma 1 for the interpretation of ), and since we assume that converges, we have convergence to for . Since the choice of is arbitrary, this is true for any .
By summarizing all previous results, we obtain our main result for the discrete-time setting where we will omit the dependency of on . Furthermore, we will phrase the result directly for both discrete-time and continuous-time setting. All previous results may be transferred into the context of continuous-time Markov chains easily by means of the embedded jump chain. Since similar considerations have been used in the literature frequently (see, e.g., [4, 16, 17]), we omit further details.

Theorem 5. Let be an irreducible and recurrent discrete-time Markov chain, or let be a nonexplosive, irreducible, and recurrent continuous-time Markov chain, respectively. Let the transition probability matrix or , respectively, be block-structured as introduced in the preliminaries. Let the drift condition (5) or (6), respectively, and the structural requirement (7) hold. Furthermore, let (i) be defined by and or (ii) be chosen such that or AT, respectively, is a diagonal matrix, let and for , and let and as row vectors(iii) or , resp., for (iv) or , resp., be component-wise finite for all (v)and (a) is uniquely defined for , and or T, resp., is invertible, in particular, there is a matrix A with the desired properties(b)We have (c)If , we have

Remark 6. Theorem 5 is phrased for Markov chains with an infinite state space, and the goal is to find approximations to which rely on the transition probabilities for transitions within a finite subset of the state space. Since even finite state spaces can be so large that an exact computation of cannot be performed, we might want to apply Theorem 5 to Markov chains with a finite state space in order to reduce the state space to a smaller one.
Indeed, for , , for , and , (a) and (b) in Theorem 5 remain true. Note that formally, we cannot allow since would become stochastic, and would not be invertible anymore. However, means that we have no reduction of the state space, and hence, this setting makes no sense. Part (c) becomes obsolete.

4. Efficiently Computing the Bounds for Quasi-Birth-Death Process (Discrete-Time Setting)

We turn now to developing methods for computing both bounds on efficiently (and simultaneously). As pointed out in the introduction, we focus on quasi-birth-death processes (QBDs). QBDs are characterized by or , resp., for with , that is, each jump changes the level at most by 1.

Level-independent QBDs were analysed by Neuts [18] by means of matrix-geometric methods. First algorithmic approaches for level-dependent QBDs (LDQBDs) are due to Bright and Taylor [17] and Hanschke [19]. The approach in [17] generalizes Neuts’ probabilistic interpretations of the matrices which arise in the matrix-geometric method, whereas the approach in [19] is motivated by the relationship between second-order vector-matrix difference equations and matrix-valued continued fractions. Remarkably, both methods are equivalent (up to suggestions regarding some initializations). More details and comparisons can be found in [20, 21]. All these methods intend to find (approximations to) an invariant measure or the invariant distribution. In [22], an algorithm was suggested which allows to compute stationary expectations directly. In the continuous-time setting, this method reads as follows: (i)Choose large , set and (ii)For , replace by and by (iii)Determine as an (approximative) solution to (iv)Return .

Since the memory requirement does not depend directly on (with , it depends on max{}), the truncation level can be chosen very large. Despite the possibility to choose large , results on the truncation error still remained desirable from both a mathematical and a practical points of view.

It is not difficult to prove that (with the notation of the present paper) the method from [22] computes the lower bound on for ( has no impact on the lower bound). The goal of this section is to generalize this method in such a way that any value is allowed, and that the upper bound will be computed, too. For means of conciseness, we focus on the continous-time setting.

Note that the QBD property directly implies for , that is, (7) holds. The drift condition (6) is met if and the definition of simplifies to

Most importantly, the computation of the matrices according to the system becomes much easier (the matrix-analytic methods in, e.g., [17, 1921] use this fact). Many of the following identities are consequences of the probabilistic interpretation of the involved matrices, and we omit these proofs since the considerations are similar to those in [17] anyway. (i)Let be defined by and Then, the inverses in (39) exist, and we have (ii)Similarly, for defined by and we have (iii)The structure of allows to write .

If the level sizes are relatively small, the computational effort induced by (39) and (41) is acceptable. Then, it seems natural to compute all and , then all and finally and . However, in [22], it was pointed out that for , computing can be performed in a much more efficient way (in particular with respect to memory requirement) by using a Horner-type scheme. Here, we generalize this procedure slightly with respect to two issues: We consider an arbitrary , and we consider the simultaneous computation of and . Introduce

With this notation, we have

The recurrence schemes for , , and all start at , and are used in the sequel for computing the other values for . Similarly, the computation of and starts at , and then the values for can be computed. For finding , we only need the values for . Hence, for , is only used for finding , , and . Similarly, the other matrices are only used in a single step. Hence, there is no need to store these matrices. Instead, we suggest to use the recurrence scheme for all these matrices as an “update” procedure. In total, we suggest the following method. (i)Choose, set , , and .(ii)For , update (iii)Set , and .(iv)For , update (v)Set and determine such that is a diagonal matrix, define by , and define by .(vi)Return as a lower bound and as an upper bound.

Usually the matrices