Abstract

This paper deals with moment matching of matrix exponential (ME) distributions used to approximate general probability density functions (pdf). A simple and elegant approach to this problem is applying Padé approximation to the moment generating function of the ME distribution. This approach may, however, fail if the resulting ME function is not a proper probability density function; that is, it assumes negative values. As there is no known, numerically stable method to check the nonnegativity of general ME functions, the applicability of Padé approximation is limited to low-order ME distributions or special cases. In this paper, we show that the Padé approximation can be extended to capture the behavior of the original pdf around zero and this can help to avoid representations with negative values and to have a better approximation of the shape of the original pdf. We show that there exist cases when this extension leads to ME function whose nonnegativity can be verified, while the classical approach results in improper pdf. We apply the ME distributions resulting from the proposed approach in stochastic models and show that they can yield more accurate results.

1. Introduction

Probability distributions that can be expressed as the composition of exponential stages have gained widespread acceptance in recent years, specifically as ways of modeling nonexponential durations maintaining the Markov property of the underlying stochastic process. Distributions with these characteristics are represented by the family of phase type (PH) distributions which are given by the distribution of time to absorption in Markov chains (see, e.g., Chapter 2 in [1]). In the literature, mainly continuous-time PH distributions have been studied, but more recently some works that deal with the discrete-time version have also been proposed [2]. Phase type distributions are practically interesting since they allow studying with Markov models problems that are characterized by distributions with coefficient of variation either smaller or larger than one.

Matrix exponential (ME) distributions [3] are an extension of the PH class. They can provide more compact approximation than the PH class [4] and can capture cases in which the density functions that we want to represent may be zero on positive real numbers (i.e., they may exhibit multimodal shapes). ME distributions have the same algebraic form as PH distributions, but they do not enjoy a simple stochastic interpretation (a stochastic interpretation of ME distributions is given in [5], but this is much more complicated and less practical than that possessed by PH distributions).

The main reason to apply PH and ME distributions in stochastic modeling is that these distributions can be easily used as building blocks of more complex models. This fact is well-known for PH distributions because it is straightforward that if we are given a system in which all sojourn times are according to PH distributions and the next state distribution is Markov then the overall system behavior can be described by a Markov chain. Starting from [6], the construction of the Markov chain, often referred to as the expanded Markov chain, was proposed in the literature for several modeling formalisms, such as stochastic Petri nets (SPN), stochastic process algebras, or stochastic automata networks [79]. The downside of dealing with the expanded chain is that if the model has many states and/or it describes many activities performed in parallel, then the chain can have a huge number of states. To alleviate this problem several authors proposed techniques for the compact representation of the expanded chain. Such techniques are based on either Kronecker algebra (among the first works see, e.g., [10]) or, more recently, decision diagrams techniques (e.g., [11]).

The necessary theoretical background to use ME distributions as building blocks was developed instead only recently. In this case the overall model is not a Markov chain, but the transient system behavior can still be described by a set of ordinary differential equations. In the context of SPN, this was shown for a subclass of ME distributions in [12], while in [4] the result was extended to the whole family. In the context of quasi-birth-and-death processes the possibility of using ME distributions was investigated in [13].

In order to apply PH or ME distributions in stochastic modeling one needs methods to create distributions that capture real measured durations. Most of the existing techniques fall into two categories: those that are based on the maximum likelihood (ML) principle and those that aim to match moments.

One of the first works on ML estimation considered acyclic PH distributions [14] (i.e., PH distributions whose underlying Markov chain is acyclic), while an approach for the whole family, based on the expectation-maximization method, is proposed in [15]. Since these early papers, many new methods and improvements have been suggested for the whole PH family and for its subclasses (see, e.g., [16, 17]). Much less research tackled ML based fitting of ME distributions because of the lack of a practical stochastic interpretation. One such method, based on semi-infinite programming, is described however in Chapter 9 of [18] where the computational complexity of the problem is discussed and an algorithm is devised.

For what concerns moment matching methods the following results are available. For low-order (≤3) PH and ME distributions (the order of PH and ME distributions is the size of their generator matrices) either moment bounds and moment matching formulas are known in an explicit manner or there exist iterative numerical methods to check if given moments are possible to capture [1921]. For higher orders there exist matching algorithms, but these often result in improper density functions (by proper pdf we mean a pdf which is nonnegative and normalized) and the validity check is a nontrivial problem [18, 22]. In [23] a simple method is provided that constructs a minimal order acyclic PH distribution given three moments. Characterization of moments of PH and ME distributions is discussed in [24]. Moreover, tool support is available for the construction of PH and ME distributions. Specifically, ML based fitting is implemented in PhFit [25] and a set of moment matching functions is provided in BuTools [26].

In this paper we consider moment matching of ME distributions by Padé approximation [18]. This method provides an order- ME distribution given moments. The problem is that the probability density function (pdf) associated with the distribution can be improper (i.e., it may assume negative values). We show that Padé approximation can be extended in such a way that it considers not only moments, but also the behavior of the original pdf around zero (called also zero-behavior in the sequel). We present examples where this extension leads to ME distributions with proper pdf for cases where the original approach results in improper pdf. Moreover, it can give a better approximation of the shape of the original pdf.

The paper is organized as follows. In Section 2, we provide the necessary theoretical background. The extended Padé approximation is introduced in Section 3. The relation between the zero-behavior of ME distribution and its moments is discussed in Section 4. Numerical illustration of the proposed approach is provided in Section 5. Application of ME distributions resulting from the extended Padé approximation in stochastic models is discussed in Section 6. The paper is concluded then in Section 7.

2. Background

The cumulative distribution function of a matrix exponential random variable, , is of the form where is a row vector, referred to as the initial vector, is a square matrix, referred to as the generator, and is a column vector of ones. When the cardinality of the vector is , then the distribution is called order- matrix exponential distribution (ME()). The vector-matrix pair is called the representation of the distribution. Throughout the paper we assume that the distribution is without mass at time zero, that is, , but the extension to the case with mass at zero is straightforward. Note that the entries of may be negative; that is, it is not necessarily a probability vector. If is a probability vector and is the infinitesimal generator of a transient Markov chain, then the distribution belongs to the class of PH distributions.

The pdf of ME distribution can be computed as and the moment generator function can be expressed as where is the identity matrix. The last expression results in a rational function of the form where if there is no probability mass at zero (i.e., if as we have assumed before). If the ME distribution is nonredundant, then the number of poles of (4) is equal to the order of the matrix representation of the ME distribution [27]. In this paper, we consider only nonredundant distributions.

The moments can be obtained from the moment generating function and the th moment is It can be seen from (4) that ME() distribution without probability mass at zero is determined by parameters and, consequently, moments can be matched by ME() if the moments are inside the moment bounds of the ME() family (i.e., if the moments are such that they can be realized by ME() distribution).

There exist methods to construct ME distributions given moments. The applicability of these methods is limited by the facts that(i)explicit moment bounds of the ME family are known only for low-order () ME distributions,(ii)the validity check of the pdf for higher order () ME distribution is an open research problem. In the following we report those results that are relevant to our paper.

For order-2 distributions the PH and the ME families are equivalent and the moment bounds are provided in [19] in terms of the squared coefficient of variation defined as . The bounds are meaning that if three moments satisfy the above conditions, then there exists a proper ME distribution that realizes the moments.

For order-3 ME distributions the moment bounds are not known explicitly. In [21] an algorithm is proposed which, given ME density, checks if it is proper. In some cases the algorithm requires the numerical solution of a transcendent equation. This means that, given 5 moments, one can create ME distribution and then check its validity by the method provided in [21]. This check is implemented in the BuTools package [26].

3. Padé Approximation

Padé approximation for ME distributions exploits the fact that for any distribution the following relation holds between the moment generating function and the moments of the distribution: where we assume having which implies that the pdf is normalized. Accordingly, one may consider approximating the moment generating function by a rational function as where the parameters of the moment generating function of the ME distribution, that is, ,   and ,  , can be determined based on the first moments of the distribution whose moments we aim to match. More precisely, we look for such parameters with which the first coefficients of the polynomial are equal to the first coefficients of the polynomial The above corresponds to solving a linear system of equations given as with if ,  ,  , and if .

The above-described procedure results in a moment generating function that exactly matches the first moments of the original distribution. It may, however, fail as there is no guarantee for the nonnegativity of the corresponding pdf.

The procedure can be extended to capture the behavior of the original pdf around zero. This is done by adding new equations to those provided by (11) and removing those referring to the highest order moments. For example, the pdf of ME() at zero assumes simply the value while the derivative at zero can be obtained as Matching means that the number of moments that can be captured is decreased to , while matching both and means that we can capture moments.

In general, the value of the th derivative of the pdf at zero can be obtained by applying two properties of the moment generating functions. First, the moment generating function of the th derivative of is given by Second, the value at zero can be obtained as Equations (14) and (15) provide a recursive calculation procedure to calculate as function of the parameters of the generating function. For example, in case of , we have

It can be seen from (12) that matching directly determines . Having determined , (13) provides a linear equation that must be satisfied if one aims to capture . In general, capturing for   adds simple linear equations to those that are used to match the moments.

4. Relation of Derivatives at Zero and Moments by Hankel Matrices

In this section, we discuss the relation between the zero-behavior of ME distribution and its moments. Let us consider an order ME distribution with representation and introduce the Hankel matrix where . The entries of have the following interpretation: (i)for we have that is related to the th moment of the distribution through ,(ii),(iii)for we have that is related to the derivative at zero of the cdf of the ME distribution because

The rank of the equals [27] which implies that is nonzero for and zero for . As a consequence, given the quantities , one can determine based on the equation For example, given 3 moments of ME distribution, the fourth moment can be calculated by considering or, given and 2 moments, the third moment can be determined based on

The above relation of the derivatives and the moments can be exploited to transform moment matching procedures into zero-behavior matching procedures or into procedures matching part of the zero-behavior and some moments. Moreover, existing moment bounds can be transformed into bounds regarding zero-behavior.

For example, the set of inequalities provided in (6) can be transformed into inequalities regarding , and by solving (21) for and using the result in (6). This leads to meaning that if , and satisfy the above conditions, then there exists a proper ME distribution that realizes them.

5. Numerical Examples

In this section we provide numerical examples of considering the zero-behavior in constructing ME distributions. In Section 5.1, we show that there are cases in which considering the moments of the distribution leads to ME distribution with improper pdf, while considering also the behavior around zero results in a proper distribution. In Section 5.2, we provide an example for which moment matching gives a proper pdf, but the shape of the pdf can be improved by considering the value of the original pdf at zero instead of considering the highest order moment. In Section 5.3, we show that even if the original characteristics around zero are not possible to capture, it is possible to set the characteristics of the ME distribution around zero in such a way that the fitting of the shape of the pdf improves considerably.

5.1. Proper Distribution by Behavior around Zero

Consider the lognormal distribution whose pdf is given by with parameters

Using our proposed approach, we derived two order-3 ME distributions to match the characteristics of the above pdf. The first one matches 5 moments and has an improper pdf as it assumes negative values around 0. Its representation is given by the vector-matrix pair The second one matches three moments and also and and results in a proper distribution. Its representation is The validity of the second ME distribution was checked by using BuTools [26]. In Figure 1, we depicted the original pdf and the two matching ME distributions. In Table 1, we provide the first five moments of the original and the second matching distribution (the moments of the first one are identical to those of the original distribution). For the second ME distribution, even if the 4th and 5th moments are not matched, they are close to those of the lognormal distribution and also the shape of the pdf is similar to that of the lognormal distribution.

5.2. Improving the Shape of the ME Pdf by Matching Exactly the Behavior around Zero

We consider now the three-parameter Weibull distribution commonly used to model time to failure. The pdf is of the form where is the so-called shape parameter, is the scale parameter, and determines the shift with respect to the two-parameter Weibull distribution. We assume that the parameters are

We constructed two order-2 ME distributions to match the three-parameter Weibull distribution. The first one matches the first three moments of the distribution and has a proper pdf, but the shape of the pdf is rather different from that of the original one. The second ME distribution matches instead two moments and the value of the original pdf at zero . The validity of the matching ME distributions is guaranteed by the fact that the moments are in the area defined by (6). In Figure 2, we show the original pdf and the pdf of the two matching distributions. The representations of the two ME distributions are

In Figure 3, we show instead two order-3 ME distributions. The first one captures 5 moments of the shifted Weibull distribution, while the second one captures 3 moments as well as the value of the pdf and of the first derivative of the pdf at zero. It can be seen that not even 5 moments are enough to have ME distribution whose shape is similar to that of the original distribution. Instead, taking into account the behavior around zero improves substantially the fitting of the shape. The representations of the two ME distributions are

In Table 2, we provide the first five moments of the original shifted Weibull distribution and those of the ME distributions depicted in Figures 2 and 3. The second ME distribution gives a bad fit already of the third moment. It is interesting to note that the second ME distribution gives a slightly better fit of the fourth and fifth moments than the first ME distribution and this is thanks to the better fitting of the shape of the distribution.

5.3. Improving the Shape of the ME Pdf by Approximating the Behavior around Zero

We consider again a lognormal distribution, now with parameters In this case, matching 5 moments results in ME with pdf which is considerably far from the original one. Its representation is Matching and four moments or and and three moments results in ME that are not proper distributions. By observing the original pdf (Figure 4), one can see that it starts from 0, but then it immediately has a steep peak with a gradual monotonic decrease. Setting the characteristics of the ME distribution around 0 as and matching 3 moments result in a much more satisfactory fit of the original one (see Figure 4). The representation of this ME is The choice given in (33) is motivated by the fact that the line approximates well the initial decay of the lognormal pdf right after the peak. In general, numerical values for and can be obtained by applying a linear fit to the original pdf.

In Table 3, we report the first five moments of the lognormal distribution and those of the second ME distribution. The price we pay for fitting better the shape of the distribution is the distance of the 4th and 5th moments of the ME from the original moments.

6. Application of ME Distributions in Stochastic Models

In this section, we show the impact of applying the ME distributions reported in Section 5 in stochastic modeling. In the following three subsections we illustrate three cases:(i)the case when capturing the moments is more important than having a proper distribution or a good fit of the shape of the pdf,(ii)the case when the use of an improper ME distribution leads to state probabilities that are either negative or larger than 1,(iii)the case when capturing well the shape of distribution is more important than capturing the moments and leads to better approximations of the state probabilities.

6.1. M/G/1 Queue

In this section, we illustrate the goodness of the approximation applying the ME distributions in an M/G/1 queue and assuming that the steady state queue length distribution is the measure of interest.

As a first example we apply the two ME distributions presented in Section 5.1. We assume hence that the service time distribution of the queue follows a lognormal distribution with parameters and  . The arrival intensity is such that the probability of the empty queue is 1/4. In Figure 5, we depicted the relative error in the steady state distribution applying the two ME distributions. Both ME distributions capture exactly the probability of the empty queue since it depends only on the mean service time. The other local minima appear in the relative error when the exact and the approximate queue length distributions cross each other. Furthermore, in an M/G/1 queue the first moments of the steady state queue length distribution are determined by the first moments of the service time distribution. Accordingly, the ME distribution with improper pdf, but capturing 5 moments, leads the queue length distribution in which 4 moments are exact. The ME distribution with proper pdf captures instead only 2 moments of the queue length exactly. As the stationary queue length distribution is of a regular shape, it is likely that the more moments are captured the better the approximation is. In Figure 5, it can be observed that indeed the ME distribution with improper pdf, but capturing more moments, outperforms the other ME distribution.

In our second example, we use as service time distribution the three-parameter Weibull distribution of Section 5.2 and apply as approximations the two ME distributions given in Figure 3. The relative error in the stationary queue length distribution is depicted in Figure 6. The same reasoning applies as in the preceding paragraph: both ME distributions capture the empty queue probability and one of them captures 4 moments of the queue length while the other one matches only 2. In this case as well, the ME distribution that captures more moments gives more precise steady state probabilities.

6.2. Improper State Probabilities

We use in this section an extremely simple model to show that the negativity of the pdf of ME distribution, when applied in a stochastic model, can lead to negative or larger-than-one state probabilities. The model is depicted in Figure 7(a) and formulated as a stochastic Petri net [7]. It contains two concurrent activities modeled by transitions and . There are four possible states, as the tokens can be distributed in the following four ways: , , , and . The state transition diagram is depicted in Figure 7(b). At the beginning firing times are chosen for both and according to their firing time distributions. Let us denote these firing times by and . If , then after time units fires and the state becomes . Subsequently, after time units, fires and the model arrives to its final state, that is, to state . If , then fires first, leading to state , and afterwards the firing of takes the model to state .

We assume that the firing time of transition is according to the ME distribution with improper pdf reported in Section 5.1, and the firing time of is exponential with parameter . As described in [4], the overall behavior of the model can be described by a block matrix which algebraically plays the same role as the infinitesimal generator of a Markov chain (if the firing time distributions of and were of PH, then would be a proper infinitesimal generator). This block matrix is of the form where is the generator of the ME distribution and the 0s represent zero matrices of appropriate sizes. The block rows of the matrix correspond to the states in the order , , , and . For a detailed description of the construction of the reader is referred to [4]. Here we mention only that(i)the matrix describes the parallel evolution of transitions and ,(ii)the column vector describes the firing of transition ,(iii)the matrix describes the firing of maintaining the current state of .

We assume that the initial state is and, accordingly, the initial situation is described by the vector , where is the initial vector of the ME distribution. State probabilities at time then can be obtained as and were the descriptors of a Markov chain; that is, we need to calculate the vector and sum up the entries according to the block structure [4]. As is small, namely, it is an 8 times 8 matrix, the calculation of can be done easily in many numerical computing environments.

In Figures 8 and 9 we depicted the transient state probabilities of the model for two different values of . In the first case, two states have negative probabilities in the time interval , while in the second case we find one state with larger-than-one probabilities in and another state with negative probabilities in . This means that the negativity of the pdf of the applied ME distribution can easily lead to negative and larger-than-one state probabilities in the overall distribution of the model. Moreover, the way this phenomenon presents itself depends on the remaining parameters of the model.

6.3. Importance of the Shape of the Pdf

In this section we evaluate a system composed of two machines and an intermediate finite buffer (depicted in Figure 10). This small system is often used as a building block in the approximate analysis of long production lines [28]. The first machine puts parts into the buffer and the time needed to produce a part follows the lognormal distribution considered in Section 5.2. The production of the current part is subject to a forced restart mechanism (the use of this mechanism in computing systems is studied in detail in [29]); that is, after a given amount of time the part under operation is discarded and the machine starts the production of a new one from scratch. Forced restart happens after an exponentially distributed amount of time with parameter . Since the lognormal distribution has a heavy tail, the forced restart mechanism decreases the mean production time of the machine. The second machine takes parts from the buffer and the time needed to treat a part follows a two-stage Erlang distribution with mean equal to 4. The first (second) machine is blocked when the buffer is full (empty). The buffer can accommodate at most 5 parts at a time.

We will evaluate the system both by simulation and by applying the two approximating ME distributions introduced in Section 5.2. Denoting by the representation of the approximating ME distribution and by the representation of the two-stage Erlang distribution, the behavior of the model is captured by the matrixwhere the blocks in the diagonal represented by are equal to and and denote the Kronecker product and sum operators, respectively. The first block row corresponds to empty buffer (only the first machine works), while the last one refers to the full buffer (only the second machine works). As before, we do not detail the construction of but mention that (i)the matrix describes the evolution of the first machine together with possible forced restarts that reinitialize the corresponding ME distribution,(ii)the term captures the parallel execution of the two machines which is then corrected by the term in order to include the forced restart mechanism,(iii)the matrix describes the completion of a part as well as the restart of the first machine and maintains the state of the second one (the term does the opposite),(iv)the term describes the start of a new job for both machines (this term is used when the system leaves either the empty or the full states). The matrix is 42 times 42. The first (last) three rows correspond to the empty (full) buffer situation and the remaining rows describe the intermediate buffer levels. The transient analysis can be performed easily using a numerical computing environment.

We evaluated the model with and . For what concerns simulation, the presented results are based on 50000 runs which are enough to obtain a satisfactory indication of the target distribution used to check the goodness of the approximation provided by the two approximating ME distributions. The analysis based on the ME distributions requires a few seconds on an ordinary laptop, while 50000 simulation runs require about one minute.

The results in case of , that is, without forced restart mechanism, are depicted in Figure 11. It can be seen that the second ME distribution, which captures only three moments while taking into greater account the shape of the original pdf, outperforms the first ME distribution that captures 5 moments. It gives a more precise view on both the transient period and the long run probabilities.

In Figure 12 we provide the results for . The state probabilities are less precise than in the previous case. This is due to the fact that the forced restart mechanism alters the moments of the time needed by the first machine to produce a part. This change in the moments is not captured well in the approximating model. For instance, in the original model the forced restart mechanism decreases the mean time to produce a part in the first machine from 13.7 to 9.6. In the approximating model with the first ME distribution the same mean is about 13 and with the second ME distribution it is 11.4. In this case, the second ME distribution yields a better approximation of the state probabilities as well.

At last, we evaluated the model with (Figure 13). In this case the forced restart is more frequent and in the original model the mean time to produce a part for the first machine is 5.6. The same mean in the approximating model is about 13 with the first ME distribution and about 6 with the second ME distribution. The second ME distribution captures better the real mean because its pdf follows much more precisely the original lognormal distribution in the interval . Accordingly, the second ME distribution gives much better approximation of the state probabilities as well.

6.4. Discussion

There exist models, like the M/G/1 queue, in which the moments of the involved random variables determine directly moments of the measure of interest. In case of these models, if the pdf of the measure of interest is likely to have a regular shape, then it can be more convenient to capture more moments even if the shape of the approximating is a poor representation of the original one. We have seen (Section 6.1) that even a pdf with negative values can give more accurate results if it matches more moments than a proper pdf. There is no guarantee however that the negativity of the pdf does not appear in the distribution of the measure of interest, thus providing undesirable effects. Indeed, it is easy to encounter models (like the one in Section 6.2) in which the negativity of the pdf translates into negative transient probabilities. Hence it is always advisable to avoid the use of improper distribution functions.

When distributions are placed into a more complex context then the goodness of fitting the shape of the pdf can have a strong impact on the adequacy of the approximation of the overall system behavior. We have shown in Section 6.3 that this happens when two or more activities are performed in parallel and it is important to capture precisely the probability that one is completed before the others.

7. Conclusions

In this paper, we proposed using the behavior of the pdf around zero in constructing matrix exponential distributions. We have shown that matching these characteristics can be incorporated easily into the well-known Padé approximation. We illustrated by numerical examples that matching the behavior around zero can be beneficial when matching only the moments results in improper density functions or in a density function whose behavior differs a lot from the original one when evaluated at points close to zero.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work has been supported in part by “Advanced Methodologies for the Analysis and Management of Future Internet” (AMALFI) project sponsored by Universitå di Torino and Compagnia di San Paolo and by Project Grant no. 10-15-1432/HICI from King Abdulaziz University, Saudi Arabia.