Table of Contents Author Guidelines Submit a Manuscript
Computational and Mathematical Methods in Medicine
Volume 2018, Article ID 6367243, 13 pages
Research Article

Understanding Dynamic Status Change of Hospital Stay and Cost Accumulation via Combining Continuous and Finitely Jumped Processes

1School of Finance, Zhejiang University of Finance and Economics, China
2School of Data Sciences, Zhejiang University of Finance and Economics, China

Correspondence should be addressed to Xiaoqi Zhang; ude.olaffub@hziqoaix

Received 1 March 2018; Accepted 23 April 2018; Published 10 June 2018

Academic Editor: Kazuhisa Nishizawa

Copyright © 2018 Yanqiao Zheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


The Coxian phase-type models and the joint models of longitudinal and event time have been extensively used in the studies of medical outcome data. Coxian phase-type models have the finite-jump property while the joint models usually assume a continuous variation. The gap between continuity and discreteness makes the two models rarely used together. In this paper, a partition-based approach is proposed to jointly model the charge accumulation process and the time to discharge. The key construction of our new approach is a set of partition cells with their boundaries determined by a family of differential equations. Using the cells, our new approach makes it possible to incorporate finite jumps induced by a Coxian phase-type model into the charge accumulation process, therefore taking advantage of both the Coxian phase-type models and joint models. As a benefit, a couple of measures of the “cost” of staying in each medical stage (identified with phases of a Coxian phase-type model) are derived, which cannot be approached without considering the joint models and the Coxian phase-type models together. A two-step procedure is provided to generate consistent estimation of model parameters, which is applied to a subsample drawn from a well-known medical cost database.

1. Introduction

Rising expenditures and constraints on health care budgets have prompted the development of a variety of methods for the analyses of hospital charge and length of stay (LOS) as discussed in Gold [1], Lipscomb et al. [2], and Lin et al. [3]. Correctly fitting the charge and LOS data is a critical step in optimizing the allocation of healthcare resources. But due to the protection of private information, the detailed information regarding the treatment process that patient experience in hospital is not available from many well-known medical outcome databases, like the New York State’s Statewide Planning and Research Cooperative System. The missing longitudinal information of the treatment process makes it more challenging to generate good fitting; meanwhile it becomes demanding to have a dynamic model, through which effective inference can be made against the hidden treatment process. To that goal, a bunch of stochastic-process-based models have been well developed and applied to analyze the medical datasets.

The continuous-time Phase-Type (PH) model has been widely used in the study of hospital charge and LOS data. Many authors focus in particular on a special subclass of PH model/distribution, namely, the Coxian phase-type (CPH) model/distribution Tang [4]; Faddy et al. [5]; Marshall et al. [68]; Fackrell [9]. Unlike other popular theoretical distributions widely used in inpatient data, such as log-normal and gamma distribution, the CPH model/distribution not only provides a theoretical distribution that can be used to fit the empirical data, but also gives us a sketch of the treatment dynamics that patient experience in hospital. In fact, from CPH models, we can track the pathways that patient went through in different medical stages (characterized by the discrete set of phases in the PH model) during a hospital stay. The pathway information makes it possible to cluster patients and facilitate the use of healthcare process improvement technologies, such as Lean Thinking or Six Sigma McClean et al. [10, 11].

The other popular approach to study hospital charge and LOS is through dynamically modelling the charge accumulation process and the determination of the time to discharge, which belongs to a more general class of joint models of the longitudinal measurements and time to event, Ibrahim et al. [12]; Tsiatis and Davidian [13]; Henderson et al. [14]; Kim et al. [15]; Sousa [16]; Lawrence Gould et al. [17]. In medical cost studies, the charge accumulation is a monotonic nondecreasing process; the joint model used in this case is reduced to a class of random growth with random stopping time (RGRST) models.

Like CPH models, the RGRST models do also capture the treatment dynamics that patient experience in hospital. But in contrast to tracking the pathways of patient moving through different medical stages, the RGRST models focus more on describing how patient and/or doctor makes the discharge decision in reaction to the change of actual charge level and the length of time that patient has stayed in hospital. Therefore, the story of RGRST models is more about the behavioural patterns of patient/doctor behind the treatment dynamics, while the story of CPH models is more on the medical side.

It is natural in this paper to think of the possibility of combining CPH models and RGRST models together in order to extract more information regarding the discharge decision-making on different medical stages. However, there is a natural gap between the two models. The CPH model is a finitely jumped stochastic process in essence, while the charge accumulation in the RGRST model is continuous. It is not trivial to combine a jump process with a continuous process. To deal with that difficulty, we propose a partition-based approach with each partition cell determined by solving a boundary differential equation. These boundary differential equations are subtly designed to merge the continuous charge into discrete “phases” involved in a Coxian phase-type model. In sum, the main contributions of this paper are as follows:

(i) We show that there is a natural way to convert a special subclass of RGRST models to CPH models.

(ii) We propose an algorithm to estimate the transition matrix of the CPH model converted from a given RGRST model and the parameters involved in that RGRST model.

(iii) Based on the correspondence between RGRST models and CPH models, we derive a variety of different measures of the “cost” of staying in a medical stage at each time. That “cost” information is important for the purpose of insurance payment and healthcare process improvement.

McClean et al. [11] tried a different way to incorporate the charge accumulation process into a CPH model. But in their work only the case that the charge accumulation process adopts a piece-wise linear form was discussed. It turns out that the piece-wise linear assumption is quite restrictive while crucial to their main result. Without it, the matrix technique in McClean et al. [11] is no longer applicable to achieve the th order moments of total charge for , while our differential-equation-based approach does still work. In fact, we believe our method extends the work of McClean et al. [11] in the following two aspects.

Instead of being piece-wise linear, we consider a much more general situation in which the charge accumulation process can take arbitrary forms as long as a conditional expectation function of that process satisfies a general regularity condition. In particular, within our framework, it is possible to consider the potential influence of the current charge level on the future charge accumulation which is neglected by the piece-wise linear assumption.

In addition to the moments of total charge, it is derivable from our model of the joint distribution of the total charge and LOS, and the joint distribution of the costs and time being spent on every stage by every fixed time . Therefore, our model provides more detailed information of the treatment that the patient experiences in hospital.

Although the motivation of our work is the analysis of the charge accumulation and the determination of hospital length of stay, it turns out that the proposed method is useful for many other problems where the relation among the time to event and a hidden continuous process as well as a jump process is in interest. For example, in the field of investment risk management, it is always important to detect how the default probability of the corporate bond issued by a firm is affected by the growth stage and profitability (say measured by the flow of revenue) of that firm. In this case, our model can definitely provide some insights if we identify the default as the event in interest and consider the revenue flow as determined by a continuous process similar to the charge accumulation and the transition among different growth stages of the firm as described by a CPH process. In addition to problems of the survival-type, it is also natural to extend our work to the case of competing risks, of which every stage in our model can be identified with a type of risk. Although in competing risk models, the CPH transition matrix is no longer sufficient, it turns out that the partition-based technique introduced below is extendible to derive the joint distributions of a wide class of the competing risk models, the details of which will be discussed in a related paper by the authors.

This paper is organized as follows. In Section 2, after a short review of the CPH models and RGRST models, we present the correspondence between them and briefly introduce the estimation algorithm. In Section 3, we conduct numerical studies to show the validity and usefulness of our model. A couple of interesting findings toward the medical outcome database, the New York State’s Statewide Planning and Research Cooperative System 2013, are discussed. Section 4 concludes the paper.

2. Model

In this section, a new model (denoted as CPH-RGRST model) is constructed that connects the CPH models to RGRST models in the sense that

a CPH-RGRST model is a RGRST model;

charges in a CPH-RGRST model can be classified into a number of stages such that every stage is identified with a phase in a given CPH model in the sense that, at every time , the probability of staying in a stage is exactly given by the probability in the th phase of the CPH model.

In particular, the marginal distribution of LOS induced by a CPH-RGRST model is a CPH distribution. We shall state the detailed construction of the CPH-RGRST models after a brief review of the definition and some basic properties of RGRST models and CPH models.

2.1. the Joint Model (RGRST) versus the CPH Model

A RGRST model can be formally defined as follows as discussed in Gardiner et al. [18, 19] and Polverejan et al. [20]:

where the process represents the actual charge level at each time. The random variable indicates the LOS, and is the indicator function. ( for short) is the event process representing whether or not to stay in hospital for longer time at each time point . is a nonnegative process characterizing the potential increment rate of charge per unit time provided that patient decides to stay, and is a nonnegative random variable representing the charge at the initial time. We shall denote by the potential charge accumulation process in distinguishing the actual charge process .

As shown in the supplementary materials (available here) note that a RGRST model can be completely specified by the initial probability density function (pdf), , induced by the initial charge and the following two conditional expectation functions:

And using (2), the joint probability density function (pdf) of the LOS () and the total charge () at the discharge time can be expressed as follows:

where the function in variable is the time-dependent pdf induced by . The detailed derivation of (3) can be found in the supplementary materials. Expression (3) is useful in the estimation algorithm stated in the next section as it is the key component of the likelihood function.

To associate RGRST models with the CPH models, the hospital length of stay, represented as the random variable in (1), should induce a CPH distribution generated from a CPH model, which is a finite-state continuous-time stationary Markovian process with only one absorbing state/phase (we shall use the term “phase”, by convention, in place of “state”). A CPH model is determined by an initial probability mass vector with and , and the transition intensity matrix

where and the entry of represents the transition intensity of a patient from phase to phase at every time ; formally:

As suggested in McClean et al. [10], a phase in a CPH model can be identified with a treatment stage during hospital stay, such as diagnosis, acute care, assessment, rehabilitation, and long-stay care. The transition of patients among these stages characterizes the treatment progress.

2.2. Correspondence between CPH and RGRST Models

The main result of this section is that there does exist a correspondence between CPH and RGRST models. The correspondence is built through converting the continuous variable, charge, in a RGRST model to finite many discrete states by partitioning the product space, (representing charge and time, respectively), into a number of cells such that each cell corresponds to a phase in a CPH model, while the evolution of the probability of staying in those cells is exactly determined by the given CPH model. More precisely, we have the following theorem.

Theorem 1. Fix a RGRST process represented as a triple with being the pdf of initial charge and , as defined in (2). Suppose functions , , and are smooth and , satisfyfor some constant . Then, for any fixed positive integer , an -dimensional vector with , and an -dim vector , there exists an partition of the space denoted as such that the following time-dependent probability mass function defined on the tuple :is generated by a CPH model with the initial mass and its transition matrix is given as in (4) with for .

The proof of Theorem 1 is presented in the Appendix. From the proof, it is clear that the connection between RGRST models and CPH models is equivalent to a constraint put on the conditional probability function in (2) of the underlying RGRST model by the condition (6). In fact, the functional form of is completely determined by (6) and the function , which gives a first-order partial differential equation (PDE) of . This equation turns out to be solvable and has a unique solution for a given boundary condition. Therefore, using the characteristic method, Evans [21], we can solve (6) and express the function as follows:

where evaluated at is constantly which means that all patients have to stay in hospital for a positive time before discharge; the form of the boundary and the value of the function on (denoted as ) are constructed by iteratively solving the Initial Value Problem (IVP) (A.11) in the proof; the details of the iteration are presented in Algorithm 1 of Corollary 2. is the first time when the solution trajectory () of IVP (A.11) (starting from ) touches the boundary curve . Equation (8) is crucial to determining the parametric form of the joint pdf (3) and the likelihood function used for estimation.

Algorithm 1: Construct_.

The next corollary is a direct result of Theorem 1. It extends the construction in Theorem 1 to a more general situation where the transition intensity from different transient states to the absorbing state does not have to be identical; i.e., does not have to be equal to for different . Therefore, it is always possible to achieve an arbitrary CPH model from a RGRST model satisfying a generalized version of condition (6) with replaced by for different .

Corollary 2. The smooth requirement on the function in Theorem 1 can be replaced by the following weaker condition.
Condition 3. The function is continuous and almost everywhere differentiable under the standard Lebesgue measure on and has integrable partial derivatives.
Under the Condition 3, for an arbitrary given CPH model represented by the transition matrix (4) and the initial probability mass vector , there always exists a RGRST model together with a set of partition curves such that the mappingconverts the RGRST model to the given CPH model, where is the th phase in the CPH model.
Moreover, the desired RGRST model and the partition curves can be inductively constructed through Algorithm 1, where “” represents the operation to solve the equation “”.

Notice that given the partition curves and the joint pdf (3), deriving the conditional probability density of the cumulative charge is very simple given that at time patients stay in the th-stage, :

With the help of the conditional density (10), we can define a variety of measures of the “cost” of staying in a stage. For instance, fixing a stage and a time , we can think of the price as the amount that has been charged since patients arrived in that stage for the first time, the daily price as the amount being charged per day within that stage, and the time cost as the length of time that patients have spent in that stage by . Formally, the price (), daily price (), and the time cost () for every stage and every time are defined as:where is the conditional mean of the first arrival time into the stage given the charge level and current time , and similarly, represents the conditional mean of the charge at the first arrival time to the stage given and .

Although the price, daily price, and the time cost are defined through the first-order moment, the availability of the conditional probability (10) enables us to define the quantile version of (11). When there are large parts of outliers, a quantile version of those “cost” measures turns out to be more useful.

The information regarding the price and time spent in every such stage, as defined above, is helpful in rationalising the care process, thus reducing waste, in terms of unnecessary or inappropriate treatment, and avoiding delay, often the result of batch and queue processes, in a similar fashion to that adopted for industrial processes (McClean et al. [10]).

2.3. A Two-Stage Algorithm

Corollary 2 implies a two-step algorithm that uses the real hospital charge and LOS data as input to estimate the underlying CPH model and the RGRST model from which the CPH model is derived.

Step 1. Apply the full information maximum likelihood method (FML) and the marginal LOS data to estimate the transition matrix and the initial probability mass that determines the marginal CPH distribution of LOS. The resulting estimators are denoted as , , and .

Step 2. Apply Algorithm 1 to construct the function from the estimators , , and , and construct the joint pdf of charge and LOS by formula (3). With the joint pdf, construct the likelihood function and apply FML to estimate the remaining parameters, which are used to characterize the function and the initial density (denoted by ).

The use of FML guarantees that all estimators obtained from the two-stage algorithm are consistent and asymptotically normal-distributed.

3. Numerical Studies

In this section, we conduct the numerical studies to show the validity of our two-stage estimation procedure. We will apply our procedure to both of the real data and simulation sample.

Our data source is the medical outcome database, New York State’s Statewide Planning and Research Cooperative System 2013 (SPARCS 2013). The histogram of the entire SPARCS 2013 indicates that the total charge approximately follows a log-normal distribution; therefore, we will take the following parametric form for the function :

and the initial is assumed to satisfy

It turns out that under (12), (13), and (6), the resulting marginal distribution of total charge is close to a log-normal distribution.

When covariates exist (denoted by ), we assume that the random vector is independent from the covariate vector, and follows the joint pdf given by (3) with the initial pdf, , and specified as in (13), (12), and (6), respectively. The covariates are linked with the total charge, , and LOS, , through the following regression equations:

, are the regression coefficient vectors.

As for the dimension of the underlying CPH model, we follow the convention in the previous studies of Faddy et al. [5]; Tang [4]; McClean et al. [10] and only consider the two cases where the number of nonabsorbing phases is 3 and 4. After a preliminary study, we select the 4-Phase CPH model as it can generate better fitting to the SPARCS data.

Under the specification above, there are three classes of parameters to estimate. They are the parameter vectors , and involved in the CPH model, the parameter involved in the initial pdf, and the regression coefficients . We call the parameters of types and as the dynamic parameters because they specify the joint model that generates the distribution of charge and LOS.

In the real data study, we draw 5000 subsamples from SPARCS 2013 with the covariates consisting of the Severity of Illness, Mortality Risk of Illness (In SPARCS 2013, both of the two variables, Severity and Mortality, are quantified through a grading score, which is a number in the set .), and 24 categorical variables which represent 25 Major Diagnosis Codes (MDC), each of which associates with a class of illness. The summary statistics of our subsample verses the entire SPARCS 2013 with respect to the covariates are described in Table 1.

Table 1: Descriptive statistics of SPARCS 2013.

In the simulation study, we generate 5000 samples from a joint model without covariates and the true value of the dynamic parameters is taken as the estimated value from the real data study, which is given as in Table 2.

Table 2: Estimated dynamic parameters.

In both of the real data and simulation studies, the computer code is written in the language of Python 2.7 with python-scipy, python-numpy libraries being used.

3.1. Simulation Study

The goodness of fit is measured through comparing the fitted curves and the empirical histogram (drawn from the simulation sample) for both of the marginal charge and LOS, as shown in first line of Figure 1. We conduct Pearson’s test; the value of the statistics and the associated values are (0.0171, 1.0) for the marginal charge and (6.1054, 0.8662) for the marginal LOS. From both of the fitting plots and the results of test, our fitting is fairly good.

Figure 1: Goodness of Fit. Plots 1 and 2 are the fitted marginal CPH-RGRST distribution versus empirical histogram for log-charge and LOS.

We also evaluate the goodness of fit in terms of the joint distribution through Pearson’s test; the statistics and its value are (0.1911, 1.0), which is consistent with Figure 1. Therefore, the simulation study verifies the effectiveness of our two-step estimation procedure.

3.2. Real Example Study
3.2.1. Regression Coefficients

The estimated regression coefficients are reported in Table 3, from which both of the severity and mortality of illness have significantly positive effect on both of the total charge and LOS that is consistent with the intuition.

Table 3: Estimated regression coefficients.

On the other hand, among all the MDC groups, the Newborn And Other Neonates (MDC_15) and the Diseases and Disorders of the Musculoskeletal System And Connective Tissue (MDC_8) has the greatest negative and positive effects on the total charge, respectively, which is also consistent with the intuition. In contrast, the MDC groups with greatest negative and positive effect on LOS are the Diseases and Disorders of the Ear, Nose, Mouth and Throat (MDC_3) and the Mental Diseases and Disorders (MDC_19), respectively.

In addition, it turns out that the effects of different illnesses on the charge and LOS are not always homogeneous. There are a couple of MDC groups which affect the total charge and LOS in distinct direction. They are the Diseases and Disorders of the Nervous System (MDC_1), Diseases and Disorders of the Circulatory System (MDC_5), Diseases and Disorders of the Male Reproductive System (MDC_12), Diseases and Disorders of the Female Reproductive System (MDC_13), and Alcohol/Drug Use or Induced Mental Disorders (MDC_20). Among them, except the MDC_20 group, all the other groups have a more expensive bill but shorter hospital stay, and therefore a higher daily charge. In contrast, patients with alcohol/drug abuse tend to pay less but stay in hospital longer.

Combining the estimated coefficients in Table 3 and dynamic parameter in Table 2, we can even identify the stage from which a patient exits to discharge. The discharge stage encodes critical information of the treatment pathways and is important for management purpose. Table 4 reports the estimated conditional mean (log-)charge, LOS, severity, and mortality risk for patients who exit from every stage. It is clear that the value of all the four variables monotonically increases as patients get discharged from later stages. Especially for severity and LOS, there are clear jumps from stages 1 and 2 to stages 3 and 4. The mean severity is doubled when transiting from stage 2 to stage 3 while the mean LOS almost gets tripled. This gap suggests that patients who are diagnosed with more severe conditions during admission are more likely to go over the treatment in stage 3 or 4, while their rehabilitation usually takes more time and more medical resources (McClean et al. [10]). This observation is consistent with the intuition and, to some extent, verifies the viewpoint that interprets the entire treatment process as a series of transitions among multiple medical stages.

Table 4: Summary of discharge stages.
3.2.2. Cost

As discussed in the end of Section 2.1, the CPH-RGRST model enables us to evaluate the “cost” of each medical stage in different manners. Using the estimation results provided in the previous section, we can numerically compute the “cost” for our SPARCS sample.

In Figure 2, we plot the estimated mean price, mean daily price, and mean time of staying for each of the four stages of the CPH model, where the “mean” refers to the CPH-RGRST process that generates the mean charge and LOS, , and . In Figure 3, the probability of staying in every nonabsorbing stage is plotted against the time. There are the following three major findings.

Figure 2: Cost of Stages. Plots 1, 2, and 3 sketch the log of the price, daily price, and the time cost (as defined in (11)) versus their quantile version, respectively. Plot 4 shows the other version of the time cost ( with ) versus its quantile version.
Figure 3: Probability of staying in every stage by time.

(i) From the plot 2 in Figure 2, the daily price for stage 1 declines over time, which is caused by the fact that, for those long-stay patients, they must have already switched into the higher stage treatments after the preexam period (represented by stage 1), which is very well captured in plot 1 of Figure 3. In contrast, for all the stage 2, 3, and 4, the daily price inclines to grow up in long run, which rejects the piece-wise linear assumption claimed in McClean et al. [11]. In fact, in contrast to the constant growth rate of charge within each stage, the increasing growth rate tends to be more reasonable, because a longer stay usually implies a worse health condition for a patient, who, therefore, needs better care, including more expensive medicines, more frequent exams, and the like. These items lift up the cost of stay per day. The same reasoning also applies well to the observation that the time cost of all stages is increasing over time as shown in plots 3 and 4 in Figure 2.

(ii) Although the time cost is slightly lower in stage 3 than in stage 4, both of the two stages (by (11), the time cost for stage 1 is trivial and constantly equal to the total time in hospital, so we omitted it in Figure 2) have their time cost almost identical to the total time that patients spent in hospital since they were admitted. In contrast, the time cost of stage 2 displays quite different features, which is not only much lower than that of the other stages, but, within the first 13 days, its growth rate is also slower. The different features of stage 2 are consistent with the estimated dynamic parameters in Table 2 and plot 2 in Figure 3. From Table 2, it is clear that the intensity of switch-in and switch-to-discharge in stage 2 is significantly higher than in the other stages, which means that there are two factors that lower down the time cost at stage 2. There are a large portion of patients switching from stage 1 to stage 2 in the early time (<5 days, see plots 1 and 2 in Figure 3); in contrast there is almost no patient who could switch from lower stage to stage 3 or 4 (see plots 3 and 4 in Figure 3), which implies on average that the first arrival time to stage 2 is later than to stages 3 and 4. The portion of patients switching out of stage 2 (mainly to discharge by plots 3 and 4 in Figure 3) is also high, which is not the case for stages 3 and 4 (reflected as the scale of plots 3 and 4 being much smaller than plots 1 and 2 of Figure 3). Therefore, the switch-out time from stage 2 is earlier than from stage 3 or 4 on average.

Factors and shown in Figure 2 and Table 2 indicate that stage 2 should associate to the major treatment procedures, like the main surgery, that most inpatients have to experience when staying in hospital. In fact, it is usual that patients need a couple of days as the preparation period before the main surgery, such as the period for preexams. This preparation period is exactly captured by factor of stage 2. On the other hand, patients usually recovered soon after the main treatment procedure gets done, and then are discharged, which is reflected by factor of stage 2.

(iii) The third interesting observation from Figure 2 is regarding the median price of stage 2. From plots 1 and 2 in Figure 2, we can see that only at stage 2 there is a clear deviation between the expectation version and the median version of the price. More precisely, at stage 2 the median price is significantly lower than the price defined through the first-order moment, and this fact holds at almost all the time and also holds for the daily price. It is well known that when the median of a distribution is below its first-order moment, there exists a group of outliers with extremely great value. In the other words, Figure 2 indicates that a portion of patients in stage 2 are charged much higher than the others in that stage and all the time. From the perspective of patient’s welfare and the effective allocation of medical resources, it is meaningful to have some further researches in identifying the causes that make some patients in stage 2 being charged more.

Remark 3. By Figure 3, it is clear that the probability of staying in stages 3 and 4 is small over all time. This fact might be induced by model overfitting as pointed out by a referee, but it is not. In contrast, the low probability reflects a deep-level distributional property of SPARCS data. A very large proportion of inpatients recorded in SPARCS only have extremely short hospital stay and 99+ percent of them get discharged by day 10, while no more than 0.01 percent of patients can stay in hospital for more than 25 days. But at the meantime, there do exist a small group of patients who can live in hospital for a couple of months before discharge. The same pattern can be observed for the total charge; the most expensive expenditure can take million dollars while more than 99 percent of patients are charged no more than 100,000 dollars.

Based on the observation above and the nondecreasing design of the CPH-RGRST models that higher stages correspond to longer hospital stay and higher total costs, the low probability of staying in the highest two stages just reflects a fact that both of the charge and LOS data in SPARCS 2013 have a very long and thin tail to the right. This tail property may not be well fitted if a CPH-RGRST model with fewer phases is used, because there will not be enough freedom to distinguish the portion of patients in the tail from those whose charge and LOS stay around the mode.

4. Summary

We introduced a methodology whereby the widely used CPH models and RGRST models can be combined together and a variety of measures of the cost of phases in the CPH model can be defined. A two-step procedure is proposed to estimate the combined CPH-RGRST model and the simulation study is done to verify the effectiveness of the estimation procedure. With the data sampled from SPARCS 2013, we estimated a four-phase CPH-RGRST model and drew the cost curves for every phase. To distinguish the effect of different types of illness on the charge and LOS distribution, we incorporated MDC groups and the severity and mortality risk of illness as covariates into the estimation.

We found that the effect of illness on the total charge and LOS is not always homogeneous. In particular, there are five MDC groups that affect the charge and LOS in different direction. Among them, there is only one MDC group, representing the alcohol/drug abuse, which has the negative effect on the final charge while it lifts up the LOS drastically.

The daily charge for all the stages, 2, 3, and 4, is increasing over time. This fact implies a nonlinear charge accumulation process within every stage and therefore contradicts the piece-wise linear assumption used by the other authors, McClean et al. [11]. We believe that the increasing daily charge is more realistic and reflects dynamic interaction between the health condition of patients and the treatment they accept.

Among all the four stages, stage 2 shows quite different features in both the price measure and the time cost measure. In terms of the time cost, stage 2 is significantly lower than stages 3 and 4 almost all the time. This observation is consistent with the relatively high switch-in and switch-to-discharge intensity that the stage 2 has and associates the stage 2 with the major treatment procedures that most patients need to experience when staying in hospital.

The median is much lower than the mean of both the price and daily price in stage 2, while this kind of deviation does not exist for the other stages, and it implies that there is a portion of outlier patients who are charged much more than the other patients in stage 2. We believe that further studies are needed to find out the causes of those patients being charged more in stage 2, since it matters to the efficiency of the allocation of medical resources.


Proof of Theorem 1

The main idea of the proof is to construct the partition by induction. Assume, firstly, that the partition is formed by curves in (denoted as for ) satisfying increasing condition as follows:

in such a way that for (for simplicity of notation, we assume .). So, to prove the theorem, it suffices to find out a family of increasing curves with the induced transition matrix being as stated in the theorem.

Firstly, notice that, by condition (6), we have for and any family of n curves satisfying increasing condition (A.1) the fact that the following holds:Moreover, becausewe can conclude that no matter the choice of , the transition probability matrix always has its column of the following form:On the other hand, the increasing condition (A.1) and the nonincreasing property of the event processes guarantee that for all with or and small enough That is, the induced transition matrix satisfiesSo, to prove the theorem, it suffices to construct a family of increasing curves guaranteeing that However, these two conditions are equivalent to finding

a sequence with and such that

(2) a sequence of solutions with to initial value problems (IVP) for

Condition (A.9) arises from the following equality:where the time-dependent density function of the potential charge process has the form of . The involved function , viewed as a family of functions in variable , solves the following family of IVPs:and is defined as the inverse to such that .

It is easy to check that condition does always hold. Finally, thanks to the existence and uniqueness theorem of solutions to IVPs, the solution curves do exist to the IVPs in (A.9).

Data Availability

The data used in this paper is sampled from New York State’s Statewide Planning and Research Cooperative System 2013, which is publicly available online at the following url: The specific sample used in this study and the python code for processing the data and implementing the estimation and simulation study are available from the corresponding author on request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


This work was partially supported by the NSFC under Grant no. 11271317 and Zhejiang Provincial Natural Science Foundation under Grant no. LY16A010007, First Class Discipline of Zhejiang-A (Zhejiang University of Finance and Economics-Statistics).

Supplementary Materials

A note on the joint probability density function of a stochastic growth process with random stopping time. The technical proof of expression (3) is too complicated and left in the supplemental note. (Supplementary Material)


  1. M. Gold, “Panel on cost-effectiveness in health and medicine,” Medical Care, vol. 340, no. 12, pp. 197–S199, 1996. View at Google Scholar
  2. J. Lipscomb, M. Ancukiewicz, G. Parmigiani, V. Hasselblad, G. Samsa, and D. B. Matchar, “Predicting the cost of illness: A comparison of alternative models applied to stroke,” Medical Decision Making, vol. 18, no. 2, pp. S39–S56, 1998. View at Publisher · View at Google Scholar · View at Scopus
  3. D. Y. Lin, E. J. Feuer, R. Etzioni, and Y. Wax, “Estimating medical costs from incomplete follow-up data,” Biometrics, vol. 53, no. 2, pp. 419–434, 1997. View at Publisher · View at Google Scholar · View at Scopus
  4. X. Q. Tang, Modeling hospital length of stay and cost with heterogeneity [Ph.D. thesis], Mechigan State University, Department of Statistics, 2012.
  5. M. Faddy, N. Graves, and A. Pettitt, “Modeling length of stay in hospital and other right skewed data: Comparison of phase-type, gamma and log-normal distributions,” Value in Health, vol. 12, no. 2, pp. 309–314, 2009. View at Publisher · View at Google Scholar · View at Scopus
  6. A. H. Marshall, B. Shaw, and S. I. McClean, “Estimating the costs for a group of geriatric patients using the Coxian phase-type distribution,” Statistics in Medicine, vol. 26, no. 13, pp. 2716–2729, 2007. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  7. A. Marshall, C. Vasilakis, and E. El-Darzi, “Length of stay-based patient flow models: Recent developments and future directions,” Health Care Management Science, vol. 8, no. 3, pp. 213–220, 2005. View at Publisher · View at Google Scholar · View at Scopus
  8. A. H. Marshall, S. I. McClean, C. M. Shapcott, and P. H. Millard, “Modelling patient duration of stay to facilitate resource management of geriatric hospitals,” Health Care Management Science, vol. 5, no. 4, pp. 313–319, 2002. View at Publisher · View at Google Scholar · View at Scopus
  9. M. Fackrell, “Modelling healthcare systems with phase-type distributions,” Health Care Management Science, vol. 12, no. 1, pp. 11–26, 2009. View at Publisher · View at Google Scholar · View at Scopus
  10. S. McClean, M. Faddy, and P. Millard, “Markov model-based clustering for efficient patient care,” in Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems, pp. 467–472, gbr, June 2005. View at Scopus
  11. S. McClean, J. Gillespie, L. Garg, M. Barton, B. Scotney, and K. Kullerton, “Using phase-type models to cost stroke patient care across health, social and community services,” European Journal of Operational Research, vol. 236, no. 1, pp. 190–199, 2014. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  12. J. G. Ibrahim, H. Chu, and L. M. Chen, “Basic concepts and methods for joint models of longitudinal and survival data,” Journal of Clinical Oncology, vol. 28, no. 16, pp. 2796–2801, 2010. View at Publisher · View at Google Scholar · View at Scopus
  13. A. A. Tsiatis and M. Davidian, “Joint modeling of longitudinal and time-to-event data: an overview,” Statistica Sinica, vol. 14, no. 3, pp. 809–834, 2004. View at Google Scholar · View at MathSciNet · View at Scopus
  14. R. Henderson, P. Diggle, and A. Dobson, “Joint modelling of longitudinal measurements and event time data,” Biostatistics, vol. 1, no. 4, pp. 465–480, 2000. View at Publisher · View at Google Scholar
  15. S. Kim, D. Zeng, Y. Li, and D. Spiegelman, “Joint modeling of longitudinal and cure-survival data,” Journal of Statistical Theory and Practice, vol. 7, no. 2, pp. 324–344, 2013. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  16. I. Sousa, “A review on joint modelling of longitudinal measurements and time-to-event,” Revstat Statistical Journal, vol. 9, no. 1, pp. 57–81, 2011. View at Google Scholar · View at MathSciNet
  17. A. L. Gould, M. E. Boye, M. . Crowther et al., “Joint modeling of survival and longitudinal non-survival data: current methods and issues. Report of the DIA Bayesian joint modeling working group,” Statistics in Medicine, vol. 34, no. 14, pp. 2181–2195, 2015. View at Publisher · View at Google Scholar · View at MathSciNet
  18. J. C. Gardiner, Z. Luo, C. J. Bradley, E. Polverejan, M. Holmes-Rovner, and D. Rovner, “Longitudinal assessment of cost in health care interventions,” Health Services and Outcomes Research Methodology, vol. 3, no. 2, pp. 149–168, 2002. View at Publisher · View at Google Scholar · View at Scopus
  19. J. C. Gardiner, Z. Luo, C. J. Bradley, C. M. Sirbu, and C. W. Given, “A dynamic model for estimating changes in health status and costs,” Statistics in Medicine, vol. 25, no. 21, pp. 3648–3667, 2006. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  20. E. Polverejan, J. C. Gardiner, C. J. Bradley, M. Holmes-Rovner, and D. Rovner, “Estimating mean hospital cost as a function of length of stay and patient characteristics,” Health Economics, vol. 12, no. 11, pp. 935–947, 2003. View at Publisher · View at Google Scholar · View at Scopus
  21. L. C. Evans, Partial Differential Equations, American Mathematical Society, Providence, RI, USA, 2010. View at MathSciNet