Abstract

For hospitals’ admission management, the ability to predict length of stay (LOS) as early as the preadmission stage may help monitor the quality of inpatient care. This study develops artificial neural network (ANN) models to predict LOS for inpatients with one of three primary diagnoses: coronary atherosclerosis (CAS), heart failure (HF), and acute myocardial infarction (AMI), in the cardiovascular unit of a Christian hospital in Taipei, Taiwan. A total of 2,377 cardiology patients discharged between October 1, 2010, and December 31, 2011, were analyzed. ANN or linear regression models predicted LOS correctly for 88.07% to 89.95% of CAS patients at the predischarge stage and for 88.31% to 91.53% at the preadmission stage. For AMI or HF patients, the accuracy ranged from 64.12% to 66.78% at the predischarge stage and from 63.69% to 67.47% at the preadmission stage when a tolerance of 2 days was allowed.

1. Introduction

The demand for health care services continues to grow as the population in most developed countries ages. To make health care more affordable, policy makers and health organizations try to align financial incentives with the implementation of care processes based on best practices and the achievement of better patient outcomes. The length of stay (LOS) in hospitals is often used as an indicator of efficiency of care and hospital performance. It is generally recognized that a shorter stay indicates less resource consumption per discharge and cost savings, while postdischarge care is shifted to less expensive venues [1]. This motivated the development of diagnosis-related groups (DRGs), which classify patients by the type of hospital treatment in relation to the costs incurred by the hospital. This quality assurance scheme was then linked to the prospective payment system (PPS) and adopted by the federal government in the United States for the Medicare program in 1983. This payment system was found to moderate hospital cost inflation through a significant decline in the average length of stay (ALOS), which refers to the average number of days that patients spend in hospital [2]. Under the assumption that patients sharing common diagnostic and demographic characteristics require similar resource intensity, the aim of DRGs is to quantify and standardize hospital resource utilization for patients [3].

Other than diagnostic attributes, most research focuses on two types of factors to explain the variation in LOS: patient characteristics and hospital characteristics. An examination of data from the National Health Service (NHS) in the United Kingdom found that the variation in LOS for patients over age 65 was consistently larger across all regions [4]. It was observed that the variation in LOS between hospitals was larger than that between doctors in the same hospital [5]. Hospital policy in treatment management can also determine LOS. It was found that psychiatrists were able to predict LOS with significant accuracy, but only for patients they treated. Moreover, the prediction by a hospital coordinator involved in all patient treatments was significantly more correlated with the true LOS than the psychiatrists’ predictions [6]. A comparison of data from 24 hospitals in Japan showed that inpatient capacity and the ratio of involuntary admissions correlated positively with longer LOS [7]. A higher level of caregiver interaction among nurses and physicians, such as communication, coordination, and conflict management, was significantly associated with lower LOS [8].

The ability to predict LOS as an initial assessment of patients’ risk is critical for better resource planning and allocation [9], especially when resources are limited, as in ICUs [10, 11]. Yang et al. considered the timing of LOS prediction in three clinical stages for burn patients: admission, acute, and posttreatment. Using three different regression models, the best mean absolute error (MAE) in the LOS predictions was around 9 days in both the admission and the acute stages and 6 days in the posttreatment stage. With three more treatment-related variables, the results showed that prediction accuracy was significantly improved in the posttreatment stage [11]. An accurate prediction of LOS can also give management greater flexibility in hospital bed use and better assessment of treatment cost-effectiveness [12, 13].

This prediction can even stratify patients according to their risk for prolonged stays [14, 15]. Spratt et al. used a multivariate logistic regression method to identify factors associated with prolonged stays (>30 days) for patients with acute ischemic stroke. In addition to advanced age (>65), diabetes and in-hospital infection were significantly associated with prolonged LOS [14]. Lee et al. analyzed LOS data on childhood gastroenteritis in Australia and, using either the robust gamma mixed regression or linear mixed regression method, found that both gastrointestinal sugar intolerance and failure to thrive significantly affected prolonged LOS [16]. Schmelzer et al. used the multiple logistic regression method and found that both the American Society of Anesthesiologists (ASA) scores and postoperative complications were significant in the prediction of prolonged LOS after a colectomy [17].

Rosen et al. studied the LOS variation for Medicare patients after coronary artery bypass graft surgery (CABG) in 28 hospitals. They found that including deceased patients did not significantly influence the results. Other than age and gender, the most powerful predictors were a history of mitral valve disease or cerebrovascular disease and preoperative placement of an intra-aortic balloon pump. Different hospitals varied significantly in their LOS, and the readmission rate was linearly related to longer LOS [18]. Janssen et al. constructed a logistic regression model to predict the probability of patients requiring 3 or more days in the ICU after CABG. Only 60% of the patients predicted to be high risk had a prolonged ICU stay [15]. Chang et al. identified that, among preoperative factors, age of more than 75 years and chronic obstructive pulmonary disease (COPD) were associated with increased LOS for patients who underwent elective infrarenal aortic surgery [19].

Even though diagnosis had been considered the primary factor affecting hospital stays, patients’ clinical conditions, such as the number of diagnoses and the intensity of nursing services required, might be as critical in determining LOS variations within some DRGs [20]. One study showed that only 12% of the variation could be explained by patient characteristics and general hospital characteristics for patients with a primary diagnosis of acute myocardial infarction (AMI) [21]. For heart failure patients, Whellan et al. studied data from 246 hospitals for admission predictors for LOS. Patients with longer LOS had a higher disease severity and more comorbidities, such as hypertension, cardiac dysrhythmias, diabetes mellitus, COPD, and chronic renal insufficiency or failure. However, the overall model based on characteristics at the time of admission explained only a modest amount of LOS variation [22].

The purpose of this study is to develop artificial neural network (ANN) models to predict LOS for inpatients with one of three primary diagnoses: coronary atherosclerosis (CAS), heart failure (HF), and acute myocardial infarction (AMI), in the cardiovascular unit of a Christian hospital in Taipei, Taiwan. Better recognition of the critical preadmission factors that determine LOS, or the capacity to predict an individual patient’s LOS, could support efficient admission policies and optimize resource management in hospitals. Two stages of LOS prediction are presented: one uses all clinical factors and is designated the predischarge stage; the other uses only factors available before admission and is designated the preadmission stage. The prediction results obtained at the predischarge stage are then used to evaluate the relative effectiveness of predicting LOS at the preadmission stage.

The remainder of this paper is organized as follows. Section 2 introduces the method, including data collection, data preprocessing, and prediction model construction. Section 3 presents the prediction results of the various artificial neural network (ANN) models. Section 4 discusses the results and concludes the research findings, along with the limitations and future research directions.

2. Method

2.1. Data Sources and Data Preprocessing

This study was approved by the Mackay Memorial Hospital Institutional Review Board (IRB) for protection of human subjects in research. Clinical and administrative data were obtained for cardiology patients discharged between October 1, 2010, and December 31, 2011, from a Christian hospital with two locations in the metropolitan area of Taipei, Taiwan: the Taipei branch and the Tamshui branch. A total of 2,424 admission cases were collected for patients with one of three primary diagnoses: CAS, HF, and AMI. Of these, 47 admissions were identified as outliers, lying more than three standard deviations from the mean when fitting both forward-addition and backward-elimination regression models. Of the remaining 2,377 cases, 933 were coronary atherosclerosis (CAS) patients, 872 heart failure (HF) patients, and 572 acute myocardial infarction (AMI) patients, as summarized in Table 1. The LOS of a patient in this cardiology unit was defined as the time from admission to discharge; it ranged from 1 to 35 days, with an average of 5.73, a standard deviation of 5.44, and a median of 4 days. About 63% of patients were male. Age ranged from 21 to 99 years, with an average of 67.07, a standard deviation of 14.35, and a median of 68; 35% of patients were 75 years or older.
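The outlier screen described above can be approximated with a simple residual check. The sketch below uses an ordinary linear regression in place of the forward-addition and backward-elimination models actually fitted, and the DataFrame and column names are illustrative assumptions rather than the study's variables.

```python
# Hypothetical sketch of a residual-based outlier screen (3-SD rule).
# `df` and its column names are placeholders, not the study data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def flag_outliers(df: pd.DataFrame, feature_cols: list[str], target_col: str = "los") -> pd.Series:
    """Flag cases whose regression residual lies more than 3 SD from the mean residual."""
    X = df[feature_cols].to_numpy(dtype=float)
    y = df[target_col].to_numpy(dtype=float)
    residuals = y - LinearRegression().fit(X, y).predict(X)
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    return pd.Series(np.abs(z) > 3, index=df.index, name="is_outlier")

# Example usage (assumed columns):
# kept = df.loc[~flag_outliers(df, ["age", "gender", "main_dx_hf", "main_dx_ami"])]
```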

An admission case might have zero to multiple comorbidities, and similar medical histories were aggregated into comorbidity factors. For example, the history of hypertensive disease includes four types of diseases identified by ICD-9 codes 401 (essential hypertension) through 404 (hypertensive heart and chronic kidney disease). Each case might also have zero to multiple interventions during the admission. Out of a total of 46 types of interventions or diagnostic ancillary services found in the dataset, only the top 6 interventions with more than 5% occurrence in the entire dataset were adopted in this study. The last characteristic, TW-DRG pay, indicates whether the admission case was reimbursed through the pay-per-case (i.e., TW-DRG) system implemented by the National Health Insurance Administration (NHIA). The NHIA in Taiwan provides a universal health insurance system covering approximately 99% of the population [23]. In addition to the fee-for-service payment system, the NHIA introduced the first phase of TW-DRG, with 164 groups, in 2010. Since cases in the same DRG are reimbursed with the same amount, the scheme encourages hospitals to improve their financial performance by better utilizing medical resources [24]. Among the data collected, 25% of cases were reimbursed through TW-DRG payment by the NHIA.
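As a concrete illustration of this preprocessing, the following sketch aggregates ICD-9 codes 401 through 404 into a single hypertensive-disease comorbidity flag and keeps only interventions occurring in more than 5% of admissions; all column names are hypothetical.

```python
# Illustrative preprocessing sketch; column names are assumptions for demonstration only.
import pandas as pd

HYPERTENSIVE_CODES = {str(c) for c in range(401, 405)}  # ICD-9 401-404, per the text

def add_comorbidity_flag(df: pd.DataFrame, icd_col: str = "comorbidity_icd9") -> pd.DataFrame:
    """Aggregate individual ICD-9 comorbidity codes into a single Boolean factor."""
    out = df.copy()
    out["hx_hypertensive_disease"] = (
        out[icd_col].astype(str).str[:3].isin(HYPERTENSIVE_CODES).astype(int)
    )
    return out

def frequent_interventions(intervention_dummies: pd.DataFrame, threshold: float = 0.05) -> list[str]:
    """Keep only interventions occurring in more than `threshold` of all admissions."""
    rates = intervention_dummies.mean(axis=0)  # each column is a 0/1 indicator
    return rates[rates > threshold].index.tolist()
```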

2.2. Statistical Analysis

Pearson’s correlation coefficients were used to study the relationships between LOS and each inpatient characteristic. As summarized in Table 2, all characteristics were significantly correlated with LOS except the comorbidity of chronic airway obstruction (ICD 496). Among the risk factors, the three variables most significantly and positively correlated with longer LOS were a main diagnosis of heart failure (ICD 428), older age, and female gender. This is consistent with the factors related to prolonged LOS reported in the literature: female gender, increasing age, and comorbidities such as cerebrovascular disease and diabetes mellitus [18, 19, 22]. The three variables most significantly and negatively correlated with longer LOS were a main diagnosis of coronary atherosclerosis (ICD 414) and undergoing either percutaneous transluminal coronary angioplasty (PTCA) or percutaneous coronary intervention (PCI).
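The univariate screening behind Table 2 amounts to computing Pearson's r and its p value for each characteristic against LOS; a minimal sketch with placeholder column names is shown below.

```python
# Minimal sketch of the univariate screening reported in Table 2.
# `df` is assumed to hold one numeric row per admission, including a "los" column.
import pandas as pd
from scipy.stats import pearsonr

def los_correlations(df: pd.DataFrame, target: str = "los") -> pd.DataFrame:
    rows = []
    for col in df.columns.drop(target):
        r, p = pearsonr(df[col], df[target])
        rows.append({"variable": col, "pearson_r": r, "p_value": p})
    return pd.DataFrame(rows).sort_values("pearson_r", ascending=False)
```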

As shown in Figure 1, the distribution of LOS was skewed, with few cases staying longer than 14 days. The average and standard deviation of LOS for CAS patients were 2.63 days and 2.25 days, respectively; for AMI and HF patients, they were 7.74 days and 5.93 days, respectively. The distribution of LOS for CAS patients was significantly different from that for patients with either AMI or HF (p < 0.0001), which suggested that separate prediction models should be built for CAS patients and for non-CAS patients (referred to hereafter as AMI and HF patients).
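The text does not name the test behind this p value; given the skewness of the LOS distributions, a nonparametric two-sample test such as the Mann-Whitney U test is one plausible choice, shown here purely as an assumption rather than the authors' method.

```python
# Assumed comparison of CAS vs. non-CAS LOS distributions (not necessarily the authors' test).
from scipy.stats import mannwhitneyu

def compare_los(los_cas, los_non_cas):
    stat, p = mannwhitneyu(los_cas, los_non_cas, alternative="two-sided")
    return stat, p
```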

2.3. Structure for Artificial Neural Networks (ANNs)

With the profound growth in clinical knowledge and technology, the development of more sophisticated information systems to support clinical decision making is essential to enhance quality and improve efficiency. Artificial neural networks (ANNs) are useful in modeling complex systems and have been applied in various areas, from accounting to school admission [25]. Walczak and Cerpa proposed four design criteria for artificial neural network (ANN) modeling: the appropriate input variables, the best learning method, the number of hidden layers, and the number of neural nodes per hidden layer [26]. The learning method of an ANN can be either supervised or unsupervised, depending on whether the target output values are known in advance or must be learned directly from the input values. For supervised learning, backpropagation is one of the most commonly used methods owing to its robustness and ease of implementation [27].

The clinical benefits of using ANNs have been notable in specific areas, such as cervical cytology and early detection of acute myocardial infarction (AMI) [28]. Compared with logistic regression, ANNs were found useful in predicting medical outcomes owing to their nonlinear statistical nature [29]. Dybowski et al. adopted an ANN to predict survival for patients with systemic inflammatory response syndrome and hemodynamic shock. After improving the performance of the ANN iteratively, the predicted outcome was more accurate than that of a logistic regression model [30]. Gholipour et al. utilized an ANN model to predict ICU survival and LOS for trauma patients. The results showed that the mean predicted LOS using the ANN was not significantly different from the mean actual LOS [31]. Launay et al. developed ANN models to predict prolonged LOS (13 days and above) for elderly emergency patients (age 80 and over) [32]. Based on the biomedical literature in PUBMED, Dreiseitl and Ohno-Machado showed that the discriminatory performance of ANN models was better than or not worse than the logistic regression method in 93% of the surveyed papers [33]. Grossi et al. found that ANN models outperformed traditional statistical methods in accuracy in various diagnostic and prognostic problems in gastroenterology [34].

The selection of input variables used in an ANN model is critical. Li et al. found that an ANN model using all input variables yielded slightly higher predictive accuracy than one using a subset of variables filtered by correlation analysis [35]. Hence, we decided to consider all inpatient characteristics, including gender, age, location, main diagnosis, eight types of comorbidity, six types of intervention, and whether the case met the criteria for TW-DRG reimbursement. These input variables were then categorized into two stages, the preadmission stage and the predischarge stage, as shown in Table 3. Variables in the preadmission stage include information available prior to hospitalization, such as gender, age, the hospital branch (location) to be admitted to, main diagnosis, and comorbidities. The predischarge stage includes, in addition to the preadmission variables, the interventions performed and whether the case was reimbursed by TW-DRG payment. Whether a case is reimbursed by TW-DRG payment, rather than the default fee-for-service payment, depends on the actual discharge condition, such as surgical procedure, treatment, and discharge status, according to the NHIA guideline [36].
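To make the two stages concrete, the sketch below lists one possible encoding of the Table 3 variables for the AMI and HF models (12 preadmission and 19 predischarge inputs); the specific comorbidity and intervention names are placeholders, and the CAS models would simply drop the HF-diagnosis indicator (11 and 18 inputs).

```python
# Hypothetical encoding of the two input-variable stages; names are placeholders only.
PREADMISSION_FEATURES = [
    "gender", "age", "branch_tamshui", "main_dx_hf",      # demographics, location, diagnosis
    # eight comorbidity indicators (placeholders):
    "hx_hypertensive", "hx_diabetes", "hx_dyslipidemia", "hx_ckd",
    "hx_cerebrovascular", "hx_copd", "hx_dysrhythmia", "hx_old_mi",
]

PREDISCHARGE_FEATURES = PREADMISSION_FEATURES + [
    # six intervention indicators plus TW-DRG reimbursement (placeholders):
    "iv_ptca", "iv_pci", "iv_cag", "iv_echo", "iv_stress_test", "iv_pacemaker",
    "tw_drg_pay",
]
```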

Separate ANNs were built to predict LOS: one for coronary atherosclerosis (CAS) patients and the other for acute myocardial infarction (AMI) and heart failure (HF) patients. Figure 2 shows the general structure of the backpropagation artificial neural networks in this research. The output layer has only one neuron, which generates a number ranging from 0 to 35 to represent the predicted LOS. The size of the input layer depends on the number of input variables. Here, the prediction model using the input variables of the predischarge stage is referred to as the predischarge model; likewise, the model using the variables of the preadmission stage is referred to as the preadmission model. For the predischarge model, the input layer has 18 neurons for CAS patients and 19 neurons for AMI and HF patients, with one additional Boolean neuron indicating whether the main diagnosis is HF. In the preadmission models, the input layer has 11 and 12 neurons for CAS patients and for AMI and HF patients, respectively.
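The following NumPy sketch spells out the forward pass implied by Figure 2: a log-sigmoid hidden layer feeding a single output neuron. The weights here are random placeholders, and the output scaling used by the authors' SPSS implementation may differ.

```python
# Bare-bones forward pass for the Figure 2 structure: n inputs, H hidden neurons
# with a log-sigmoid activation, and one output neuron for the predicted LOS.
import numpy as np

def forward_pass(x: np.ndarray, W_hidden: np.ndarray, b_hidden: np.ndarray,
                 w_out: np.ndarray, b_out: float) -> float:
    """x: (n,) input vector; W_hidden: (H, n); w_out: (H,)."""
    hidden = 1.0 / (1.0 + np.exp(-(W_hidden @ x + b_hidden)))  # log-sigmoid hidden layer
    return float(w_out @ hidden + b_out)                       # single output neuron

# Example with the predischarge CAS configuration (18 inputs, 13 hidden neurons);
# all weights below are random placeholders, not trained values.
rng = np.random.default_rng(0)
n, H = 18, 13
x = rng.random(n)
print(forward_pass(x, rng.standard_normal((H, n)), rng.standard_normal(H),
                   rng.standard_normal(H), 0.0))
```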

As for the hidden layer, more neurons were found to enable a better closeness of fit [37] with lower training errors [38]. However, a larger ANN also requires more training effort [39] and can result in overfitting [38]. Some research suggested setting the number of neural nodes in the hidden layer to between 2/3 and 2 times the size of the input layer [26, 39, 40].

3. Results

In this section, the LOS predictions of the predischarge and preadmission models using ANNs are benchmarked against the results of linear regression (LR) models. All prediction models were implemented using IBM® SPSS® v.21 and IBM SPSS Neural Networks 21. As in the preliminary trial run, the original data was separated into a training dataset and a test dataset. The training dataset included 744 admissions for CAS patients and 1,155 admissions for AMI and HF patients, and the test dataset consisted of 189 admissions for CAS patients and 289 admissions for AMI and HF patients. When training any ANN model, 70% of the training dataset was randomly assigned to the training set and the remaining 30% to the validation set. Training stops when the number of training epochs reaches 2,000 or when there is no improvement in validation error for 600 consecutive epochs. For the LR models, the entire training dataset was used to generate the linear regression functions.
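A rough scikit-learn analogue of this benchmark is sketched below; the SPSS training options are mapped onto approximately equivalent MLPRegressor settings, and the array names are assumptions.

```python
# Illustrative benchmark loop, assuming X_train, y_train, X_test are NumPy arrays
# prepared as in Section 2; a sketch rather than a replication of the SPSS models.
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

def fit_and_predict(X_train, y_train, X_test, seed: int = 0):
    ann = MLPRegressor(
        hidden_layer_sizes=(13,), activation="logistic", solver="sgd",
        learning_rate="adaptive", momentum=0.9,
        max_iter=2000, early_stopping=True, validation_fraction=0.3,  # 70/30 internal split
        n_iter_no_change=600, random_state=seed,                      # 600-epoch patience
    ).fit(X_train, y_train)
    lr = LinearRegression().fit(X_train, y_train)  # LR uses the full training dataset
    return ann.predict(X_test), lr.predict(X_test)

# Repeating the fit over 30 seeds would mirror the 30 runs behind the confidence
# intervals reported in Tables 4 and 5:
# preds = [fit_and_predict(X_train, y_train, X_test, seed=s) for s in range(30)]
```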

3.1. For CAS Patients

The performance of the prediction models is evaluated using the same test dataset. Since the LOS predictions obtained by the ANN or LR models are continuous numbers, we further define a prediction of LOS as accurate if it is within 1 day of the actual LOS for CAS patients. Moreover, the effectiveness of predictability was measured with the mean absolute error (MAE) and the mean relative error (MRE), defined as

\[
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|, \qquad
\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\hat{y}_i - y_i\right|}{y_i},
\]

where \(\hat{y}_i\) and \(y_i\) are the predicted LOS and the actual LOS for the \(i\)th test case, \(i = 1, \ldots, N\), and \(N\) is the number of test instances.
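These metrics are straightforward to compute; the sketch below also parameterizes the accuracy threshold so that the tolerance-based variants used later for AMI and HF patients can be reproduced, with the strict-versus-inclusive boundary treated as an assumption.

```python
# Sketch of the evaluation metrics defined above.
import numpy as np

def evaluate(y_pred, y_true, max_abs_error: float = 1.0) -> dict:
    """Accuracy, MAE, and MRE for continuous LOS predictions.

    max_abs_error = 1.0 mirrors the within-1-day rule for CAS patients; 2.0 and 3.0
    correspond to the 1-day and 2-day tolerances in Section 3.2 ("difference less
    than 2 days" / "less than 3 days").
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    abs_err = np.abs(y_pred - y_true)
    return {
        "accuracy": float(np.mean(abs_err < max_abs_error)),  # strict bound; boundary convention assumed
        "MAE": float(np.mean(abs_err)),
        "MRE": float(np.mean(abs_err / y_true)),  # actual LOS is at least 1 day, so no zero division
    }
```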

To incorporate the randomness in the data selected for training the ANNs, the results shown in Table 4 are 95% confidence intervals (95% CI) for accuracy, MAE, and MRE based on 30 runs. All models were quite effective in predicting LOS, with accuracy ranging from 88.07% to 91.53%, MAE from 1 to 1.11 days, and MRE from 0.44 to 0.47. Figure 3 gives a detailed look at the distribution of accurate LOS predictions in the test dataset. The LR model performed better than the ANN model for patients with an LOS of 2 days, who made up about 60% of the test dataset. However, both the LR and ANN models were unable to predict correctly for LOS of more than 5 days, which accounted for 3.7% of the test dataset.

3.2. For AMI and HF Patients

The same performance indices are used to evaluate the effectiveness of the prediction models for AMI and HF patients. The results summarized in Table 5 show that these models are not as effective in predicting LOS as those for CAS patients, with accuracy ranging from 32.99% to 36.33%. The MAE of all models is quite stable, ranging from 3.76 to 3.97 days, and the MRE ranges from 0.69 to 0.77. Further, considering the high variation of the LOS distribution, the definition of accuracy is extended to two more scenarios: a tolerance of 1 day (the difference between the predicted and actual LOS is less than 2 days) and a tolerance of 2 days (the difference is less than 3 days). However, even when a deviation of 2 days is allowed, the accuracy of these models increases only to between 63.69% and 67.47%, as shown in Table 5.

Figure 4 shows the breakdown of accurate LOS predictions with no tolerance in the test dataset. Both the LR and ANN models performed better in predicting LOS between 8 and 11 days. In the predischarge model, the ANN performs better than the LR model for patients with an LOS of 3, 5, 6, or 7 days, who make up about 60% of the test dataset. Moreover, as shown in the resized charts in Figure 4, the ANN models were able to predict correctly for cases with an LOS greater than 11 days, which account for 14.5% of the test dataset. However, both the LR and ANN models were unable to predict correctly for LOS greater than 18 days, which accounts for 5.9% of the test dataset.

3.3. Validation of ANN Models

To determine a proper structure for the ANNs used in this study, a preliminary trial run was first conducted, assuming that the activation function of each neuron in the hidden layer was the log-sigmoid function with outputs between 0 and 1 [41]. The original data was separated into two sets: a training dataset and a test dataset. The training dataset included the first 12 months of data, from October 1, 2010, to September 30, 2011, with 744 admissions for CAS patients and 1,155 admissions for AMI and HF patients. The test dataset consisted of the data from the last 3 months, with 189 admissions for CAS patients and 289 admissions for AMI and HF patients.

To avoid overfitting, the training dataset was further separated into two sets: a training set, used to update the weights and biases, and a validation set, used to stop training when the ANN might be overfitting. In this study, the training set and validation set used in training all ANN models were set to 70% and 30% of the training dataset, respectively. The weights in an ANN were modified using a variable learning rate gradient descent algorithm with momentum [42]. Training stopped when the number of training epochs reached 2,000 or when there was no improvement in the validation error for 600 consecutive epochs. After an ANN was trained, the model was then used to obtain the predicted LOS for the test dataset. Furthermore, to avoid the effect of randomness when comparing the results, a fixed training set and validation set were used when training the backpropagation ANNs. Figure 5 shows the root mean squared error (RMSE) for the training set, validation set, and test dataset of trained ANN models with different numbers of neurons in the hidden layer, ranging from 10 to 30. The training errors were found to decrease slightly as more neurons were included in the hidden layer. However, no overfitting was observed, and the test errors were quite stable for both models.
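The hidden-layer sweep behind Figure 5 can be sketched as follows, again mapping the SPSS settings onto roughly equivalent scikit-learn options; only training and test RMSE are computed here, since the internal validation split of MLPRegressor is not directly exposed.

```python
# Sketch of the preliminary trial: one ANN per hidden-layer size (10 to 30 neurons)
# with a fixed split, comparing RMSE on training and test data. Array names are assumptions.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def rmse(y_true, y_pred) -> float:
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

def sweep_hidden_sizes(X_train, y_train, X_test, y_test, sizes=range(10, 31)):
    results = []
    for h in sizes:
        ann = MLPRegressor(
            hidden_layer_sizes=(h,), activation="logistic", solver="sgd",
            learning_rate="adaptive", momentum=0.9,          # variable learning rate + momentum
            max_iter=2000, early_stopping=True,
            validation_fraction=0.3, n_iter_no_change=600,   # 70/30 split, 600-epoch patience
            random_state=0,                                  # fixed split to limit randomness
        ).fit(X_train, y_train)
        results.append((h, rmse(y_train, ann.predict(X_train)), rmse(y_test, ann.predict(X_test))))
    return results
```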

To balance the required training effort against the improvement in test errors, the number of neurons in the hidden layer in Figure 2 was set to 13 for all ANN models. Figure 6 shows the weight distribution between input neurons and hidden neurons; each dot indicates the weight from one input neuron to a hidden neuron, so each input neuron has a total of thirteen dots (weights) linked to the hidden layer. This further validates the size of the ANN used in this study, since the weights are scattered evenly from −1.5 to 1.5 with only a few dots (weights) close to zero.
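If a scikit-learn model were used in place of the SPSS implementation, the input-to-hidden weights inspected in Figure 6 could be retrieved as sketched below; this is an analogy, not the authors' procedure.

```python
# Sketch of the weight-distribution check, assuming a fitted scikit-learn MLPRegressor
# stands in for the SPSS model.
import numpy as np

def input_to_hidden_weights(fitted_ann) -> np.ndarray:
    """Return the (n_inputs, n_hidden) weight matrix between input and hidden layer."""
    return fitted_ann.coefs_[0]

# Example diagnostic: the share of near-zero weights, which Figure 6 suggests is small.
# W = input_to_hidden_weights(ann)
# print(np.mean(np.abs(W) < 0.1))
```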

4. Discussion and Conclusion

This study proposed the use of neural network techniques to predict LOS for patients in a cardiovascular unit with one of three primary diagnoses: coronary atherosclerosis (CAS), heart failure (HF), and acute myocardial infarction (AMI). The major observation from the results was that the preadmission models were as effective in predicting LOS as the predischarge models. Some preadmission models even performed slightly better than the predischarge models, as shown in Tables 4 and 5. This observation indicates that whether a patient might be reimbursed by TW-DRG did not provide additional predictive ability for LOS, and that the assumption that a shorter LOS would be preferred for the sake of hospitals’ financial performance under DRG implementation was not applicable in our case hospital.

The benefit of using ANN models was more significant when predicting prolonged LOS for HF and AMI patients. When predicting prolonged LOS, most of the literature formulated the prediction problem as determining whether an admission would become a prolonged stay [14, 15, 17] or whether the LOS would fall within a fixed range of days [16, 22]. The study by Mobley et al. [43] predicted the exact number of LOS days for patients in a postcoronary care unit. With 629 and 127 admissions in the training and test files, a total of 74 input variables were used to predict LOS of 1 to 20 days in ANNs. The mean LOS was 3.84 days in the training file and 3.49 days in the test file. They showed no significant difference between the distributions of the predicted LOS and the actual LOS in the test file. However, the ANNs with two or three hidden layers made no prediction of LOS beyond 5 days [43]. In this study, the mean LOS for CAS patients was 2.65 days in the training dataset and 2.53 days in the test dataset. With only 18 input variables, our models were able to predict correctly for patients with LOS of up to 5 days, as shown in Figure 3. For AMI and HF patients, the mean LOS was 7.86 days in the training dataset and 7.23 days in the test dataset. Compared with the LR method, the ANN models were able to predict patient stays longer than 11 days, as shown in Figure 4.

In general, the LR models performed slightly better than the ANN models in terms of accuracy, as shown in Tables 4 and 5. This might be because each ANN model was built from only 70% of the training dataset, which consisted of the first 12 months of data, while the test dataset, the remaining 3 months, was highly consistent with the previous 12 months. This consistency suggests that the clinical pathways were well established in our case hospital.

A limitation of this research is that the main diagnosis and comorbidities of patients are assumed to be known at the preadmission stage. Further study is suggested to fully assess the use of ANN models in LOS prediction, especially for patients who might require a longer LOS. Instead of predicting the actual LOS, it might be practical to first categorize LOS into risk groups. More patient characteristics, such as vital signs or laboratory readings at the time of admission, could be included to improve LOS predictability.

Since the bed supply is limited, the utilization of hospital beds is economically critical for most hospitals, and any policy aimed at improving bed utilization has profound impacts on the perceived quality of care and on the satisfaction of patients and physicians. Currently, hospitalists rely only on aggregated data, such as occupancy rates and average LOS, to assess the performance and competitiveness of clinics in the hospital. A reliable LOS prediction at the preadmission stage could further assist in identifying abnormalities or potential medical risks that warrant additional attention for individual cases. It might even allow bed managers to foresee bottlenecks in bed availability when admitting patients and to avoid unnecessary bed transfers between wards.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The study has received financial support from the National Taipei University of Technology-Mackay Memorial Hospital Joint Research Program (NTUT-MMH-100-15). The authors would like to thank Cheng-Hsien Mao for insightful inputs and the entire administrative staff from the participating hospital for assistance and support.