Abstract

Access to medical services has become more difficult and expensive in China, leaving some illnesses untreated and producing a large number of zeros in hospitalization counts for the elderly. Classic count models such as the Poisson and negative binomial models cannot fit this kind of data well. One aim of this study was to use zero-inflated and hurdle models to handle the excess zeros. Another aim was to identify the factors affecting the hospitalization decision of the elderly and their utilization of hospitalization services. To this end, the XGBoost model was introduced to rank the importance of the influencing factors. The zero-inflated negative binomial model performed best. The results showed that the elderly covered by NRCMS or UEBMI/URBMI were more likely to have a higher number of hospitalizations. This indicated that the high cost of hospitalization had reduced the willingness of the elderly to be hospitalized, whereas basic medical insurance had increased the number of repeated hospital readmissions. Policy efforts should be made to improve the level of basic medical insurance.

1. Introduction

Population aging is a defining trend of economic and social development in China. In 2019, approximately 176 million Chinese people were aged 65 and above, accounting for 12.6% of the total population. Measured against the United Nations’ standard, under which the threshold is a share of people aged 65 and above exceeding 14% of the total population, China has officially entered the “aging society” and begun the countdown to that threshold. Compared with other age groups, the elderly have a high prevalence of disease, more chronic and serious diseases, and a heavy burden of medical expenses, which might lead to greater medical needs and more hospitalization services [1, 2].

Previous studies mainly focused on the demand for and utilization of health services [3], the equality of health services, fall-risk-increasing drugs and falls among the elderly [4], the impacts of health insurance on outpatient visits of older adults, and so on [5–10]. Older adults are more likely to suffer from diseases and need hospitalization more urgently [3], resulting in higher direct and indirect medical care costs. Government healthcare finance policy has affected the utilization of health services in the United States, Korea, Vietnam, Singapore, and China [11–16]. However, the development of social health insurance (SHI) does not seem to have mitigated the problem of healthcare affordability, even though such schemes now cover almost the whole population in China. A Chinese survey shows that public complaints about healthcare reform and affordability in urban areas increased from 21.1% in 2007 to 34.8% in 2009. With the rapid increase in medical costs, some elderly people lack effective access to medical services because of poverty, lack of medical security, poor medical accessibility, and other factors, and they have had to give up hospitalization or treatment, which seriously affects their health. However, few studies have explored hospitalization decision-making and its influencing factors among the elderly. Therefore, the present study aimed to explore how demographic, socioeconomic, and health insurance factors affect the whole hospitalization decision-making process of the elderly under the current healthcare system.

The hospitalization decision of the elderly consists of two parts: whether to be hospitalized and how many times to be hospitalized. Many statistical models for count data, such as the Poisson and negative binomial models, have been used to analyze the factors of hospitalization [17]. However, modeling hospitalization decision-making is challenging because of the nature of healthcare data, which commonly exhibit overdispersion and excess zeros [17, 18]. The hospitalization counts of the elderly in China were seriously zero-inflated (Figure 1) because many elderly people gave up hospitalization, which made the observed sample variance larger than the sample mean. The standard Poisson model, which assumes that the mean and variance of the count response are equal, therefore does not fit. Ignoring overdispersion in count data results in underestimated standard errors, overstated parameter significance, and biased hypothesis tests [19, 20]. The negative binomial model, whose variance exceeds its mean, can deal with the overdispersion caused by heterogeneity in count data, but it cannot effectively solve the zero-inflation problem [19, 20]. Two-part models such as hurdle models have been widely used for count data with excess zeros; they use a logistic or probit regression to model the probability that a count is zero rather than positive and a generalized linear regression for the observed healthcare utilization [21, 22]. Another way to deal with excess zeros is the zero-inflated model, a mixture of a regular count model, such as the Poisson or negative binomial model, and a component that accommodates the excess zeros [19, 20, 23]. These models have been applied, modified, and extended, and they are very popular for healthcare data [24–29].

Previous studies paid more attention to the effect of specific factors, such as new rural cooperative medical insurance and urban workers’ medical insurance, on the utilization of medical services [11, 13], ignoring the need for medical services at the individual level, especially the hospitalization needs of the elderly. In terms of methods, several studies used Poisson, negative binomial, zero-inflated, and hurdle models to predict the number of outpatient visits; few studies used these models to analyze the number of hospitalizations of the elderly. The goal of this study is not only to predict the number of hospitalizations of the elderly using these models but also to analyze systematically the decision-making and demand factors of hospitalization from the aspects of personal medical needs, economic status, demographic characteristics, medical security, and so on. Many studies have either examined each influencing factor in isolation or studied the interaction of two factors, without comparing the importance of multiple variables. Building on the analysis of the factors that affect the number of hospitalizations of the elderly, this paper used the XGBoost (eXtreme gradient boosting) machine learning model to explore the importance of these factors [30]. XGBoost is an ensemble learning model with excellent performance on many classification and regression problems; it improves and extends the gradient boosting decision tree (GBDT) model. One advantage of XGBoost is that it can prevent overfitting through regularization terms. It also uses both the first and second derivatives of the loss function, which can be customized, and this improves the accuracy of the loss approximation. Although machine learning methods have been applied in many other fields, there are relatively few studies in healthcare, especially work using XGBoost to predict the number of hospitalizations. Because the available hospitalization count data were strongly right-skewed with excess zeros, the study compared different statistical models to obtain an optimal model that predicts the number of elderly hospitalizations more accurately.

2. Materials and Methods

2.1. Data Source and Study Population

The study used microdata from the Chinese Longitudinal Healthy Longevity Survey (CLHLS), the earliest and longest-running nationwide social science survey in China. The survey was conducted by the Center for Healthy Aging and Development Studies of the National School of Development at Peking University. It covers 23 provinces, municipalities, and autonomous regions across China, and its subjects are elderly people aged 65 and over. The questionnaire for surviving respondents covered the basic status of the elderly and their families, socioeconomic background and family structure, economic sources and economic status, self-evaluation of health and quality of life, cognitive function, personality and psychological characteristics, daily activity ability, lifestyle, life care, disease treatment, and medical expenses.

The CLHLS conducted a baseline survey in 1998, followed by seven waves in 2000, 2002, 2005, 2008–2009, 2011–2012, 2014, and 2017–2018, in about half of the counties and city districts, randomly selected, in 23 Chinese provinces. The CLHLS aimed to understand the demographic characteristics, lifestyles, health services, behaviors, economic status, and so on of Chinese elderly people aged 65 years and over. Detailed information about the survey design and assessment of data quality has been reported in previous studies [31–33]. The participants of the CLHLS baseline survey were older adults aged 80 years and over, and the age range was extended to 65 years and over after 2002.

We used CLHLS data from the latest follow-up cross-sectional survey (2017–2018) of surviving respondents, including 10 participants since 1998, 30 participants since 2000, 1,330 participants since 2000, 2,440 participants since 2008–2009, 2,884 participants since 2011–2012, 3,463 participants since 2014, and 12,411 participants interviewed for the first time in 2018. After filtering out missing values, invalid values, and outliers, 5,287 participants with complete information on healthcare utilization and on sociodemographic and economic characteristics were included in the analyses of the factors affecting hospitalization.

2.2. Covariate Selection

The Andersen Behavioral Model of Health Service Use provides a framework for the study of hospitalization that outlines three determinants: predisposing, enabling, and need factors [34]. In light of this, we evaluated the effects on hospitalization utilization of health status and functional disabilities as need factors and of the associated sociodemographic factors as predisposing and enabling factors (Table 1).

2.2.1. Need Factors

This study treated health status and functional disabilities as need factors [35]. Health status was evaluated by self-rated health, an ordinal variable coded from “1” to “5,” representing “very bad” to “very good.” Functional disabilities were measured by activities of daily living (ADLs). ADLs requiring any assistance were defined as “with difficulties,” so the ADL measure was recorded as “no difficulty” or “with difficulties.”

2.2.2. Predisposing and Enabling Factors

Following Andersen’s behavioral model, we evaluated sociodemographic characteristics as predisposing and enabling factors. The predisposing factors included age, gender (male = 1 and female = 0), marital status (not in marriage = 0 and in marriage = 1), years of education, smoking (smoked in the past = 1 and not smoking = 0), and alcohol use (drank in the past = 1 and not drinking = 0). The enabling factors included the total household income last year, hospitalization expenditure last year, out-of-pocket expenses for hospitalization, and medical security plans: Urban Employee Basic Medical Insurance (UEBMI), Urban Resident Basic Medical Insurance (URBMI), New Rural Cooperative Medical Scheme (NRCMS), free medical treatment, and others [34, 36]. Although the survey data on family income, hospitalization expenses, and out-of-pocket expenses were right-censored, this did not affect the conclusions because amounts exceeding 100,000 yuan accounted for a relatively small proportion.

2.2.3. Description of Covariates

One of the outcomes of this study was the number of hospitalizations among older patients. The CLHLS asked participants how many times in the past two years they had suffered from a serious illness that required hospitalization or left them bedridden at home. We took the number of hospitalizations of the sick elderly in the past two years as the dependent variable and treated the response of being seriously ill but “bedridden at home” as not hospitalized. The response variable was therefore a nonnegative integer count.

Descriptive analyses were conducted to examine the outcome and the demographic and socioeconomic characteristics of the 5,287 participants (Table 1). Figure 1 shows that the distribution of the number of hospitalizations is heavily right-skewed with a variance greater than the mean, that is, the data are overdispersed. Figure 1 also shows a large number of zero counts.

2.3. Statistical Models

The number of hospitalizations was assumed to be an overdispersed and zero-inflated count variable, so neither the Poisson nor the negative binomial model might fit the data well. To accommodate the excess zeros, we used hurdle and zero-inflated models. The difference between the two is that zero observations in zero-inflated models come from two sources, “structure” and “sampling,” whereas in hurdle models all zeros come from a single source. Given the characteristics of our data, we applied and compared six count regression models: the classical Poisson and negative binomial (NB) models, zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), hurdle Poisson (HP), and hurdle negative binomial (HNB).

2.3.1. Poisson Model

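Poisson regression is usually the benchmark model for count data. In this study, we assume that the number of hospitalizations of the elderly ($y_i$) follows the Poisson distribution with parameter ($\mu_i$), and the probability function is defined as [19, 20]

$$P(Y_i = y_i \mid x_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots, \tag{1}$$

where $x_i$ is the vector of factors, $\mu_i = \exp(x_i^{\prime}\beta)$, and $\beta$ are the estimated parameters. The Poisson model assumes that the mean and the variance are equal (i.e., $E(y_i \mid x_i) = \operatorname{Var}(y_i \mid x_i) = \mu_i$).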
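
The equal mean and variance property can be checked numerically. The following is a minimal sketch, assuming a log link and purely hypothetical covariate values and coefficients (they are not estimates from this study):

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson variable with mean mu, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def poisson_mean(x, beta) -> float:
    """Log link: mu = exp(x' beta)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

# Hypothetical covariates (intercept, one factor) and coefficients.
mu = poisson_mean([1.0, 2.0], [-1.5, 0.3])

# Truncate the infinite support at 100; the tail is negligible for this mu.
support = range(100)
mean = sum(y * poisson_pmf(y, mu) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)

print(f"mu = {mu:.4f}, mean = {mean:.4f}, variance = {var:.4f}")
```

Both moments coincide with the parameter, which is exactly the restriction that overdispersed hospitalization counts violate.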

2.3.2. Negative Binomial Model

The negative binomial model is an alternative to the Poisson model when the data are heterogeneous [19, 20]. The Poisson distribution assumes that the older patients are homogeneous, that is, the mean can be regarded as a fixed value, which does not match reality. If the mean number of hospitalizations in the Poisson model is instead regarded as a gamma-distributed random variable, we obtain the negative binomial model:

$$P(Y_i = y_i \mid x_i) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu_i}\right)^{\alpha^{-1}} \left(\frac{\mu_i}{\alpha^{-1} + \mu_i}\right)^{y_i}, \tag{2}$$

with the mean $E(y_i \mid x_i) = \mu_i$ and variance function $\operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha\mu_i^2$, where $\alpha$ is known as the dispersion parameter. The variance is a quadratic function of the mean and is greater than the mean, which accommodates the heterogeneity.
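
As an illustration of the quadratic variance function, the sketch below evaluates a negative binomial probability function numerically. It assumes the common mean and dispersion parameterization (mean mu, variance mu + alpha * mu^2). The dispersion value is obtained by matching the sample mean (0.41) and variance (0.97) reported in this paper, which is only a rough moment-based illustration and not the fitted estimate:

```python
import math

def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """Negative binomial probability with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

# Illustrative moment matching (an assumption, not a model estimate):
# choose alpha so that mu + alpha * mu**2 equals the sample variance.
mu, sample_var = 0.41, 0.97
alpha = (sample_var - mu) / mu ** 2

support = range(500)  # the geometric tail is negligible beyond this point
mean = sum(y * nb_pmf(y, mu, alpha) for y in support)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, alpha) for y in support)

print(f"alpha = {alpha:.3f}, mean = {mean:.3f}, variance = {var:.3f}")
```

A Poisson model with the same mean would force the variance down to 0.41, which shows how much extra spread the dispersion parameter absorbs.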

2.3.3. Zero-Inflated Models

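Figure 1 shows a disproportionately large frequency of zeros in the number of hospitalizations, which leads to poor performance of the Poisson and negative binomial models. We propose to overcome this problem using zero-inflated models. The zeros in zero-inflated models come from two components: one part arises from a parent count distribution, and the other corresponds to the excess zeros that cannot be accounted for by that distribution [37]. The zero-inflated model is described as follows [19, 38]:

$$P(Y_i = y_i) = \begin{cases} \pi_i + (1 - \pi_i)\, f(0), & y_i = 0, \\ (1 - \pi_i)\, f(y_i), & y_i = 1, 2, \ldots, \end{cases} \tag{3}$$

where $\pi_i$ is the probability of an excess zero and $f(\cdot)$ is a general count model. The mean and variance of the zero-inflated model are defined as follows:

$$E(Y_i) = (1 - \pi_i)\,\mu_i, \qquad \operatorname{Var}(Y_i) = (1 - \pi_i)\left(\sigma_i^2 + \pi_i \mu_i^2\right), \tag{4}$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance, respectively, of the count model. When $f(\cdot)$ is a Poisson distribution, we obtain the zero-inflated Poisson (ZIP) model:

$$P(Y_i = y_i) = \begin{cases} \pi_i + (1 - \pi_i)\, e^{-\mu_i}, & y_i = 0, \\ (1 - \pi_i)\, \dfrac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, & y_i = 1, 2, \ldots, \end{cases} \tag{5}$$

and the zero-inflated negative binomial (ZINB) model is described as follows:

$$P(Y_i = y_i) = \begin{cases} \pi_i + (1 - \pi_i)\,(1 + \alpha\mu_i)^{-1/\alpha}, & y_i = 0, \\ (1 - \pi_i)\, \dfrac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i + 1)} \left(\dfrac{\alpha^{-1}}{\alpha^{-1} + \mu_i}\right)^{\alpha^{-1}} \left(\dfrac{\mu_i}{\alpha^{-1} + \mu_i}\right)^{y_i}, & y_i = 1, 2, \ldots \end{cases} \tag{6}$$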
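
The mixture structure and its moments can be verified with a short numerical sketch. It uses the zero-inflated Poisson case with hypothetical values for the excess-zero probability and the Poisson mean (both are assumptions chosen for illustration):

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def zip_pmf(y: int, pi: float, mu: float) -> float:
    """Zero-inflated Poisson: an excess zero occurs with probability pi."""
    base = (1.0 - pi) * poisson_pmf(y, mu)
    return pi + base if y == 0 else base

pi, mu = 0.6, 1.2  # hypothetical parameter values
support = range(100)

total = sum(zip_pmf(y, pi, mu) for y in support)
mean = sum(y * zip_pmf(y, pi, mu) for y in support)
var = sum((y - mean) ** 2 * zip_pmf(y, pi, mu) for y in support)

# Closed forms: for a Poisson parent, sigma^2 equals mu.
mean_formula = (1.0 - pi) * mu
var_formula = (1.0 - pi) * (mu + pi * mu ** 2)

print(f"P(Y=0) = {zip_pmf(0, pi, mu):.3f}")
print(f"mean: {mean:.4f} vs {mean_formula:.4f}")
print(f"variance: {var:.4f} vs {var_formula:.4f}")
```

Even with a Poisson parent, the mixture variance exceeds the mixture mean whenever the excess-zero probability is positive, so zero inflation by itself induces overdispersion.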

2.3.4. Hurdle Model

Besides the aforementioned zero-inflated models, the hurdle model is a widely used alternative for count data with excess zeros. The hurdle reflects a two-part decision-making process: the elderly patient first decides whether to be hospitalized and then decides how many times to be hospitalized. The zeros are therefore determined by one density $f_1(\cdot)$, so that $P(y_i = 0) = f_1(0)$ and $P(y_i > 0) = 1 - f_1(0)$, while the positive counts come from another density $f_2(\cdot)$ truncated at zero. This leads to the hurdle model

$$P(Y_i = y_i) = \begin{cases} f_1(0), & y_i = 0, \\ \dfrac{1 - f_1(0)}{1 - f_2(0)}\, f_2(y_i), & y_i = 1, 2, \ldots \end{cases} \tag{7}$$

The model collapses to the standard count model only if $f_1(\cdot) = f_2(\cdot)$. The density $f_2(\cdot)$ is a count density such as the Poisson or negative binomial, whereas $f_1(\cdot)$ could also be a count data density, or more simply, the probabilities $f_1(0)$ and $1 - f_1(0)$ may be estimated using a logit or probit model [19, 20]. Although much of the literature uses the probit model in the first part of the hurdle model, there is no clear evidence that the choice between probit and logit has a serious impact on the results, and many studies prefer the logit model mainly because it is more convenient to interpret and compute [21, 25, 26, 28, 37, 38]. In our study, there is likewise no evidence that the conclusions of the logit model are better or more reliable than those of the probit model. The mean of the hurdle model is determined by the probability of crossing the threshold and by the moments of the zero-truncated density as follows:

$$E(y_i \mid x_i) = P(y_i > 0 \mid x_i)\, E(y_i \mid y_i > 0, x_i) = \frac{1 - f_1(0)}{1 - f_2(0)}\, \mu_2, \tag{8}$$

where $\mu_2$ is the untruncated mean in density $f_2(\cdot)$. The hurdle model variance is shown in the following equation:

$$\operatorname{Var}(y_i \mid x_i) = \frac{1 - f_1(0)}{1 - f_2(0)}\, \sigma_2^2 + \frac{\left(1 - f_1(0)\right)\left(f_1(0) - f_2(0)\right)}{\left(1 - f_2(0)\right)^2}\, \mu_2^2, \tag{9}$$

where $\sigma_2^2$ is the untruncated variance in density $f_2(\cdot)$.
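
The two-part structure can be sketched numerically as follows. The example assumes a Poisson density for the positive counts; the zero probability of 0.75 loosely mirrors the share of non-hospitalized respondents reported in this paper, while the Poisson mean is hypothetical:

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def hurdle_pmf(y: int, p_zero: float, mu: float) -> float:
    """Hurdle Poisson: p_zero is the zero probability from the binary part;
    positive counts follow a zero-truncated Poisson(mu)."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(y, mu) / (1.0 - poisson_pmf(0, mu))

p_zero, mu = 0.75, 1.5  # mu is a hypothetical value
support = range(100)

total = sum(hurdle_pmf(y, p_zero, mu) for y in support)
mean = sum(y * hurdle_pmf(y, p_zero, mu) for y in support)
var = sum((y - mean) ** 2 * hurdle_pmf(y, p_zero, mu) for y in support)

# Closed forms with a Poisson count part (untruncated variance equals mu).
f2_zero = math.exp(-mu)
scale = (1.0 - p_zero) / (1.0 - f2_zero)
mean_formula = scale * mu
var_formula = (scale * mu
               + (1.0 - p_zero) * (p_zero - f2_zero) / (1.0 - f2_zero) ** 2 * mu ** 2)

print(f"mean: {mean:.4f} vs {mean_formula:.4f}")
print(f"variance: {var:.4f} vs {var_formula:.4f}")
```

Setting `p_zero` equal to `math.exp(-mu)` makes the scale factor one and recovers the ordinary Poisson moments, which is the collapse condition described above.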

2.4. XGBoost (eXtreme Gradient Boosting) Model

The XGBoost model is an ensemble of decision trees, specifically a boosted tree model [30]. Each decision tree does not make predictions on the input samples independently; instead, it builds on the prediction results of the previous round and learns the remaining prediction error, which improves the prediction accuracy of the model. Let $\hat{y}_i^{(t)}$ denote the prediction result of the model for the i-th sample after the t-th iteration and $f_t(x_i)$ denote the prediction score of the t-th decision tree for the i-th sample; then the expression of $\hat{y}_i^{(t)}$ is as follows:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i). \tag{10}$$

At the t-th iteration, the prediction result of the previous (t − 1) iterations has been given, so the training goal is to select the prediction function $f_t$ that minimizes the objective function. The meanings of $y_i$ and $x_i$ are the same as above; they represent the number of hospitalizations and the influencing factors, respectively. Assuming that $l(\cdot, \cdot)$ is the loss function that measures the deviation between the predicted value and the true value of a sample, the objective function of the XGBoost model at the t-th iteration has the following expression:

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t). \tag{11}$$

The first part of the objective function measures the loss of the model’s prediction errors over all samples, and the second term ($\Omega(f_t)$) is a regularization term that penalizes the complexity of the newly added tree. Adding a regularization term to the objective function helps prevent overfitting, and it is also an improvement of XGBoost over the GBDT model. For the decision tree model $f(x) = w_{q(x)}$, its complexity can be measured by the following formula:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2, \tag{12}$$

where T represents the number of leaf nodes of the decision tree, $w_j$ represents the prediction score of the j-th leaf node, $q$ and $w$ are the structure part and the leaf weights of the decision tree, respectively, and $\gamma$ and $\lambda$ are the penalty coefficients. According to equations (11) and (12), the XGBoost model expands the loss function to the quadratic term by applying the Taylor series at $\hat{y}_i^{(t-1)}$ and uses both the first and second derivatives of the loss function, which makes it more accurate than the GBDT model. After expanding the objective function and applying a series of transformations, the optimization objective for a given tree structure reduces to minimizing a quadratic function of one variable for each leaf. Finally, a greedy algorithm repeatedly tries to split the existing leaf nodes and compares the gain in the objective function before and after each split, until the optimal decision tree of the t-th iteration is obtained.
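
The per-leaf quadratic minimization and the split gain used by the greedy search can be sketched as follows. The closed forms are the standard ones from the XGBoost derivation rather than formulas stated in this paper; the squared-error loss, the toy counts, and the penalty values are all assumptions for illustration:

```python
def leaf_weight(g, h, lam):
    """Minimizer of G*w + 0.5*(H + lam)*w**2, where G and H sum the first
    and second derivatives of the loss over the samples in the leaf."""
    return -sum(g) / (sum(h) + lam)

def leaf_score(g, h, lam):
    """Twice the objective reduction achieved by the optimal leaf weight."""
    return sum(g) ** 2 / (sum(h) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Objective improvement from splitting one leaf into two;
    gamma is the penalty paid for the extra leaf."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma

# Toy data: observed counts and previous-round predictions (hypothetical).
y = [0, 0, 1, 3, 4]
y_prev = [1.0] * 5

# Squared-error loss 0.5*(y - yhat)**2: first derivative yhat - y, second derivative 1.
g = [p - t for p, t in zip(y_prev, y)]
h = [1.0] * len(y)

lam, gamma = 1.0, 0.1  # hypothetical penalty coefficients
gain = split_gain(g[:3], h[:3], g[3:], h[3:], lam, gamma)
w_left = leaf_weight(g[:3], h[:3], lam)
w_right = leaf_weight(g[3:], h[3:], lam)

print(f"gain = {gain:.4f}, w_left = {w_left:.4f}, w_right = {w_right:.4f}")
```

A candidate split is kept only when its gain is positive, so a larger `gamma` yields smaller trees, while a larger `lam` shrinks every leaf weight toward zero.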

The importance of explanatory variables in the XGBoost model can be measured in a variety of ways. For example, we can count the number of times an explanatory variable is used as a split feature across all decision trees, calculate the reduction in the Gini coefficient over all nodes split on this feature, or calculate the sum of the information gain. Sorting the importance values of all explanatory variables from largest to smallest gives the importance ranking. In this paper, we counted the number of times each explanatory variable was used as a split feature across all decision trees as the reference standard for importance.
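
Counting split usage amounts to a simple tally over the fitted trees. The sketch below assumes a toy representation in which each tree is just the list of features used at its split nodes; the feature names and trees are hypothetical and do not come from the fitted model:

```python
from collections import Counter

# Hypothetical ensemble: each tree is listed by the features at its split nodes.
trees = [
    ["hospital_expenditure", "out_of_pocket", "education"],
    ["hospital_expenditure", "self_rated_health"],
    ["out_of_pocket", "hospital_expenditure", "adl"],
]

def split_count_importance(trees):
    """Rank features by how often they are used as a split feature."""
    counts = Counter(feature for tree in trees for feature in tree)
    return counts.most_common()  # sorted from most to least frequent

ranking = split_count_importance(trees)
for feature, count in ranking:
    print(f"{feature}: {count}")
```

Split counts favor features that are split on often, even when each split contributes little; gain-based measures weight each split by the improvement it brings, so the two rankings need not agree.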

2.5. Model Assessment

To compare the nonnested models based on maximum likelihood, we use the Akaike information criterion (AIC) [39], which can be expressed as follows:

$$\mathrm{AIC} = -2\ln L + 2k, \tag{13}$$

where $\ln L$ is the maximum log-likelihood and $k$ is the number of parameters in the model. The lower the AIC, the better. Another evaluation method is the Bayesian information criterion (BIC) proposed by Schwarz [40], which can be expressed as follows:

$$\mathrm{BIC} = -2\ln L + k\ln n, \tag{14}$$

where $n$ is the sample size; the model with the lowest BIC is preferred. We also compared the predicted and observed numbers of hospitalizations for the competing models, since a well-fitting regression model yields predicted values close to the observed data. Finally, we conducted a marginal analysis based on the regression results to explain the impacts on the hospitalization decision-making of the elderly. All analyses were performed using Stata (Stata/SE version 15.1 for Windows, StataCorp LLP, College Station, TX, USA) statistical software.
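
Both criteria are simple functions of the maximized log-likelihood. The sketch below uses the sample size reported in this paper, but the log-likelihoods and parameter counts are hypothetical placeholders, not the values in Table 2:

```python
import math

def aic(loglik: float, k: int) -> float:
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * loglik + k * math.log(n)

n = 5287  # analysis sample size reported in the paper

# Hypothetical (log-likelihood, number of parameters) pairs.
candidates = {"model_A": (-5200.0, 17), "model_B": (-4300.0, 35)}

scores = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])

for name, (a, b) in scores.items():
    print(f"{name}: AIC = {a:.1f}, BIC = {b:.1f}")
```

Because ln(5287) is about 8.6, BIC charges each additional parameter roughly four times as much as AIC at this sample size, so the two criteria can disagree when a richer model improves the likelihood only slightly.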

3. Results

3.1. Descriptive Analysis

Table 1 presents the descriptive statistics of the number of hospitalizations and of the demographic, socioeconomic, and health status characteristics of the elderly. The gender distribution was almost balanced, and the average age of the elderly was 84 years. More of the elderly were divorced, widowed, or single than married, and they had no more than 4 years of education. Medical consumption accounted for more than 60% of total household consumption, and out-of-pocket expenses accounted for 17.7% of medical consumption expenditure. The box plot (Figure 2) shows that the elderly with more years of education had relatively better health status. Moreover, Figure 3 shows that the higher the household income, the better the self-rated health status of the elderly.

The number of hospitalizations in our data is overdispersed because its variance (0.97) is greater than its mean (0.41). A t-test based on an auxiliary regression was applied to verify whether the overdispersion was significant [41]. We assume that the variance and mean of the number of hospitalizations satisfy the following relation:

$$\operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha\mu_i^2, \tag{15}$$

and use the t statistic to test whether the parameter $\alpha$ is equal to zero. The test gave t = 5.84 with a p value almost equal to zero, which indicated that the data were significantly overdispersed. The proportion of elderly people who were not hospitalized within two years was as high as 75.06%, as shown in Figure 1.
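
A regression-based overdispersion test of this kind can be sketched as below. The sketch assumes the Cameron and Trivedi style auxiliary regression with a quadratic variance function, an intercept-only Poisson fit (so every fitted mean equals the sample mean), and a small made-up sample; the paper does not spell out its exact specification, so this is an illustration of the idea only:

```python
import math

def overdispersion_test(y, mu_hat):
    """Regress z_i = ((y_i - mu_i)^2 - y_i) / mu_i on mu_i without an intercept.
    The slope estimates alpha in Var(y) = mu + alpha * mu^2; returns (alpha, t)."""
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu_hat)]
    sxx = sum(mi ** 2 for mi in mu_hat)
    alpha = sum(mi * zi for mi, zi in zip(mu_hat, z)) / sxx
    resid = [zi - alpha * mi for zi, mi in zip(z, mu_hat)]
    s2 = sum(r ** 2 for r in resid) / (len(y) - 1)  # one estimated slope
    se = math.sqrt(s2 / sxx)
    return alpha, alpha / se

# Made-up counts with many zeros (not CLHLS data).
y = [0] * 15 + [1, 1, 2, 4, 6]

# Intercept-only Poisson fit: the fitted mean is the sample mean.
mu = sum(y) / len(y)
mu_hat = [mu] * len(y)

alpha_hat, t_stat = overdispersion_test(y, mu_hat)
print(f"alpha = {alpha_hat:.3f}, t = {t_stat:.2f}")
```

A slope that is positive and large relative to its standard error points to variance growing faster than the mean, which is the situation the negative binomial and zero-inflated models are meant to handle.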

3.2. Model Evaluation and Comparison

In this study, we used the Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle Poisson, and hurdle negative binomial models to fit the data. Table 2 reports the results of the model comparison based on the AIC, BIC, and log-likelihood. According to these criteria, the Poisson model had the worst goodness of fit, while the zero-inflated negative binomial model was the best of the six competing models, which was consistent with the data characteristics described above. Therefore, the conclusions drawn from the zero-inflated negative binomial model were the most reliable.

3.3. Factors Associated with Hospitalization

Table 3 presents the analysis of the number of hospitalizations against a series of demographic, health status, socioeconomic, medical expenditure, and lifestyle factors, using the different regression models. The results consist of two parts: one models the odds of being hospitalized, and the other models the number of hospitalizations for the elderly who had at least one hospitalization. The logistic component of the ZINB model showed that older adults whose daily activities were restricted had lower odds of being hospitalized than those who were unrestricted, with odds ratios of 0.691 (95% CI: 0.501, 0.952) and 0.573 (95% CI: 0.363, 0.906), respectively. Males had 0.686 (95% CI: 0.483, 0.976) times the odds of being hospitalized compared with females. Each additional year of education was associated with 1.046 (95% CI: 1.003, 1.090) times the odds of being hospitalized. For each additional 1,000 yuan in hospitalization expenses, the odds ratio of being hospitalized was 0.145 (95% CI: 0.089, 0.234). Patients enrolled in URCMS had 1.937 (95% CI: 1.258, 2.983) times the odds of being hospitalized compared with patients enrolled in other medical plans. The count component of the ZINB model indicated that older adults in good health had fewer hospitalizations, with an incidence rate ratio (IRR) of 0.660 (95% CI: 0.462, 0.943). Participants who needed assistance with ADLs had a higher rate of hospitalization, with IRRs of 1.302 (95% CI: 1.156, 1.466) and 1.471 (95% CI: 1.260, 1.717), respectively. For each additional 1,000 yuan of household income and of out-of-pocket expenses, the IRRs were 0.998 (95% CI: 0.997, 1.000) and 0.994 (95% CI: 0.989, 0.999), respectively, indicating slightly fewer repeated hospitalizations. In contrast, the elderly with higher hospitalization medical expenditure had a higher IRR of 1.009 (95% CI: 1.005, 1.013), that is, more repeated hospital readmissions. Among those at risk of multiple hospitalizations, NRCMS and UEBMI/URBMI were significantly associated with more use, with IRRs of 1.223 (95% CI: 1.015, 1.473) and 1.318 (95% CI: 1.086, 1.599), respectively, compared with those in other medical plans.
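
Because the ratios above are multiplicative per unit, they compound over larger changes in a covariate. The following sketch rescales the reported IRR for hospitalization expenditure from 1,000 yuan to 10,000 yuan; it assumes the log-linear form of the count component, so that a ratio per unit equals exp(beta):

```python
import math

def rescale_ratio(ratio: float, units: float) -> float:
    """Ratio implied by a change of `units` in the covariate when the model
    is multiplicative (log link): exp(beta * units) = ratio ** units."""
    return ratio ** units

irr_per_1000 = 1.009  # hospitalization expenditure IRR reported in the text
beta = math.log(irr_per_1000)          # implied coefficient on the log scale
irr_per_10000 = rescale_ratio(irr_per_1000, 10)
percent_per_1000 = (irr_per_1000 - 1.0) * 100.0

print(f"beta = {beta:.5f}")
print(f"IRR per 10,000 yuan = {irr_per_10000:.4f}")
print(f"change per 1,000 yuan = {percent_per_1000:.1f}%")
```

The same rescaling applies to the odds ratios of the logistic component, since they are also exponentiated coefficients.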

3.4. Importance Ranking of Impact Factors

The importance of an explanatory variable depends on the sum of the information gain from the splits on it divided by the number of times that it is used as a split feature. The larger the value, the greater the importance of the variable. Based on the training results of the XGBoost model, the variables are sorted by importance, as shown in Figure 4.

As shown in Figure 4, medical consumption and out-of-pocket expenses were the most important factors for the number of elderly hospitalizations. The main reason was that the current medical burden in China was too heavy [1, 2], which has seriously affected the decision-making and hospitalization behavior of the elderly. The importance of the medical insurance factors was relatively weak, which was related to the current low level of medical insurance in China. Among personal characteristics, age factors, education level, health status, and mobility restrictions were more important for the number of hospitalizations of the elderly, while smoking, drinking, and other lifestyle habits were not very important. In addition, because of their low importance, age and marital status were not shown in Figure 4, which also indicated that these two types of factors had the weakest influence.

3.5. Prediction of the Number of Hospitalizations

After model fitting, the best model, the zero-inflated negative binomial, was used to predict the expected numbers of hospitalizations of the elderly. The average predicted frequencies of the zero-inflated negative binomial model were very close to the average observed frequencies and had the smallest deviation from the observed values, as shown in Figure 5. The results indicated that the zero-inflated negative binomial model handled both overdispersion and excess zeros effectively and improved both model fit and prediction.

3.6. The Effect of Different Medical Insurance on the Number of Hospitalizations

As outlined in Table 3, different types of medical insurance had different effects on the number of hospitalizations of the elderly. Household income, hospitalization medical expenditure, and out-of-pocket medical expenses had a significant impact on the number of repeated hospitalizations. Figure 6 shows that the elderly with NRCMS, UEBMI/URBMI, and free medical care were more likely to choose hospitalization than those with commercial medical insurance, and the impact of UEBMI/URBMI was more significant. The trend for out-of-pocket expenses differed from that for hospitalization expenses: Figure 7 reveals that the number of hospitalizations first increased and then decreased as out-of-pocket expenses increased. The elderly people with UEBMI had the highest number of hospitalizations. By contrast, Figure 8 shows that the number of hospitalizations decreased among the elderly with high family income. Compared with Figures 6 and 7, the older patients with free medical care had a relatively high number of hospitalizations.

4. Discussion

In this study, we compared various count models, namely, the Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle Poisson, and hurdle negative binomial models, to analyze the factors influencing the hospitalization of elderly people. The zero-inflated negative binomial model appeared to be the best of the six models applied in this study. Generally speaking, the Poisson model does not fit overdispersed count data because it assumes equal mean and variance. In this case, the negative binomial model is a better candidate because its variance is greater than its mean. But when count data are both overdispersed and zero-inflated, neither the Poisson nor the negative binomial model is suitable. The number of elderly hospitalizations clearly suffered from a serious zero-inflation problem. In this situation, two-part models such as zero-inflated and hurdle models should be considered. The difference between zero-inflated and hurdle models lies in how they interpret and analyze zero counts according to the data-generating process [19, 20]. In most cases, zero observations in zero-inflated models have two different sources, namely, structural and sampling zeros, whereas there is only one source in hurdle models [23]. In the present study, most elderly participants (75.06%) had no hospitalizations, and the excess zeros came either from structural zeros, where some elderly people had no need to be hospitalized at all, or from sampling zeros, where the elderly had a need for hospitalization but a zero outcome was produced by sampling variability. Thus, the zero-inflated negative binomial model is more appropriate for handling the elderly hospitalization data.

This study led to a better understanding of the factors contributing to the hospitalization decision-making behavior and hospitalization needs of the elderly. The results suggested that the factors affecting whether the elderly were hospitalized at all differed significantly from those affecting multiple hospitalizations. To investigate whether there was a U-shaped relationship between age and the number of hospitalizations, we introduced a squared age term. As outlined in Table 3, there was no evidence of a U-shaped relationship because the coefficient of the squared term was not significant. In general, once the elderly chose to be hospitalized, the number of hospitalizations would gradually increase with age, as indicated by the odds for age and age squared being less than one.

Findings from this study revealed a clear tendency: the number of hospitalizations of the elderly decreased as years of education and household income increased, as detailed in Figures 8 and 9. This was perhaps due to the impact of health. Elderly people with more education and higher household income tended to be in better health, which resulted in fewer hospitalizations [42]. In addition, the elderly with high household income might focus more on the quality of hospitalization services than on the number of hospitalizations.

For the impact of hospitalization medical expenditure and out-of-pocket expenses on the number of hospitalizations, our results indicated a clear distinction. We noticed that high out-of-pocket expenses might prevent the elderly from being hospitalized, whereas increasing hospitalization expenditure was associated with an increasing number of hospitalizations (as shown in Table 3). Although SHI coverage in China expanded from a moderate rate of 50% of the population to a near-universal rate of 95% during 2005–2011 [42, 43], the out-of-pocket rate associated with SHI, which ranges from 40% to 70%, remains a major financial challenge to patients with severe illnesses [43, 44]. Figure 7 shows that the elderly were more willing to choose hospitalization when the out-of-pocket expenses were within their means (about 30,000 yuan); however, most elderly people might give up hospitalization once this amount was exceeded. We also found that the increase in hospitalization medical expenditure did not reduce repeated hospital readmissions for the elderly. More elderly people chose to be hospitalized when the hospitalization medical expenditure was relatively low (below about 10,000 yuan), but they were less willing to be hospitalized when the expenditure exceeded 10,000 yuan, as can be observed in Figure 6.

As Figures 6–9 show, the effect on the number of hospitalizations differed across the SHI schemes in which the elderly were enrolled. The effect of free medical care on the number of hospitalizations of elderly people was not significant. One reason was that the elderly who enjoyed free medical care were generally in good health (among the 195 elderly people with free medical care, only 3 were in poor health). Another possible reason is that such people made up only a small proportion (about 3.69%) of the sample. We noticed that enrollment in UEBMI/URBMI had no significant effect on whether the elderly were hospitalized, but enrollees were more likely to have more hospitalizations once they had been hospitalized. The elderly people enrolled in NRCMS were more likely to incur multiple hospitalizations, which showed that the protective effect of this scheme was significant. We observed from Figures 6 and 7 that the elderly with UEBMI/URBMI were hospitalized more often than the elderly with NRCMS. The reason was perhaps that the three schemes differed not only in the scope of covered services and conditions but also in financial generosity, which varied substantially across regions [43]. The reimbursement scope and upper limits of NRCMS were more limited than those of UEBMI/URBMI because of lower funding levels [45]. Because the upper limits of the NRCMS reimbursement policy for inpatient medical services were usually quite low and failed to compensate adequately for hospitalization medical expenses, the elderly had to forgo hospitalization or reduce the number of hospitalizations [42].

In this study, we used the XGBoost model for the first time to rank the importance of the factors that affected the elderly’s hospitalization decisions. The ranking differed only slightly from the set of significant influencing factors in the zero-inflated negative binomial model, and the difference was not significant. This also verified the reliability of the influencing factors selected in the zero-inflated negative binomial regression model, which is one contribution of this paper. Another contribution is the application of the models. Since the number of elderly hospitalizations contained a large number of zeros, continuing to use traditional count models would underestimate the standard errors and overestimate the significance level, so we used the zero-inflated negative binomial model to address this problem.
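A simple diagnostic illustrates why traditional count models struggle with such data. The sketch below is an assumption-laden illustration, not the procedure used in this study: the function name `zero_inflation_check` is hypothetical, and the counts are made up rather than taken from the CLHLS. It compares the sample variance with the mean and the observed share of zeros with the share a Poisson model with the same mean would predict.

```python
import math

def zero_inflation_check(counts):
    """Return the dispersion index and observed vs. Poisson-expected share of zeros.

    Assumes at least two observations and a positive sample mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    observed_zero_share = sum(1 for c in counts if c == 0) / n
    poisson_zero_share = math.exp(-mean)  # P(Y = 0) under Poisson(mean)
    return {
        "mean": mean,
        "variance": var,
        "dispersion_index": var / mean,  # equals 1 under the Poisson assumption
        "observed_zero_share": observed_zero_share,
        "poisson_zero_share": poisson_zero_share,
    }

# Made-up illustrative counts, not CLHLS data
toy_counts = [0] * 80 + [1] * 10 + [2] * 5 + [5] * 3 + [10] * 2
result = zero_inflation_check(toy_counts)
```

A dispersion index well above one, together with an observed zero share well above the Poisson-expected share, points toward overdispersion and excess zeros, the situation in which zero-inflated or hurdle models are usually considered.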

4.1. Limitations

Some limitations of this study should be noted. Firstly, the CLHLS study sample included missing data, and we did not have a conservative evaluation of hospitalization, which might have introduced some bias. Besides, we could not consider the impact of UEBMI and URBMI on the number of hospitalizations separately, since this information was not captured in the questionnaire. Future research incorporating such information is needed to understand the influence of different types of medical insurance, including commercial medical insurance and basic medical insurance, on the number of hospitalizations of the elderly. As mentioned above, only about 3.69% of the elderly in the sample had access to free medical care, so findings from this study might underestimate its significance. Further studies based on a larger sample are warranted to investigate the effect of free medical care more properly. Machine learning has been applied in many fields [46, 47]. As a relatively new machine learning algorithm, the XGBoost method has been used less often in the field of healthcare. We applied it only to select the variables and rank their importance, and its further application warrants future research. In addition, major health and public health events such as COVID-19 and SARS may also affect the elderly’s hospitalization decisions. These issues can be discussed in future studies. Lastly, because the data were pooled cross-sectional data from different years, we could not infer causal relationships.

5. Conclusions

This study reported that the ZINB model was the best-fitting of the models compared for the number of hospitalizations of the elderly. Our findings indicated that higher hospitalization medical expenditure inhibited the willingness of the elderly to be hospitalized, and the male elderly were less willing to be hospitalized. Elderly people who had difficulties with upper restricted functioning were even less likely to choose hospitalization. The NRCMS significantly increased the willingness of the elderly to be hospitalized. Our findings also revealed that there was no “U” relationship between age and the number of hospitalizations. The elderly with NRCMS or UEBMI/URBMI were more likely to have repeated hospital readmissions, but free medical treatment had little impact on the use of hospitalization services. Increasing out-of-pocket expenses for hospitalization would reduce the number of hospitalizations for the elderly, but the impact on their hospitalization needs was not significant. The economic burden of hospitalization for the elderly should be further reduced through various safeguard measures such as major illness medical insurance, medical assistance, and special assistance. Attention should be paid to the medical service needs of the low-income elderly; more financial subsidies should be provided for their medical service utilization; and inequality in the utilization of medical services among elderly people with different incomes should be reduced. Our study led to a better understanding of the factors contributing to increased inpatient hospitalizations among the elderly, which might help reduce the financial burden of hospitalization for the elderly and truly achieve “affordable hospitalizations.”

Data Availability

Data are available in a public, open-access repository. The data were collected by the project entitled “Chinese Longitudinal Healthy Longevity Survey” (CLHLS) jointly implemented by the Center for Healthy Aging and Development Studies of Peking University. CLHLS is supported by funds from China Natural Science Foundation, China Social Science Foundation, China 211 projects of the Ministry of Education, China 973 projects of the Ministry of Science and Technology, China National Science and Technology Support Plan, NIA/NIH, and UNFPA (http://chads.nsd.pku.edu.cn/xwdt/512297.htm).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Planning Project of Beijing Social Science Fund (Grant no. 20SRB010).