Promoting Variable Effect Consistency in Mixture Cure Model for Credit Scoring

Zheng, Chenlu; Zhu, Jianping; Fan, Xinyan; Chen, Song; Zhang, Zhiyuan

DOI: https://doi.org/10.1155/2022/3112987

Discrete Dynamics in Nature and Society

Special Issue

Fintech and Financial Risk Analysis in the Era of Big Data 2021

Research Article | Open Access

Volume 2022 | Article ID 3112987 | https://doi.org/10.1155/2022/3112987

Chenlu Zheng,^1,2 Jianping Zhu,^1,2 Xinyan Fan,^3 Song Chen,^4 and Zhiyuan Zhang^5

Academic Editor: Dehua Shen

Received: 03 Oct 2021

Revised: 10 Dec 2021

Accepted: 31 Dec 2021

Published: 21 Feb 2022

Abstract

Mixture cure models are widely adopted in credit scoring. They consist of two parts: an incidence part, which predicts the probability of default, and a latency part, which predicts when customers are likely to default. Because the two parts describe two closely related credit aspects, it is reasonable to expect the two sets of coefficients to be related. Moreover, in practice, results are difficult to interpret when the two coefficients of the same variable have conflicting signs. Most existing works either ignore the interconnection between the two sets of coefficients or impose a strict constraint between them. We propose a mixture cure model that promotes variable effect consistency through a sign-based penalty. The model is more flexible in that it allows the two sets of coefficients to follow different distributions and have different magnitudes. To accommodate high-dimensional credit data, a group lasso penalty is also imposed for variable selection. Simulations show that the proposed method performs competitively with alternative methods in terms of estimation and prediction. Furthermore, the empirical study illustrates that the proposed method outperforms the alternative method and improves the interpretability of the results.

1. Introduction

Credit scoring is an effective and crucial approach for evaluating credit risk [1, 2]. Even a slight improvement in the predictive precision of credit scoring models can bring considerable benefits, so credit scoring has attracted increasing attention from scholars and practitioners. Many studies treat it as a classification problem that distinguishes noncreditworthy customers from creditworthy ones [2, 3]. These studies focus on classification techniques including logistic regression, support vector machines, neural networks, and random forests [4, 5]. For example, Zhou et al. [6] proposed a logistic regression method with clustering analysis for credit risk evaluation, and Zhang et al. [7] proposed a cost-sensitive logistic regression model to assess credit risk. Considering the high cost and time consumption of credit scoring, a credit granting process based on three-way decisions has been proposed to make efficient credit decisions [8]. Since exposure to risk and the losses caused by default are strongly related to the time at which customers default [9], predicting credit risk over time is of great significance for timely risk management.

Survival analysis, which can predict the probability of default over time, was first applied to credit scoring in 1992 [10]. It can be more informative than binary classification models. Subsequently, various survival analysis models have been proposed to predict credit risk over time. For instance, the Cox proportional hazards (PH) model has been adopted to predict early repayment and time to default for personal loans and to investigate the effects of different variables on time to default [11, 12]. In addition, macroeconomic factors and time-varying data have been incorporated into survival analysis to improve prediction performance in credit scoring [13, 14], and these models have been further extended with a survival gradient boosting decision tree approach to enhance prediction performance [15].

However, standard survival analysis assumes that the loan term is long enough for every customer to eventually default. In practice, a substantial proportion of customers do not default during the entire loan term. Mixture cure models, originally applied in medicine, assume that some patients are cured and will not die during the follow-up period. This assumption is more appropriate for the credit market, and mixture cure models were first introduced to credit scoring by Tong et al. [16].

Recently, the mixture cure model, an extension of standard survival analysis, has been widely adopted in credit scoring because it can predict not only whether customers will default but also when they are likely to default. Results have shown that the mixture cure model is more suitable for credit data than standard survival analysis models and that a mixture cure model incorporating penalized splines has better prediction performance [17]. Mixture cure models have been further developed to identify different risk patterns among customers, to account for competing risks, and to model the relationship between default times and the variables [18–20].

Mixture cure models consist of two parts: an incidence part, which predicts the probability of default, and a latency part, which predicts when customers are likely to default. The two sets of coefficients in the two parts represent two sets of variable effects on credit risk, and the two parts describe two closely related credit aspects. Nevertheless, most existing studies ignore the relations between the two sets of coefficients. These works generally assume that there are no direct constraints between them, which may lead to conflicting variable effects. For example, Dirick et al. [21] proposed a mixture cure model incorporating macroeconomic factors to predict credit risk. Their results show that customers’ annual income has opposite effects on whether and when customers default: customers with lower annual income have a lower probability of default but are more likely to default earlier. Such conflicting results are difficult to interpret and apply in practice.

In fact, the two model parts describe two closely related credit aspects, namely, the probability of default and the survival (nondefault) time. Customers with a high default probability are more likely to default earlier, so it is reasonable to expect the two sets of coefficients to be related. Theoretical derivations [22] and empirical analysis [23] also suggest that relaxing the independence of the two sets of coefficients can improve model performance. In [23], this assumption is relaxed by establishing a joint distribution of the default predictor and the logarithm of the hazard rate. However, the two model parts still describe two different aspects of default, so the assumption of a joint distribution may be too strict. We therefore consider a more flexible model that allows the two sets of coefficients to follow different distributions and have different magnitudes. A sign consistency penalty was proposed by Zhang et al. [24] to promote similarity in signs and obtain more interpretable results. In this paper, we propose a variable effect consistency mixture cure model with a sign-based penalty, which promotes similarity in the signs of the variable effects in the two model parts to improve interpretability. To accommodate high-dimensional credit data, a group lasso penalty is also imposed for variable selection [25].

The contributions of this paper are as follows. First, we propose a variable effect consistency mixture cure model, which can lead to more interpretable results by promoting similarity in the signs of the coefficients in the two parts of the mixture cure model. Second, a group lasso penalty is imposed to select important variable subgroups and accommodate high-dimensional data. Third, simulations and an empirical analysis of credit data illustrate that the proposed method can improve prediction accuracy as well as interpretability, which is of practical significance for applying the prediction results in the credit business.

The remainder of the paper is organized as follows. Section 2 introduces the variable effect consistency mixture cure model. The computational algorithm is presented in Section 3, simulations are carried out in Section 4, and an empirical study is presented in Section 5. Finally, conclusions are discussed in Section 6.

2. Methods

In this paper, we consider credit data with $n$ customers and $p$ variables. Denote $T$ as the time to default and $C$ as the time of censoring. Let $Y$ be an unobservable binary variable, with $Y = 0$ indicating that the customer is cured and will not default ($T = \infty$), whereas $Y = 1$ indicates an uncured customer who will eventually default.

Denote $\delta_i = I(T_i \le C_i)$ as the censoring indicator of customer $i$, where $T_i$ and $C_i$ are the time to default and the censoring time of customer $i$, respectively; $\delta_i = 0$ if customer $i$ is censored and $\delta_i = 1$ otherwise. Note that customers can be in three possible credit states: (a) $Y_i = 0$ and $\delta_i = 0$: censored, cured customers who will not default; (b) $Y_i = 1$ and $\delta_i = 0$: censored, uncured customers who will eventually default but have not been observed to default by the censoring time $C_i$; (c) $Y_i = 1$ and $\delta_i = 1$: uncensored, uncured customers who have been observed to default.

Denote $t_i = \min(T_i, C_i)$ as the observed time. Note that, in many practical cases, variables can be naturally grouped. For instance, a categorical variable with several levels can be represented by a subgroup of dummy variables [26]. An additive model with polynomial or nonparametric components can be expressed as groups of basis functions [27]. A grouping structure can also be introduced using prior knowledge; for example, genes belonging to the same biological pathway can be considered a group [28]. Let $x = (x_1^T, \ldots, x_J^T)^T$ be the variable vector with $J$ subgroups, where $x_j$ is the $j$-th subgroup of length $p_j$ and $\sum_{j=1}^{J} p_j = p$. The observable data are $(t_i, \delta_i, x_i)$, $i = 1, \ldots, n$.

The incidence part of the mixture cure model describes the probability of default, for which we adopt a logistic regression model. Let $\pi_i$ be the probability that customer $i$ is cured (does not default):
$$\pi_i = P(Y_i = 0 \mid x_i) = \frac{1}{1 + \exp(b_0 + x_i^T b)}, \qquad (1)$$
where $b_0$ is the intercept, $b = (b_1^T, \ldots, b_J^T)^T$ is the vector of unknown regression coefficients, and $b_j$ is the $j$-th subgroup of the coefficient vector.

In the latency part, we adopt an exponential model for the survival of uncured customers. The exponential model has been commonly adopted in mixture cure models [29, 30], and because it includes only one parameter, it makes it easy to capture the relation between the probability of default and the time to default [23]. The survival function is
$$S_u(t \mid x_i) = P(T_i > t \mid Y_i = 1, x_i) = \exp(-\lambda_i t), \qquad \lambda_i = \exp(\beta_0 + x_i^T \beta), \qquad (2)$$
where $\lambda_i$ is the hazard function of customer $i$, $\beta_0$ is the intercept, $\beta = (\beta_1^T, \ldots, \beta_J^T)^T$ is the vector of unknown regression coefficients, and $\beta_j$ is the $j$-th subgroup of the coefficient vector. The survival function $S_u(t \mid x_i)$ is the probability that an uncured customer has not defaulted by time $t$, that is, the probability of no default before $t$ given that the customer will eventually default.

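The mixture cure model is then given by
$$S_{pop}(t \mid x_i) = \pi_i + (1 - \pi_i) S_u(t \mid x_i), \qquad (3)$$
where $S_{pop}(t \mid x_i)$ is the survival probability of customer $i$ at time $t$.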
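As a minimal sketch of how the two parts combine, the Python below evaluates the cure probability, the latency survival, and the population survival for one customer. It assumes a logistic incidence part with cure probability $1/[1 + \exp(b_0 + x^T b)]$ and an exponential latency part with hazard $\exp(\beta_0 + x^T \beta)$; the function names are hypothetical and not taken from the authors' code.

```python
import math

def cure_probability(b0, b, x):
    """Incidence part (assumed logistic): P(cured | x) = 1 / (1 + exp(b0 + x'b))."""
    eta = b0 + sum(bk * xk for bk, xk in zip(b, x))
    return 1.0 / (1.0 + math.exp(eta))

def hazard(beta0, beta, x):
    """Latency part (assumed exponential): constant hazard exp(beta0 + x'beta)."""
    return math.exp(beta0 + sum(gk * xk for gk, xk in zip(beta, x)))

def population_survival(t, b0, b, beta0, beta, x):
    """S_pop(t) = pi + (1 - pi) * S_u(t), with S_u(t) = exp(-hazard * t)."""
    pi = cure_probability(b0, b, x)
    s_u = math.exp(-hazard(beta0, beta, x) * t)
    return pi + (1.0 - pi) * s_u
```

With all coefficients set to zero, the cure probability is 0.5 and the hazard is 1, so $S_{pop}$ starts at 1 and levels off at 0.5 instead of decaying to 0. This plateau at the cure probability is what distinguishes a mixture cure model from standard survival analysis.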

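For the observable data $\{(t_i, \delta_i, x_i)\}_{i=1}^{n}$, the objective function can be written as
$$Q(b_0, b, \beta_0, \beta) = -\frac{1}{n} \ell(b_0, b, \beta_0, \beta) + P(b, \beta), \qquad (4)$$
where $\ell$ is the log-likelihood function,
$$\ell(b_0, b, \beta_0, \beta) = \sum_{i=1}^{n} \Big\{ \delta_i \big[\log(1 - \pi_i) + \log \lambda_i - \lambda_i t_i\big] + (1 - \delta_i) \log\big[\pi_i + (1 - \pi_i) \exp(-\lambda_i t_i)\big] \Big\}, \qquad (5)$$
with $\pi_i = 1/[1 + \exp(b_0 + x_i^T b)]$ and $\lambda_i = \exp(\beta_0 + x_i^T \beta)$. Here, $P(b, \beta)$ is the penalty function,
$$P(b, \beta) = \lambda_1 \sum_{j=1}^{J} \big(\|b_j\|_2 + \|\beta_j\|_2\big) + \lambda_2 \sum_{k=1}^{p} \big[\operatorname{sgn}(b_k) - \operatorname{sgn}(\beta_k)\big]^2, \qquad (6)$$
where $\lambda_1$ and $\lambda_2$ are tuning parameters, $\|\cdot\|_2$ is the $L_2$ norm, and $\operatorname{sgn}(\cdot)$ is the sign function. In many practical cases, a grouping structure arises naturally, and results are hard to interpret when the coefficients corresponding to the same variable have conflicting signs. Therefore, we consider a flexible mixture cure model with group variable selection and sign consistency penalties. The first penalty is a group lasso penalty, which conducts estimation and group variable selection by shrinking the coefficients of unimportant groups to 0; it accounts for grouping structures and has good prediction performance [26]. The second penalty is the sign consistency penalty, which promotes sign consistency between $b$ and $\beta$ in the two parts of the model and can lead to more interpretable results [31].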
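The penalized objective can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' implementation: the log-likelihood uses the logistic cure probability $\pi_i = 1/[1 + \exp(b_0 + x_i^T b)]$ and the exponential hazard $\lambda_i = \exp(\beta_0 + x_i^T \beta)$, and the penalty is assumed to be $\lambda_1 \sum_j (\|b_j\|_2 + \|\beta_j\|_2) + \lambda_2 \sum_k [\operatorname{sgn}(b_k) - \operatorname{sgn}(\beta_k)]^2$ without group-size weights. The helper names and the `groups` argument (a list of index lists, one per subgroup) are hypothetical.

```python
import math

def sign(v):
    """Exact sign function: 1, 0, or -1."""
    return (v > 0) - (v < 0)

def log_likelihood(data, b0, b, beta0, beta):
    """Log-likelihood of the exponential mixture cure model; data is a list of (t, delta, x)."""
    total = 0.0
    for t, delta, x in data:
        pi = 1.0 / (1.0 + math.exp(b0 + sum(bk * xk for bk, xk in zip(b, x))))
        lam = math.exp(beta0 + sum(gk * xk for gk, xk in zip(beta, x)))
        if delta == 1:
            # Observed default: the customer is uncured and defaults at t.
            total += math.log(1.0 - pi) + math.log(lam) - lam * t
        else:
            # Censored: either cured, or uncured and not yet defaulted by t.
            total += math.log(pi + (1.0 - pi) * math.exp(-lam * t))
    return total

def penalty(b, beta, groups, lam1, lam2):
    """Group lasso on each part plus an exact (nonsmooth) sign consistency term."""
    def group_norm(v, idx):
        return math.sqrt(sum(v[k] ** 2 for k in idx))
    group_lasso = sum(group_norm(b, g) + group_norm(beta, g) for g in groups)
    sign_term = sum((sign(bk) - sign(gk)) ** 2 for bk, gk in zip(b, beta))
    return lam1 * group_lasso + lam2 * sign_term

def objective(data, b0, b, beta0, beta, groups, lam1, lam2):
    """Penalized objective: -(1/n) * log-likelihood + penalty."""
    nll = -log_likelihood(data, b0, b, beta0, beta) / len(data)
    return nll + penalty(b, beta, groups, lam1, lam2)
```

The exact sign term is a step function of the coefficients, so it provides no useful gradient information; this is the practical reason the algorithm replaces it with a smooth approximation.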

3. Computational Algorithm

In this section, an Expectation Coordinate Descent (ECD) algorithm is developed to optimize the objective function. In the E-step, the unobserved latent variable $Y_i$ is introduced to obtain the complete log-likelihood function. In the CD-step, group coordinate descent is adopted to iteratively update a single parameter group with the remaining parameters fixed at their most recent values. The sign function is difficult to optimize because it is discontinuous and nondifferentiable. Therefore, following [24, 32], we use the approximation
$$\operatorname{sgn}(x) \approx \frac{x}{\sqrt{x^2 + \tau^2}},$$
where $\tau$ is a small positive constant (discussed further below).
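To see why the constant $\tau$ matters, the sketch below uses $x/\sqrt{x^2 + \tau^2}$ as a smooth surrogate for $\operatorname{sgn}(x)$; this particular form is assumed for illustration and may differ from the exact form used in the paper. The surrogate approaches the sign function as $\tau$ shrinks, while its slope at 0, equal to $1/\tau$, grows, which is one way to see why a very small $\tau$ can make estimation less stable.

```python
import math

def smooth_sign(x, tau):
    """Smooth surrogate for sgn(x): x / sqrt(x^2 + tau^2), bounded in (-1, 1)."""
    return x / math.sqrt(x * x + tau * tau)

def smooth_sign_grad(x, tau):
    """Derivative of the surrogate: tau^2 / (x^2 + tau^2)^(3/2)."""
    return tau * tau / (x * x + tau * tau) ** 1.5

for tau in (1.0, 0.1, 0.01):
    # Approximation quality at x = 0.5 and steepness at x = 0 for each tau.
    print(tau, round(smooth_sign(0.5, tau), 4), round(smooth_sign_grad(0.0, tau), 1))
```

Because the surrogate is differentiable everywhere, its gradient can be added to the gradient of the likelihood term inside a coordinate descent update.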

The ECD algorithm updates $(b_0, b, \beta_0, \beta)$ in the $(m+1)$-th iteration as follows.

3.1. E-Step

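Denote the realization of the latent variable $Y_i$ as $y_i$, and denote the complete data as $\{(t_i, \delta_i, x_i, y_i)\}_{i=1}^{n}$. The complete log-likelihood is
$$\ell_c = \ell_{c1}(b_0, b) + \ell_{c2}(\beta_0, \beta),$$
where
$$\ell_{c1}(b_0, b) = \sum_{i=1}^{n} \big[y_i \log(1 - \pi_i) + (1 - y_i) \log \pi_i\big], \qquad \ell_{c2}(\beta_0, \beta) = \sum_{i=1}^{n} y_i \big(\delta_i \log \lambda_i - \lambda_i t_i\big).$$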

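The conditional expectation of $y_i$ given the observed data and the current estimates is
$$w_i^{(m)} = E\big(y_i \mid t_i, \delta_i, x_i; b_0^{(m)}, b^{(m)}, \beta_0^{(m)}, \beta^{(m)}\big) = \delta_i + (1 - \delta_i) \frac{(1 - \pi_i^{(m)}) S_u^{(m)}(t_i \mid x_i)}{\pi_i^{(m)} + (1 - \pi_i^{(m)}) S_u^{(m)}(t_i \mid x_i)},$$
where $\pi_i^{(m)}$ is the cure probability and $S_u^{(m)}(t_i \mid x_i) = \exp(-\lambda_i^{(m)} t_i)$ is the survival function of uncured customers, both evaluated at the estimates from the $m$-th iteration.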

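When customer $i$ is observed to default ($\delta_i = 1$), the unobserved $y_i = 1$ is known. For censored customers ($\delta_i = 0$), the expectation of $y_i$ is the posterior probability of being uncured given no default by $t_i$, which depends on both the probability of being cured and the survival probability of uncured but censored customers.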

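In the E-step, we take the expectation of the complete log-likelihood $\ell_c$ with respect to the latent $y_i$, given the observed data and the current estimates:
$$\tilde{\ell}^{(m)}(b_0, b, \beta_0, \beta) = \sum_{i=1}^{n} \big[w_i^{(m)} \log(1 - \pi_i) + (1 - w_i^{(m)}) \log \pi_i\big] + \sum_{i=1}^{n} w_i^{(m)} \big(\delta_i \log \lambda_i - \lambda_i t_i\big),$$
where $w_i^{(m)}$ is the conditional expectation of $y_i$ given the observed data and the $m$-th iteration estimates. Because $\tilde{\ell}^{(m)}$ separates into a weighted logistic term in $(b_0, b)$ and a weighted exponential term in $(\beta_0, \beta)$, the CD-step minimizes $-\tilde{\ell}^{(m)}/n$ plus the penalty, in which the two parts are coupled only through the sign consistency term.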
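A minimal Python sketch of the E-step is given below, under the assumed logistic incidence and exponential latency forms (cure probability $1/[1 + \exp(b_0 + x^T b)]$, hazard $\exp(\beta_0 + x^T \beta)$). It computes the conditional expectation $w_i$ of the latent uncured indicator and the resulting expected complete log-likelihood; the function names are hypothetical.

```python
import math

def _pi_lambda(x, b0, b, beta0, beta):
    """Cure probability and exponential hazard for covariate vector x."""
    pi = 1.0 / (1.0 + math.exp(b0 + sum(bk * xk for bk, xk in zip(b, x))))
    lam = math.exp(beta0 + sum(gk * xk for gk, xk in zip(beta, x)))
    return pi, lam

def e_step_weights(data, b0, b, beta0, beta):
    """w_i = E(y_i | observed data): 1 for observed defaults, posterior uncured probability otherwise."""
    weights = []
    for t, delta, x in data:
        if delta == 1:
            weights.append(1.0)  # an observed default is certainly uncured
            continue
        pi, lam = _pi_lambda(x, b0, b, beta0, beta)
        s_u = math.exp(-lam * t)
        weights.append((1.0 - pi) * s_u / (pi + (1.0 - pi) * s_u))
    return weights

def expected_complete_loglik(data, weights, b0, b, beta0, beta):
    """Weighted logistic (incidence) term plus weighted exponential (latency) term."""
    inc, lat = 0.0, 0.0
    for (t, delta, x), w in zip(data, weights):
        pi, lam = _pi_lambda(x, b0, b, beta0, beta)
        inc += w * math.log(1.0 - pi) + (1.0 - w) * math.log(pi)
        lat += w * (delta * math.log(lam) - lam * t)
    return inc + lat
```

For a censored customer, the weight falls as the observed time grows: surviving longer without default makes it more plausible that the customer is cured.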

3.2. CD-Step

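In the CD-step, group coordinate descent is adopted to iteratively update $(b_0, b, \beta_0, \beta)$. The intercept $b_0$ is updated first, with the remaining parameters fixed at their most recent values.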

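For $b_j$, $j = 1, \ldots, J$, we adopt a fast unified algorithm, Groupwise Majorization Descent (GMD), proposed in [33], to solve the group lasso penalized objective function in (4). Let $L(b)$ denote the smooth part of the objective as a function of $b$, that is, the negative expected complete log-likelihood of the incidence part plus the approximated sign consistency term, and let $\tilde{b}$ be the current estimate. When only the $j$-th group changes, the smooth part has the upper bound
$$L(b) \le L(\tilde{b}) - (b_j - \tilde{b}_j)^T U_j + \frac{\gamma_j}{2} \|b_j - \tilde{b}_j\|_2^2 .$$
GMD minimizes this upper bound plus $\lambda_1 \|b_j\|_2$, which has the closed form
$$b_j \leftarrow \frac{1}{\gamma_j}\big(U_j + \gamma_j \tilde{b}_j\big)\Big(1 - \frac{\lambda_1}{\|U_j + \gamma_j \tilde{b}_j\|_2}\Big)_+ .$$

Here, $U_j = -\partial L(\tilde{b}) / \partial b_j$ is a $p_j$-length vector, and $\gamma_j$ is a constant given by $\eta_{\max}(\cdot)$ applied to a matrix that bounds the Hessian of $L$ with respect to $b_j$, where $\eta_{\max}(\cdot)$ is the maximum eigenvalue function.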
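The group update can be illustrated with a short Python sketch of one groupwise majorization step in the style of GMD [33]. It assumes `u_j` is the negative gradient of the smooth part of the objective with respect to group $j$ and `gamma_j` is an upper bound on that part's curvature in group $j$; minimizing the quadratic upper bound plus $\lambda_1 \|b_j\|_2$ then reduces to group soft-thresholding. The function name is hypothetical, and how the sign consistency term enters `u_j` and `gamma_j` is not shown.

```python
import math

def gmd_group_update(b_j, u_j, gamma_j, lam1):
    """Minimize -(b - b_j)'u_j + gamma_j/2 * ||b - b_j||^2 + lam1 * ||b||_2 over b.

    b_j: current coefficients of group j; u_j: negative gradient of the smooth loss
    at b_j; gamma_j: curvature bound for group j; lam1: group lasso tuning parameter.
    """
    z = [gamma_j * bk + uk for bk, uk in zip(b_j, u_j)]
    z_norm = math.sqrt(sum(v * v for v in z))
    if z_norm <= lam1:
        return [0.0] * len(b_j)  # the whole group is set to zero
    shrink = (1.0 - lam1 / z_norm) / gamma_j
    return [shrink * v for v in z]

# One-variable group with smooth loss (b - 2)^2 / 2: negative gradient 2 - b, exact curvature 1.
b = [0.0]
for _ in range(5):
    b = gmd_group_update(b, [2.0 - b[0]], 1.0, 0.5)
print(b)  # [1.5]
```

In the toy usage, the update reaches the lasso solution $b = 2 - 0.5 = 1.5$ and then stays there, because the quadratic bound is exact for a quadratic loss.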

Similarly, the intercept $\beta_0$ is updated with the remaining parameters fixed at their most recent values.

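For $\beta_j$, $j = 1, \ldots, J$, we consider the analogous majorized optimization function
$$-(\beta_j - \tilde{\beta}_j)^T V_j + \frac{\kappa_j}{2} \|\beta_j - \tilde{\beta}_j\|_2^2 + \lambda_1 \|\beta_j\|_2 ,$$
which is minimized in closed form by group soft-thresholding, as for $b_j$.

Here, $V_j$ is the $p_j$-length negative gradient of the smooth part of the objective with respect to $\beta_j$ at the current estimate $\tilde{\beta}$, and $\kappa_j$ is a constant bounding the corresponding curvature, again obtained from the maximum eigenvalue function.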

The tuning parameters $\lambda_1$ and $\lambda_2$ are selected by 5-fold cross-validation. The parameter $\tau$ in the approximation of the sign function controls the degree of approximation [24]: a smaller $\tau$ gives a better approximation but less stable estimation. The proposed method remains valid as long as $\tau$ is not too large, so that parameters with different signs can still be distinguished [34]. Therefore, we set $\tau$ as suggested in [31], which leads to satisfactory results.
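The selection of $(\lambda_1, \lambda_2)$ by 5-fold cross-validation can be sketched generically as below. The `fit` and `loss` callables are hypothetical placeholders (for example, fitting the penalized model on the training folds and returning the validation negative log-likelihood); only the fold assignment and the grid search are shown.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cv_select(n, grid, fit, loss, k=5, seed=0):
    """Return the (lam1, lam2) pair in grid with the smallest mean validation loss.

    fit(train_idx, lam1, lam2) -> model and loss(model, valid_idx) -> float are
    hypothetical user-supplied callables.
    """
    folds = k_fold_indices(n, k, seed)
    best, best_score = None, float("inf")
    for lam1, lam2 in grid:
        total = 0.0
        for f, valid in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            total += loss(fit(train, lam1, lam2), valid)
        score = total / k
        if score < best_score:
            best, best_score = (lam1, lam2), score
    return best, best_score
```

Using the same fold assignment for every grid point keeps the comparison between tuning parameter pairs fair.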

The ECD algorithm is summarized in Table 1.

4. Simulations

In this section, several simulation examples are given to illustrate the performance of the proposed method compared with alternative methods. The proposed method is a mixture cure model with group lasso and sign consistency penalties (MCGS). The two alternative methods are the standard mixture cure model without variable selection or a sign consistency penalty (Full) and the mixture cure model with only a group lasso penalty (MCG). For comparison, both alternative methods adopt logistic regression in the incidence part and the exponential model in the latency part.

Here, we fix the sample size $n$ and consider both a low-dimensional and a high-dimensional setting for the number of variables $p$. The censoring time is generated from an exponential distribution, with parameters chosen to give different censoring rates. We consider three examples with different grouping structures of the coefficients and different types of variables, and the true coefficients are generated according to the following settings.

Example 1: intragroup variables are generated from a multivariate normal distribution with correlation coefficient $\rho$, whereas intergroup variables are independent. Denote the true coefficients as $b^0$ and $\beta^0$; two coefficient settings, Scenario 1 and Scenario 2, are considered.

Example 2: the settings are similar to Example 1 except for the subgroup sizes: the first subgroup contains 15 variables, the second subgroup contains 5 variables, and each remaining subgroup contains 10 variables.

Example 3: some variables are discrete. A latent variable is generated from a multivariate normal distribution with intragroup correlation coefficient $\rho$ and independence between groups, and the observed variables, some of which are discrete, are constructed from this latent variable. The coefficient setting is the same as in Example 2.
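A data-generating process of this kind can be sketched in Python as follows. The specific choices are assumptions for illustration: within-group correlation $\rho$ is induced by a shared standard normal factor (equicorrelation), cure status is drawn from the assumed logistic incidence model, default times of uncured customers are exponential with hazard $\exp(\beta_0 + x^T \beta)$, and censoring times are exponential with a rate `cens_rate` that would be tuned to reach a target censoring proportion. The function name is hypothetical, and the discrete variables of Example 3 are not generated here.

```python
import math
import random

def simulate_mixture_cure(n, group_sizes, rho, b0, b, beta0, beta, cens_rate, seed=0):
    """Return a list of (t, delta, x) drawn from an exponential mixture cure model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = []
        for size in group_sizes:
            shared = rng.gauss(0.0, 1.0)  # common factor gives correlation rho within the group
            x.extend(math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                     for _ in range(size))
        p_cure = 1.0 / (1.0 + math.exp(b0 + sum(bk * xk for bk, xk in zip(b, x))))
        uncured = rng.random() >= p_cure
        lam = math.exp(beta0 + sum(gk * xk for gk, xk in zip(beta, x)))
        t_default = rng.expovariate(lam) if uncured else math.inf  # cured never default
        c = rng.expovariate(cens_rate)
        data.append((min(t_default, c), int(t_default <= c), x))
    return data
```

Each covariate has unit variance, and any two covariates in the same group have covariance $\rho$, while covariates in different groups are independent.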

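The performance of each model is evaluated by five measures. Denote $\theta = (b^T, \beta^T)^T$, and let $\hat{\theta}$, $\hat{b}$, and $\hat{\beta}$ be the estimates of $\theta$, $b$, and $\beta$, respectively. The true positive rate (TPR) and false positive rate (FPR) evaluate variable selection, that is, the proportion of truly nonzero coefficients estimated as nonzero and the proportion of truly zero coefficients estimated as nonzero, and the mean square error (MSE) evaluates estimation accuracy; these are computed with respect to $\theta$, $b$, and $\beta$.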
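The selection and estimation measures can be computed as in the sketch below. TPR is the share of truly nonzero coefficients estimated as nonzero, and FPR is the share of truly zero coefficients estimated as nonzero; for the estimation error, we assume the squared Euclidean distance $\|\hat{\theta} - \theta\|_2^2$, which may differ from the exact MSE normalization used in Tables 2–5. The function name and tolerance are hypothetical.

```python
def selection_metrics(true_coef, est_coef, tol=1e-8):
    """Return (TPR, FPR, squared error) for one coefficient vector."""
    pairs = list(zip(true_coef, est_coef))
    nonzero = [e for t, e in pairs if t != 0]
    zero = [e for t, e in pairs if t == 0]
    # Estimates with magnitude above tol count as selected.
    tpr = sum(abs(e) > tol for e in nonzero) / len(nonzero) if nonzero else float("nan")
    fpr = sum(abs(e) > tol for e in zero) / len(zero) if zero else float("nan")
    sq_err = sum((e - t) ** 2 for t, e in pairs)
    return tpr, fpr, sq_err
```

Applying the function separately to $b$, $\beta$, and the stacked vector $\theta$ gives the three versions of each measure.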

In addition, the relative root mean square error of the cure rate estimation and the relative root mean square error of the hazard function estimation are used to evaluate prediction performance.
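One possible definition of a relative RMSE scales each prediction error by the true value, $\sqrt{n^{-1} \sum_i [(\hat{\pi}_i - \pi_i)/\pi_i]^2}$, and likewise for the hazards $\lambda_i$. The sketch below assumes that definition; the paper's exact formula may differ, and the function name is hypothetical.

```python
import math

def relative_rmse(true_vals, est_vals):
    """sqrt(mean(((est - true) / true)^2)); true values must be nonzero."""
    errs = [((e - t) / t) ** 2 for t, e in zip(true_vals, est_vals)]
    return math.sqrt(sum(errs) / len(errs))
```

Scaling by the true value makes errors comparable across customers whose cure probabilities or hazards differ by orders of magnitude.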

Tables 2–5 show the mean TPR, FPR, and MSE of the coefficients, as well as the standard deviations over the 100 replicates for each example.

As indicated in Tables 2–5, the two group selection methods (MCGS and MCG) perform significantly better than the Full method. This is expected, since the group lasso can select important subgroups of variables. Between the two group selection methods, the proposed method performs competitively with the MCG method, which indicates that promoting sign consistency can improve estimation performance. For instance, under Scenario 1 of Example 1 in Table 2 (for one of the dimension and censoring-rate settings), the mean MSEs of $\theta$, $b$, and $\beta$ for the proposed method are 0.12, 0.04, and 0.09, respectively, compared with 1.00, 0.05, and 0.63 for the MCG method and 17.3, 2.38, and 11.56 for the Full method.

Tables 6–8 show the mean relative RMSEs of the cure rate and hazard function estimates, as well as their standard deviations over the 100 replicates for each example. These results illustrate prediction performance for the probability of nondefault and for survival.

As shown in Tables 6–8, the prediction performance of the group selection methods is significantly better than that of the Full method, and the proposed method performs competitively with the MCG method. For example, for Example 2 in Table 8 (for one of the dimension and censoring-rate settings), the mean relative RMSEs of the cure rate and hazard function estimates for the proposed method are 0.01 and 0.07, respectively, compared with 0.04 and 0.12 for the MCG method and 0.27 and 10.01 for the Full method. In addition, comparing the low- and high-dimensional settings, the group selection methods have a greater advantage in prediction performance when the dimensionality is higher.

The simulation results suggest that the proposed variable effect consistency mixture cure model can improve estimation and prediction performance compared with the alternative methods.

5. Empirical Study

In this section, we apply the proposed method to real credit loan data. The data come from the personal loan department of a city commercial bank in China and contain 4796 personal loan samples from 2014 to 2019 after preprocessing. The data include mortgage loans and credit loans, covering consumer durables, personal housing decoration loans, and other personal consumption loans. The censoring time is the interval between the loan value date and either default or the end of observation (June 1, 2019), so censoring times vary across customers, with a mean of 1.93 years and a standard deviation of 0.8 years. Customers whose time to default is longer than the censoring time are censored ($\delta_i = 0$); 47 of the 4796 customers are censored. After the discrete variables are transformed into dummy variables, the data contain 27 variables. Table 9 lists the variables and their descriptions.

In this section, the alternative method is the mixture cure model with group lasso (MCG). Unlike in the simulation, the true values of the parameters $b$ and $\beta$ are unknown for real data. Following [30, 35], we therefore evaluate the models with (1) log-rank statistics and (2) the negative log-likelihood instead of the TPR, FPR, MSE, and relative RMSE measures used in the simulation.

The log-rank statistic is commonly used in survival analysis to test the null hypothesis that there is no difference in survival distributions between two or more independent groups. Here it is calculated by cross-validation: we sequentially take a subset of samples as the validation set and use the remaining samples as the training set, apply the proposed and alternative methods to the training set to estimate $b$ and $\beta$, and then calculate $x_i^T \hat{b}$ and $x_i^T \hat{\beta}$ for the validation set. The results are based on 10 replicates. We divide the customers into two groups at the median of $x_i^T \hat{b}$ and calculate the log-rank statistic, and similarly divide them at the median of $x_i^T \hat{\beta}$ and calculate the log-rank statistic. The mean log-rank statistics of the proposed and alternative methods are 5.6 and 4.3, respectively; since a larger statistic indicates a clearer separation between the two risk groups, this suggests better performance of the proposed method.
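The median-split log-rank comparison can be sketched in Python as follows, assuming the standard two-group log-rank chi-square statistic (one degree of freedom, with the usual variance correction for tied default times). Scores equal to the median are placed in the low-risk group, which is an arbitrary choice; the helper names are hypothetical.

```python
import statistics

def median_split(scores):
    """Label customers 1 ("high risk") if their score exceeds the median, else 0."""
    med = statistics.median(scores)
    return [1 if s > med else 0 for s in scores]

def log_rank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic; events: 1 = default, 0 = censored."""
    o_minus_e, variance = 0.0, 0.0
    for s in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [g for t, g in zip(times, groups) if t >= s]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for t, e in zip(times, events) if t == s and e == 1)
        d1 = sum(1 for t, e, g in zip(times, events, groups) if t == s and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected defaults in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0
```

Under the null hypothesis of identical survival in the two groups, the statistic is approximately chi-square with one degree of freedom, so larger values indicate a clearer separation between the risk groups.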

Figure 1 shows the Kaplan–Meier curves stratified by group. Kaplan–Meier curves are commonly used in survival analysis to describe how the survival probability changes over time [36, 37]. The probability of being cured is negatively related to $x_i^T \hat{b}$, and the survival time is negatively related to $x_i^T \hat{\beta}$. In Figure 1, the group with lower $x_i^T \hat{b}$ (a) or lower $x_i^T \hat{\beta}$ (b) is labeled “low risk,” and the other group is labeled “high risk.” As Figure 1 indicates, the curves of the two groups show clearly different trends: customers with lower $x_i^T \hat{b}$ and $x_i^T \hat{\beta}$ have lower risk and are less likely to default.

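Figure 1: Kaplan–Meier curves for the low-risk and high-risk groups: (a) grouped by the incidence part; (b) grouped by the latency part.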
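The curves in Figure 1 are product-limit (Kaplan–Meier) estimates. A minimal Python sketch of the estimator for one group is shown below; it returns the estimated survival probability just after each observed default time, with censored customers counted as at risk up to and including their censoring time. The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(default_time, survival_probability)] for the product-limit estimator."""
    curve, surv = [], 1.0
    for s in sorted({t for t, e in zip(times, events) if e == 1}):
        n_at_risk = sum(1 for t in times if t >= s)
        d = sum(1 for t, e in zip(times, events) if t == s and e == 1)
        surv *= 1.0 - d / n_at_risk  # conditional probability of surviving past s
        curve.append((s, surv))
    return curve
```

Running the function separately on the low-risk and high-risk groups produces the two curves compared in each panel.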

To assess the performance of the model, the data are randomly divided into a training set and a test set in a 2:1 ratio. The training set is used to fit the model, and the test set is used to evaluate the prediction performance of the fitted model. The tuning parameters are selected by 5-fold cross-validation. Over 100 replicates, the mean (standard deviation) negative log-likelihood of the proposed method (MCGS) and the alternative method (MCG) is 106.04 (16.04) and 118.60 (19.09), respectively. The lower negative log-likelihood indicates that the proposed method performs better than the alternative in terms of model fit and prediction.

Table 10 reports the estimates of the MCGS and MCG methods. A positive coefficient $b_k$ indicates that the variable is positively related to the probability of default, and a positive coefficient $\beta_k$ indicates that the variable is negatively related to the default time. The probability of default and the default time are two closely related credit aspects. Compared with the alternative method, the signs of $\hat{b}$ and $\hat{\beta}$ from the proposed method are more consistent, whereas business type in the MCG model has opposite effects on the probability of default and the time to default. The results suggest that promoting variable effect consistency can improve prediction performance as well as interpretability.
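Under this sign convention (a positive incidence coefficient raises the probability of default, and a positive latency coefficient shortens the time to default), a variable's two effects agree when the two coefficients share a sign. The small Python check below flags selected variables whose two estimates disagree; the function name, variable names, and coefficient values are made up for illustration and are not the estimates in Table 10.

```python
def sign_conflicts(names, b_hat, beta_hat, tol=1e-8):
    """Names of variables with nonzero incidence and latency estimates of opposite sign."""
    return [name for name, bk, gk in zip(names, b_hat, beta_hat)
            if abs(bk) > tol and abs(gk) > tol and (bk > 0) != (gk > 0)]

# Hypothetical estimates, for illustration only.
names = ["interest_rate", "loan_line", "business_type"]
print(sign_conflicts(names, [0.30, -0.20, 0.10], [0.50, -0.10, -0.40]))  # ['business_type']
```

Variables removed by the group lasso in either part are skipped, since a zero coefficient carries no sign to conflict with.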

The coefficient results of the proposed method reveal that interest rate, loan line, business type, gender, education, and employment status are important variables that affect the probability of default and the time to default. Loan term, age, medical insurance, entrusted payment, early repayment, annual household income, type of workplace, and housing status have no significant impact on credit. The impact of occupation and professional title on credit is not clear.

From the perspective of loan products, we find that the interest rate has a negative impact on credit. This is not surprising, as higher interest rates lead to higher costs, making customers more likely to default. The loan line has a positive effect; one possible explanation is that low-risk customers are more likely to obtain a higher loan line. Loan term, entrusted payment, and early repayment have no effect on credit. The coefficient of business type reveals that, compared with other personal loans, consumer durables loans are more likely to default.

From the perspective of customer characteristics, being employed has a positive impact on credit, whereas age, annual household income, housing status, and type of workplace have no significant effect. Compared with women, men are more likely to default, which is consistent with the results of [38] and with men’s greater risk preference [39]. Customers with higher education are less likely to default, and a bachelor’s degree or above has a positive effect on credit. Generally, customers with higher education have a better chance of getting decent jobs and income; they tend to maintain good credit records and are less likely to default. Compared with other employment groups such as the self-employed, freelancers, and the unemployed, the employed group has a more stable income and is less likely to default.

6. Conclusions

The mixture cure model is widely adopted in credit scoring because it can predict whether customers will default and when they are likely to default. However, most existing studies ignore the relations between the two sets of variable effects in the two model parts, which may lead to conflicting variable effects that are difficult to interpret and apply in practice.

In this paper, we propose a variable effect consistency mixture cure model that promotes similarity in the signs of the variable effects in the two model parts by imposing a sign consistency penalty. Meanwhile, to accommodate high-dimensional credit data, we also impose a group lasso penalty to conduct variable selection and parameter estimation. Simulations show that the proposed method performs competitively with the MCG method and significantly outperforms the Full method in terms of estimation and prediction. Furthermore, the empirical study illustrates that the proposed method can improve prediction performance as well as interpretability. The results of the variable effect consistency mixture cure model also offer additional insights into the relationship between variable effects before and after loan.

Data Availability

The raw/processed data used in the empirical study cannot be shared at this time as the data also form part of an ongoing study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Office for Philosophy and Social Sciences of China under Grant no. 20&ZD137 and the National Bureau of Statistics of China under Grant no. 2020ZX20.

References

[1] D. M. B. Silva, G. H. A. Pereira, and T. M. Magalhães, “A class of categorization methods for credit scoring models,” European Journal of Operational Research, vol. 296, no. 1, pp. 323–331, 2022.
[2] P. Pławiak, M. Abdar, J. Pławiak, V. Makarenkov, and U. R. Acharya, “A new deep genetic hierarchical network of learners for prediction of credit scoring,” Information Sciences, vol. 516, pp. 401–418, 2020.
[3] R. Y. Goh, L. S. Lee, H.-V. Seow, and K. Gopal, “Hybrid harmony search–artificial intelligence models in credit scoring,” Entropy, vol. 22, no. 9, Article ID 989, 2020, https://doi.org/10.3390/e22090989.
[4] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens, and J. Vanthienen, “Benchmarking state-of-the-art classification algorithms for credit scoring,” Journal of the Operational Research Society, vol. 54, no. 6, pp. 627–635, 2003.
[5] S. Lessmann, B. Baesens, H.-V. Seow, and L. C. Thomas, “Benchmarking state-of-the-art classification algorithms for credit scoring: an update of research,” European Journal of Operational Research, vol. 247, no. 1, pp. 124–136, 2015.
[6] Q. Zhou, L. Wang, L. Juan, S. Zhou, and L. Li, “The study on credit risk warning of regional listed companies in China based on logistic model,” Discrete Dynamics in Nature and Society, vol. 2021, Article ID 6672146, 8 pages, 2021.
[7] L. Zhang, H. Ray, J. Priestley, and S. Tan, “A descriptive study of variable discretization and cost-sensitive logistic regression on imbalanced credit data,” Journal of Applied Statistics, vol. 47, no. 3, pp. 568–581, 2020.
[8] S. Maldonado, G. Peters, and R. Weber, “Credit scoring using three-way decisions with probabilistic rough sets,” Information Sciences, vol. 507, pp. 700–714, 2020.
[9] C. Lohmann and T. Ohliger, “Using accounting-based and loan-related information to estimate the cure probability of a defaulted company,” European Financial Management, vol. 27, pp. 620–640, 2021, https://doi.org/10.1111/eufm.12279.
[10] B. Narain, “Survival Analysis and the Credit Granting Decision,” in Credit Scoring And Credit Control, pp. 109–121, Oxford University Press, New York, NY, USA, 1992.
[11] J. Banasik, J. N. Crook, and L. C. Thomas, “Not if but when will borrowers default,” Journal of the Operational Research Society, vol. 50, no. 12, pp. 1185–1190, 1999.
[12] M. Stepanova and L. Thomas, “Survival analysis methods for personal loan data,” Operations Research, vol. 50, no. 2, pp. 277–289, 2002.
[13] T. Bellotti and J. Crook, “Credit scoring with macroeconomic variables using survival analysis,” Journal of the Operational Research Society, vol. 60, no. 12, pp. 1699–1707, 2009.
[14] V. B. Djeundje and J. Crook, “Dynamic survival models with varying coefficients for credit risks,” European Journal of Operational Research, vol. 275, no. 1, pp. 319–333, 2019.
[15] Y. Xia, L. He, Y. Li, Y. Fu, and Y. Xu, “A dynamic credit scoring model based on survival gradient boosting decision tree approach,” Technological and Economic Development of Economy, vol. 27, no. 1, pp. 96–119, 2021, https://doi.org/10.3846/tede.2020.13997.
[16] E. N. C. Tong, C. Mues, and L. C. Thomas, “Mixture cure models in credit scoring: if and when borrowers default,” European Journal of Operational Research, vol. 218, no. 1, pp. 132–139, 2012.
[17] L. Dirick, G. Claeskens, and B. Baesens, “Time to default in credit scoring using survival analysis: a benchmark study,” Journal of the Operational Research Society, vol. 68, no. 6, pp. 652–665, 2017.
[18] B. C. Alves and J. G. Dias, “Survival mixture models in behavioral scoring,” Expert Systems with Applications, vol. 42, no. 8, pp. 3902–3910, 2015.
[19] N. Zhang, Q. Yang, A. Kelleher, and W. Si, “A new mixture cure model under competing risks to score online consumer loans,” Quantitative Finance, vol. 19, no. 7, pp. 1243–1253, 2019.
[20] C. Jiang, Z. Wang, and H. Zhao, “A prediction-driven mixture cure model and its application in credit scoring,” European Journal of Operational Research, vol. 277, no. 1, pp. 20–31, 2019.
[21] L. Dirick, T. Bellotti, G. Claeskens, and B. Baesens, “Macro-economic factors in credit risk calculations: including time-varying covariates in mixture cure models,” Journal of Business & Economic Statistics, vol. 37, no. 1, pp. 40–53, 2019.
[22] C. Han and R. Kronmal, “Two-part models for analysis of Agatston scores with possible proportionality constraints,” Communications in Statistics-Theory and Methods, vol. 35, no. 1, pp. 99–111, 2006.
[23] F. Liu, Z. Hua, and A. Lim, “Identifying future defaulters: a hierarchical Bayesian method,” European Journal of Operational Research, vol. 241, no. 1, pp. 202–211, 2015.
[24] Q. Zhang, S. Ma, and Y. Huang, “Promote sign consistency in the joint estimation of precision matrices,” Computational Statistics & Data Analysis, vol. 159, Article ID 107210, 2021, https://doi.org/10.1016/j.csda.2021.107210.
[25] Q. Zhang, S. Zhang, J. Liu, J. Huang, and S. Ma, “Penalized integrative analysis under the accelerated failure time model,” Statistica Sinica, vol. 26, no. 2, pp. 492–508, 2016.
[26] J. Huang, P. Breheny, and S. Ma, “A selective review of group selection in high-dimensional models,” Statistical Science: A Review Journal of the Institute of Mathematical Statistics, vol. 27, pp. 481–499, 2012.
[27] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B, vol. 68, no. 1, pp. 49–67, 2006.
[28] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2009.
[29] M. E. Ghitany, R. A. Maller, and S. Zhou, “Exponential mixture models with long-term survivors and covariates,” Journal of Multivariate Analysis, vol. 49, no. 2, pp. 218–241, 1994.
[30] X. Fan, M. Liu, K. Fang, Y. Huang, and S. Ma, “Promoting structural effects of covariates in the cure rate model with penalization,” Statistical Methods in Medical Research, vol. 26, no. 5, pp. 2078–2092, 2017.
[31] C. Zheng, J. Zhu, and J. Zhu, “Promote sign consistency in cure rate model with Weibull lifetime,” AIMS Mathematics, vol. 7, no. 2, pp. 3186–3202, 2022.
[32] X. Shi, S. Ma, and Y. Huang, “Promoting sign consistency in the cure model estimation and selection,” Statistical Methods in Medical Research, vol. 29, no. 1, pp. 15–28, 2020.
[33] Y. Yang and H. Zou, “A fast unified algorithm for solving group-lasso penalize learning problems,” Statistics and Computing, vol. 25, no. 6, pp. 1129–1141, 2015.
[34] K. Fang, X. Fan, Q. Zhang, and S. Ma, “Integrative sparse principal component analysis,” Journal of Multivariate Analysis, vol. 166, pp. 1–16, 2018.
[35] J. Rodrigues, M. de Castro, V. G. Cancho, and N. Balakrishnan, “COM-Poisson cure rate survival models and an application to a cutaneous melanoma data,” Journal of Statistical Planning and Inference, vol. 139, no. 10, pp. 3605–3611, 2009.
[36] T. Chen and P. Du, “Promotion time cure rate model with nonparametric form of covariate effects,” Statistics in Medicine, vol. 37, no. 10, pp. 1625–1635, 2018.
[37] S. Pal and N. Balakrishnan, “Likelihood inference based on EM algorithm for the destructive length-biased Poisson cure rate model with Weibull lifetime,” Communications in Statistics - Simulation and Computation, vol. 47, no. 3, pp. 644–660, 2018.
[38] Y. Li, Y. Li, and Y. Li, “What factors are influencing credit card customer’s default behavior in China? A study based on survival analysis,” Physica A: Statistical Mechanics and Its Applications, vol. 526, Article ID 120861, 2019.
[39] Y. Shu and Q. Y. Yang, “Research on auto loan default prediction based on large sample data model,” Management Review, vol. 29, no. 9, pp. 59–71, 2017.

Copyright

Copyright © 2022 Chenlu Zheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
