Abstract

This study uses a sample of Chinese borrowers’ information from a platform that offers both online shopping and consumer loan services to study the effect of consumer information in personal credit risk evaluation, using the logistic regression model, the light gradient boosting machine (LightGBM) algorithm, and Shapley Additive Explanation (SHAP). The results show that traditional credit information cannot cover all consumer loan groups. Consumer information can help predict borrowers’ repayment behavior and provide effective support for personal credit risk evaluation. Adding consumption information to the personal credit risk evaluation model effectively improves the accuracy of the model. When the model variables are ranked by feature importance, all of the top 5 indicators are consumption indicators, which further verifies the value and effect of consumption information in personal credit risk evaluation. This study not only reveals the effect and value of consumer information in personal credit risk evaluation, but also provides new ideas for the development of the consumer financial market.

1. Introduction

At present, a new round of global scientific and technological revolution continues to deepen, financial technology is rising rapidly, and new business forms, new models, and new products emerge one after another. Financial technology not only changes the operation mode of traditional financial services, but also promotes the development of Internet personal consumption loans. At the same time, the vigorous development of personal consumer loans affects customers’ intertemporal consumption behavior. Against this background, the personal credit risk evaluation index system is becoming richer, and massive multi-dimensional data are being fused and processed. Big data credit investigation is an inevitable trend of personal credit risk evaluation. Nevertheless, some data are missing and data volume is insufficient, which makes models difficult to train [1]. The credit investigation system of the People’s Bank of China (PBC) cannot cover all borrowers’ information, which requires financial institutions to explore indicators beyond the traditional personal credit risk evaluation index system. The traditional evaluation index system rarely uses consumption information as an evaluation index of personal credit risk. Can consumer information be used for personal credit risk assessment? What is the value and function of consumer information in personal credit risk evaluation? These are the questions to be answered in this paper. This study uses Chinese borrowers’ data from a platform that offers both online shopping and consumer loans to explore them.

Driven by financial technology, Internet consumer loans came into being and have made great progress. In early 2014, JD Finance launched the “JD white note,” opening the chapter of Internet consumer credit. Ant Group followed closely with the launch of “Ant Huabei.” Since 2015, commercial banks, consumer finance companies, and other financial institutions have vigorously deployed Internet consumer finance business. The scale of Internet consumer credit has expanded rapidly, from 18.7 billion RMB at the beginning of 2014 to 15.4 trillion RMB at the end of 2020, a compound annual growth rate of 206%. By the end of 2020, there were 778 million credit cards and debit-credit cards in use in China, and the number of cards per capita was 0.56. According to a questionnaire analysis of 110 cities in 30 provinces and cities in China by the China Institute of Economic Thought and Practice of Tsinghua University, residents’ acceptance of and satisfaction with consumer finance companies are increasing, and the annual growth rate of adults willing to accept consumer finance services is more than 10%. Compared with the explosive development of the Internet loan market, the improvement of personal credit risk management ability has lagged slightly. The most important factor restricting the in-depth study of consumer finance is the lack of micro-data and continuous micro-household sampling [2]. Financial technology innovation that uses new information beyond traditional credit investigation methods helps to reduce information asymmetry in the consumer credit market. It provides an important channel for marginal groups of traditional financial services to obtain high-quality credit services and establish personal credit records [3].

Personal consumer loans are “small, unsecured, and flexible,” and their default risk cannot be quantitatively monitored through the traditional postloan behavior score. Therefore, preloan review has become an important measure to prevent credit risk. In recent years, the asset quality of personal consumer loans has been declining, and the nonperforming rate has continued to grow, which has brought new requirements and challenges to the credit risk management of consumer loans. At present, personal credit-related information comes from the personal credit reporting system of the People’s Bank of China. Mainstream third-party credit service institutions such as Baihang Credit and Bairong Inc. also provide effective data for personal credit risk evaluation. However, the personal credit reporting system of the People’s Bank of China does not cover enough dimensions in terms of data comprehensiveness, hierarchy, and timeliness. The personal credit information held by third-party data companies differs from traditional credit information and comes mainly from customers’ loan information and fraud scores on P2P network lending platforms, microfinance companies, and other platforms. The introduction of third-party data has confirmed that applying information other than the credit investigation information of the People’s Bank of China to the credit risk evaluation of personal consumer loans has great practical significance. Unlike third-party data, consumer information is highly available, but it cannot reflect the solvency and willingness to repay. Therefore, the value and effect of consumer information in personal credit risk evaluation are worthy of in-depth discussion.

Unlike existing research, this paper makes progress in two respects. First, it analyzes in depth the differences between consumer information and traditional credit information in personal credit risk evaluation. Second, it shows that consumer information can be an effective supplement to traditional credit information in the practice of personal credit risk evaluation.

2. Previous Research

Information asymmetry has long been the main factor affecting personal credit risk. Stiglitz and Weiss [4] first proposed that although the borrower’s credit history information (such as credit rating and historical performance) is open to all investors, network anonymity may aggravate the typical information asymmetry of online lending. Information asymmetry also exists in the online lending market [5, 6]. It leads to two kinds of market shrinkage: “loan sparing,” in which the effective supply of the market shrinks, and “crowding out,” in which the effective demand of the market shrinks [7].

In research on the factors influencing personal loan default, scholars have found both internal and external factors, which can also be described as “hard information” and “soft information” [8, 9]. “Hard information” refers to personal information that is objective and highly reliable, such as age, gender, job grade, and credit score, and “soft information” refers to the borrower’s group, the number of friends in the social network, consumption ability level, macro-information, and other information [10, 11]. Su and Cheng [12] found in their research on the factors affecting the default behavior of online lending borrowers that “soft information” is also valuable. Chi et al. [13] showed that macroeconomic factors have a certain impact on the borrower’s repayment.

Establishing an effective personal credit evaluation system has long been the core of personal credit risk management. Scholars’ research on personal credit risk evaluation mainly covers two aspects: the construction of the personal credit risk evaluation index system and the methods of personal credit risk evaluation. Most existing evaluation index systems take indicators from customers’ historical credit transactions as the main indicators [14–16]. Early credit risk evaluation methods mainly used logistic regression and expert discrimination, and the logistic regression model remains the most widely used personal credit risk evaluation model so far [17–25]. The core model of the FICO score is a logistic regression algorithm. At present, China’s commercial banks mainly use the analytic hierarchy process and the fuzzy evaluation method to evaluate the credit risk of borrowing customers [26]. With the development of financial technology and artificial intelligence, machine learning algorithms can predict the default risk of credit subjects more efficiently and accurately. Verleysen and Francois [27] first applied the decision tree method in personal credit evaluation and achieved good evaluation results. Houle et al. [28] improved the decision tree method with a boosting algorithm, taking into account the sample attributes of personal credit risk assessment.

However, mainstream credit evaluation mostly uses the borrower’s historical loan information to build the evaluation model, which cannot cover all customer groups. As a result, some customers are shut out of mainstream credit services, unable to obtain financial support, and subject to serious financial constraints. To break this constraint, it is particularly important to explore whether indicators beyond the traditional credit evaluation system can identify credit risk and be used in credit risk evaluation. Bertrand and Kamenica [29] document that owning an iOS device is one of the best predictors of being in the top quartile of the income distribution. Belenzon et al. [30] and Guzman and Stern [31] have documented that customers having their names in the e-mail address are 30% less likely to default. Digital footprints can facilitate access to credit when credit bureau scores do not exist, thereby fostering financial inclusion and lowering inequality [32–36].

It can be seen that information beyond mainstream credit evaluation indicators can also achieve significant results in credit risk evaluation. Continuously mining information beyond mainstream credit evaluation indicators becomes more and more important to reduce the credit risk caused by information asymmetry and can also continuously improve the ability of risk identification [1, 3]. This study focuses on the effect of consumer information in personal credit risk evaluation. Consumer information has the characteristics of high frequency and timeliness, which is different from the traditional credit indicators. This study attempts to use empirical analysis to reveal the effect of consumption information in personal credit risk evaluation.

3. Data and Variable Description

3.1. Sample Selection

The data of this paper come from a commercial bank. Platforms like Ant and JD have channels to acquire customers but do not have enough funds, while commercial banks have funds but lack large-scale channels to acquire customers. Therefore, joint loans between platforms and banks came into being in China. We obtained from a commercial bank the borrower information of a Chinese platform that cooperates with the bank on joint loans. As a large life service platform, the platform covers food, hotels, tourism, film, and group purchase. As of December 31, 2020, the number of active merchants and annual transacting users of the platform had increased to 6.8 million and 510 million, respectively. Therefore, the borrowers’ information data of the platform make a persuasive sample for empirical research. This study constructs an evaluation index system from the borrowers’ repayment behavior, consumption information of the platform, transaction information of the platform, information of credit card use, credit information of credit card, personal information, and loan information. According to the borrowers’ repayment performance on the platform, each customer is defined as “good” or “bad” to explore the effect of consumption information in personal credit evaluation. From 185,600 customer records with credit granted from April 2020 to August 2020, we randomly selected 50,000 as the sample for analysis.

3.2. Variable Description

The core variables involved in the empirical design of this study are as follows.

3.2.1. Behavior of Repayment

A vintage analysis of customers’ repayment behavior on the platform shows that once a borrower is overdue for more than 30 days, there is a 93.75% probability that the loan will remain overdue. Therefore, if the borrower is overdue for more than 30 days, the customer is defined as a “bad customer” (default = 1). Conversely, if the customer has never been overdue or has been overdue for no more than 30 days, the customer is defined as a “good customer” (default = 0).
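The labeling rule above can be written as a small helper. This is a minimal sketch, not the authors' code: the function name `label_default` and the argument `max_overdue_days` are hypothetical, and the input is assumed to be the borrower's maximum number of overdue days.

```python
def label_default(max_overdue_days: int, threshold_days: int = 30) -> int:
    """Return 1 ("bad customer") if the borrower was overdue for more than
    threshold_days, otherwise 0 ("good customer").

    max_overdue_days is assumed to be 0 for borrowers who were never overdue.
    """
    if max_overdue_days < 0:
        raise ValueError("overdue days cannot be negative")
    return 1 if max_overdue_days > threshold_days else 0


if __name__ == "__main__":
    for days in (0, 30, 31, 90):
        print(days, label_default(days))
```

Keeping the threshold as a parameter makes it straightforward to re-label the sample with a different cutoff, for example when checking whether results depend on the 30-day definition.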

3.2.2. Consumption Information of the Platform

The consumption information of the platform comes from the borrower’s consumption records on the platform. Three indicators are derived from the borrower’s consumption records on the platform within one year: consumption ability, consumption frequency, and consumption scene.

3.2.3. Transaction Information of the Platform

The transaction information of the platform comes from the borrower’s transaction records on the platform within 90 days, including days of transaction, amount of successful transactions, number of successful transactions, and active days.

3.2.4. Information of Credit Card Use

The information of credit card use comes from the borrower’s personal credit information, including average quota utilization and maximum quota utilization.

3.2.5. Credit Information of Credit Card

The credit information of credit card also comes from the borrower’s personal credit information, including average quota, number, and total amount.

3.2.6. Control Variable

There are two types of control variables: borrower’s personal information and loan information. Personal information includes age, gender, and state of marriage and job; loan information includes amount and rate.

The definitions of the main variables and their descriptive statistics are shown in Tables 1 and 2, respectively.

4. Evaluation Model and Empirical Results

4.1. Construction of Model
4.1.1. The Effect of Consumer Information in Personal Credit Risk Evaluation

In order to explore in depth the effect of consumer information in personal credit risk evaluation, we use the logistic regression model for empirical analysis. The dependent variable is the customer’s repayment behavior. The independent variables include platform consumption information, platform transaction information, credit card use information, and credit card credit information; the control variables include the borrower’s personal information and loan information.

In order to study the recognition ability and effect of consumer information in personal credit risk evaluation, we constructed five logistic regression models. Model 1 discusses the relationship between traditional credit information and borrower’s credit risk. The independent variables include credit card credit information and control variables. Model 2 discusses the relationship between credit card use information and borrower’s credit risk. The independent variables include credit card use information and control variables. Model 3 discusses the relationship between platform consumption and transaction information and borrower’s credit risk. The independent variables include platform consumption information, platform transaction information, and control variables. Model 4 discusses the relationship between all consumption information obtained in this study and the borrower’s credit risk. The independent variables include platform consumption and transaction information, credit card use information, and control variables. Model 5 includes all independent variables. The formula is expressed as follows:
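Written out in a latent-index (logit) form, the full specification (Model 5) would read roughly as follows. The coefficient labels $\alpha_0,\dots,\alpha_5$, the stacked control vector $\mathrm{control}_i$, and the latent variable $\mathrm{default}_i^{*}$ are notational assumptions made here for illustration; only the variable-group names and the symbols $\alpha$ and $\varepsilon$ come from the text.

$$
\mathrm{default}_i^{*} = \alpha_0 + \alpha_1\,\mathrm{consumption}_i + \alpha_2\,\mathrm{trading}_i + \alpha_3\,\mathrm{cc\_use}_i + \alpha_4\,\mathrm{cc\_credit}_i + \alpha_5\,\mathrm{control}_i + \varepsilon_i,
\qquad
\mathrm{default}_i = \mathbf{1}\left[\mathrm{default}_i^{*} > 0\right]
$$

When $\varepsilon_i$ follows a standard logistic distribution, this is equivalent to $P(\mathrm{default}_i = 1) = 1/\left(1 + e^{-\eta_i}\right)$, where $\eta_i$ is the linear part without $\varepsilon_i$. Models 1 to 4 would then correspond to dropping the variable groups that each model excludes.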

Among them, default_i is the dependent variable, α is the coefficient, consumption_i, trading_i, cc_use_i, and cc_credit_i are the explanatory variables, and ε is a random error term. The logistic regression model was implemented in SAS 14.3.
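The five nested specifications can be summarized as sets of variable groups. The sketch below is illustrative only: the column names are hypothetical snake_case labels for the variables listed in Section 3.2, not identifiers from the authors' data, and `columns_for` is a helper introduced here.

```python
# Variable groups from Section 3.2; every column name is a hypothetical label.
GROUPS = {
    "platform_consumption": [
        "consumption_ability", "consumption_frequency", "consumption_scene",
    ],
    "platform_transaction": [
        "days_of_transaction", "amount_of_successful_transactions",
        "number_of_successful_transactions", "active_days",
    ],
    "cc_use": ["avg_quota_utilization", "max_quota_utilization"],
    "cc_credit": ["avg_quota", "number_of_cards", "total_amount"],
    "control": [
        "age", "gender", "state_of_marriage", "job", "loan_amount", "loan_rate",
    ],
}

# Which groups enter each of the five logistic regression models.
MODELS = {
    1: ["cc_credit", "control"],
    2: ["cc_use", "control"],
    3: ["platform_consumption", "platform_transaction", "control"],
    4: ["platform_consumption", "platform_transaction", "cc_use", "control"],
    5: ["platform_consumption", "platform_transaction", "cc_use",
        "cc_credit", "control"],
}


def columns_for(model_id: int) -> list:
    """Return the ordered list of column names used by the given model."""
    return [col for group in MODELS[model_id] for col in GROUPS[group]]


if __name__ == "__main__":
    for model_id in sorted(MODELS):
        print(model_id, len(columns_for(model_id)), "variables")
```

Laid out this way, Model 4 is Model 5 without the traditional credit card credit variables, which is the comparison the analysis turns on.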

4.1.2. Model for Personal Credit Risk Prediction

LightGBM is an open-source, fast, and efficient boosting framework based on decision tree algorithms, which supports efficient parallel training and can greatly shorten the training time. The LightGBM algorithm is a kind of boosting algorithm. A boosting algorithm learns multiple classifiers by changing the weights of the training samples and improves performance by combining the classifiers linearly. The boosting formula can be expressed as follows:

The objective function is

Among them, $l$ is the error function and $\Omega$ is the regular term.

LightGBM also seeks the optimal solution by combining the error function with the regular term by constructing the objective function. The objective function is

The complexity term of the tree in the algorithm includes regular terms on the total number of leaf nodes and on the scores of the leaf nodes, which help prevent overfitting. The regular term is

LightGBM performs a second-order Taylor expansion of the cost function. Unlike GBDT, LightGBM not only allows the cost function to be user-defined, but also uses its first and second derivatives at the same time. The loss function at iteration t is then

The second-order Taylor expansion of (6) is carried out to obtain (7), and the first derivative and second derivative are (8) and (9), respectively:
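For reference, the derivation described in this subsection is usually written as follows in the regularized gradient-boosting literature. The notation ($l$, $\Omega$, $f_t$, $T$, $w_j$, $\gamma$, $\lambda$, $g_i$, $h_i$) is the conventional one and is an assumption here, as is the pairing of each line with equation numbers (6) to (9).

$$
\begin{aligned}
\Omega(f) &= \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} \\
L^{(t)} &= \sum_{i=1}^{n} l\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) && (6) \\
L^{(t)} &\approx \sum_{i=1}^{n} \left[\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t) && (7) \\
g_i &= \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}} && (8) \\
h_i &= \frac{\partial^{2} l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}} && (9)
\end{aligned}
$$

Here $T$ is the number of leaves, $w_j$ is the score of leaf $j$, and $\hat{y}_i^{(t-1)}$ is the prediction after $t-1$ trees. For the logistic loss on a raw score, with $p_i = 1/(1+e^{-\hat{y}_i^{(t-1)}})$, these derivatives reduce to $g_i = p_i - y_i$ and $h_i = p_i(1-p_i)$.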

LightGBM divides the feature values into buckets and then constructs a histogram for splitting. The calculation formula is as follows:

Bucketing the samples not only improves the training speed, but also reduces the computational complexity and memory usage. Because of these advantages, the LightGBM algorithm is widely used in many fields. We use the LightGBM algorithm to predict personal credit risk.
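The idea of histogram-based splitting can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions, not LightGBM's implementation: buckets are equal-width (LightGBM uses its own bin-finding procedure), the loss is logistic, the split gain uses the common second-order form with an L2 penalty `lam` and no per-leaf penalty, and all function names are hypothetical.

```python
import math


def logistic_grad_hess(y, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y, p * (1.0 - p)


def build_histogram(values, grads, hessians, n_bins):
    """Accumulate gradient and hessian sums per equal-width bucket."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0.0:
        width = 1.0  # all values identical: everything falls in bucket 0
    hist = [[0.0, 0.0] for _ in range(n_bins)]
    for v, g, h in zip(values, grads, hessians):
        b = min(int((v - lo) / width), n_bins - 1)
        hist[b][0] += g
        hist[b][1] += h
    return hist


def best_split(hist, lam=1.0):
    """Scan bucket boundaries and return (last_left_bucket, gain)."""
    total_g = sum(g for g, _ in hist)
    total_h = sum(h for _, h in hist)

    def score(g, h):
        return g * g / (h + lam)

    best_bin, best_gain = None, 0.0
    left_g = left_h = 0.0
    for b in range(len(hist) - 1):
        left_g += hist[b][0]
        left_h += hist[b][1]
        gain = 0.5 * (score(left_g, left_h)
                      + score(total_g - left_g, total_h - left_h)
                      - score(total_g, total_h))
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


if __name__ == "__main__":
    values = [1, 2, 3, 4, 10, 11, 12, 13]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    gh = [logistic_grad_hess(y, 0.0) for y in labels]
    hist = build_histogram(values, [g for g, _ in gh], [h for _, h in gh], 4)
    print(best_split(hist))
```

Because candidate splits are evaluated only at bucket boundaries, the scan costs a number of steps proportional to the bucket count rather than to the number of samples, which is where the speed and memory savings come from.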

4.2. Empirical Analysis

By analyzing the real performance of the Chinese borrower’s credit risk of a platform integrating both online shopping and consumer loan, we measure the traditional credit information with the borrower’s credit card credit information, measure the consumption information with the borrower’s consumption information, transaction information, and credit card use information on the platform, and further explore the effect of consumer information in personal credit risk evaluation.

4.2.1. The Effect of Consumer Information on the Borrower’s Credit Risk

The results of logistic regression are shown in Table 3. Model 1 discusses the relationship between traditional credit information and borrower’s credit risk. The regression coefficient of the average quota is −0.13, which is significant at the 5% significance level, indicating that the larger the borrower’s average quota, the smaller the credit risk. The regression coefficients of the number of credit cards and the total amount of credit cards are 0.08 and 0.11, which are significant at the 5% and 1% significance levels, respectively. This indicates that the more credit cards a borrower holds and the greater their total amount, the greater the credit risk. Model 2 discusses the relationship between credit card use information and borrower’s credit risk. The regression coefficient of average quota utilization is 0.22, which is significant at the 1% significance level, showing that the greater the average quota utilization, the greater the credit risk. The maximum quota utilization of credit cards has no significant impact on credit risk. Model 3 discusses the relationship between consumption information of the platform, transaction information of the platform, and borrower’s credit risk. Consumption ability, days of transaction, amount of successful transactions, and number of successful transactions have no significant impact on credit risk. The regression coefficients of platform consumption frequency, consumption scene, and active days are −0.14, −0.20, and −0.20, respectively, all significant at the 1% significance level, showing that the larger these three variables, the smaller the credit risk. Model 4 explores the relationship between all consumption information obtained and the borrower’s credit risk, and the empirical results are consistent with the above analysis. Model 5 discusses the relationship between all independent variables and the borrower’s credit risk.
State of marriage and rate are positively correlated with the borrower’s credit risk at a significance level of 1%. Gender is negatively correlated with the borrower’s credit risk at a significance level of 1%.

4.2.2. Robustness Test

In the above empirical analysis, being overdue for more than 30 days is the defining condition of a “bad” customer. To verify whether the results depend on this definition of the dependent variable, a robustness test is carried out. Taking “overdue for more than 5 days” and “overdue for more than 60 days” as alternative dependent variables, models 1, 4, and 5 are re-estimated.

Table 4 shows the logistic regression results of different dependent variables. Consistent with the results of the above main regression, platform consumption information, platform transaction information, and credit card use information are significantly correlated with the borrower’s credit risk. Credit card credit information is also significantly related to the borrower’s credit risk. Based on the above analysis, the introduction of borrower consumption information on the basis of traditional credit information can better identify credit risk.

4.2.3. The Promotion Effect of Consumption Information on the Model

The LightGBM algorithm can predict personal credit risk effectively and rank the importance of indicators. The ROC curve can evaluate the accuracy of the model effectively [37–40]. AUC is the area under the ROC curve and summarizes the prediction accuracy of the model. In order to compare the two models, we use the AUC index to evaluate model quality [41, 42]. In this study, the LightGBM algorithm is used to fit the variable sets of models 1 and 5, respectively, and the ROC curves are drawn. The AUC results are shown in Figures 1 and 2. Figure 1 shows the prediction results of model 1 on the test samples: with only traditional credit information, the AUC value is 0.70. Figure 2 shows the prediction results of model 5 on the test samples, with an AUC value of 0.81. Introducing consumption information on top of traditional credit information therefore raises the AUC from 0.70 to 0.81, a significant improvement in the prediction ability of the model. The LightGBM algorithm is implemented in Python 3.7.
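To make the AUC figures concrete, the metric can be computed directly from its rank interpretation: the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half. The sketch below is a plain-Python illustration with a hypothetical function name; it is quadratic in the sample size and is meant for explanation rather than for a 50,000-row sample.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 (1 = default); scores: predicted risk scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Under this reading, an AUC of 0.81 means that in about 81% of defaulter/non-defaulter pairs the model scores the defaulter as riskier, compared with about 70% when only traditional credit information is used.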

4.2.4. Feature Importance and SHAP Value of Consumption Information

In order to explore the importance of borrower consumption information and traditional credit information in personal credit risk evaluation, we use LightGBM algorithm and Shapley additive explanation to build a model to rank the feature importance and the impact of consumption information of the platform, transaction information of the platform, and information of credit card. The 12 variables are sorted by importance to generate an importance SHAP value plot, as shown in Figure 3.

According to the ranking results of feature importance and SHAP value, all of the top five indicators are consumption information indicators. This confirms the value and effect of consumer information in personal credit risk evaluation. In the SHAP values, consumption ability and average quota utilization have a positive impact on credit risk, and consumption ability has a strong correlation with credit risk. Active days, days of transaction, and maximum quota utilization have a negative impact on credit risk, and maximum quota utilization has a strong correlation with credit risk. Consumption information thus not only supplements the indicators beyond traditional credit information, but also improves the accuracy of personal credit risk evaluation.
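The quantity behind a SHAP plot is the Shapley value: each feature's average marginal contribution to a prediction over all subsets of the other features. The sketch below computes it exactly for a toy model by brute force. It is an illustration only, under the assumption that an absent feature is replaced by a fixed baseline value; tree-specific SHAP implementations use a much faster algorithm, and the function name and toy model here are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside the coalition are set to their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


if __name__ == "__main__":
    # Toy linear risk score over three made-up features.
    def toy_model(v):
        return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]

    print(shapley_values(toy_model, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0]))
```

The values sum to the difference between the prediction at `x` and the prediction at the baseline, and a positive value means the feature pushes the predicted risk up for that borrower. Ranking features by the mean absolute value across borrowers gives an importance ordering of the kind described above.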

5. Discussion

In this study, we review previous studies on personal credit risk evaluation and categorize them from the perspective of indicators and models. The findings are as follows. First, much of the literature on evaluation methods treats personal credit risk evaluation as a modeling problem. Second, in the selection of personal credit risk evaluation indicators, personal consumption information is rarely used and often ignored. Third, in many cases, machine learning methods can be used as a supplement to logistic regression.

Our findings also suggest the effect of consumer information in personal credit risk assessment. First, personal consumer loans will affect customers’ intertemporal consumption behavior. Second, “soft information,” which is different from customer’s basic information, is equally important for identifying personal credit risk. Finally, through the empirical evaluation of personal credit risk model, we can find more accurate consumer credit value. Of course, the prevention of personal credit risk not only stays in the evaluation, but should be able to provide clues on how to avoid default.

This study has several limitations. We do not cover all the consumer information of borrowers. Information that may affect customer consumption is not considered. Although we sort out the literature on personal credit risk assessment, we do not cover all the literature, but introduce the representative research in each aspect. Through this concentration, we come to meaningful findings and insights related to this topic.

6. Conclusions

Through the empirical analysis of personal credit risk evaluation, this paper describes the characteristics and influencing factors of consumption information and demonstrates the value and effect of consumption information in personal credit risk evaluation. In order to alleviate long-term credit constraints, it is necessary to promote the development of the credit market on the premise of meeting the loan needs of tail customers. With the strong promotion of financial technology, Internet credit products came into being. However, the current credit investigation system cannot cover all the information of loan demanders, and the credit risk evaluation results are inaccurate. Therefore, how to evaluate credit risk effectively has become an urgent problem. This requires financial institutions to seek information that objectively and widely covers all loan demanders. Unlike traditional credit information, consumer information is easy to access and clearly reflects preferences, so it can be used as an effective supplement to traditional credit information. Effectively evaluating the effect of consumer information has become a focus of attention in personal credit risk evaluation.

This paper uses borrower information from a Chinese platform that offers both online shopping and consumer loans as a sample. We use the logistic regression model, the LightGBM algorithm, and Shapley additive explanation to analyze the value and effect of traditional credit information and consumption information in personal credit risk evaluation. The conclusions are as follows. The information of credit card use can predict the borrower’s repayment behavior effectively and provide effective support for personal credit risk evaluation. The consumption information of the platform and the transaction information of the platform can also predict the borrower’s repayment behavior effectively and provide effective support for personal credit risk evaluation. The ROC curves and AUC values show that adding consumption information to the model effectively improves its accuracy. When the model variables are ranked by importance, all of the top 5 indicators are consumption indicators, which further verifies the value and effect of consumption information in personal credit risk evaluation. Shapley additive explanation also confirms the influence and contribution of consumption information to personal credit risk. This study not only reveals the effect and value of consumer information in personal credit risk evaluation, but also provides new ideas for the development of the consumer financial market.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.