Special Issue: Data Science and AI-based Optimization in Scientific Programming
Optimizing the Borrowing Limit and Interest Rate in P2P System: From Borrowers’ Perspective
P2P (peer-to-peer) lending is an emerging online service that allows individuals to borrow money from unrelated individuals without the intervention of traditional financial intermediaries. On these platforms, the borrowing limit and interest rate are two of the most notable elements for borrowers, since they directly influence borrowing benefits and costs, respectively. To that end, this paper introduces a BP neural network interval estimation (BPIE) algorithm to predict borrowers’ borrowing limit and interest rate based on their characteristics, and it develops a new parameter optimization algorithm (GBPO), based on the genetic algorithm and our BP neural network predictive model, to optimize them. Using real-world data from http://ppdai.com, the experimental results show that our proposed model achieves good performance. This research provides a new perspective, that of borrowers, for exploring P2P lending. The case base and the proposed knowledge are the two contributions to FinTech research.
1. Introduction
Recent research on P2P lending focuses mainly on two aspects. The first is empirical research on investors’ behavior on online loan platforms, which clarifies the factors that influence investors’ risk preference and investors’ choices [2–4]. The second is borrowers’ credit scoring and probability of default [5–8], which is important for controlling risk in P2P lending. For borrowers, however, few efforts have investigated their borrowing benefits and costs or discussed the principles of utility enhancement. This question is valuable because borrowers’ experiences are as significant as lenders’ in ensuring the proper functioning of the P2P lending system. Meanwhile, good guidance for borrowers will improve their benefits and further increase the platform’s cash flow.
However, information asymmetry is a universal phenomenon in P2P lending. It not only lowers transaction efficiency but also raises costs for borrowers. Specifically, information asymmetry leaves borrowers overwhelmed by behavior choices, since the evaluation mechanism in P2P lending is opaque to them. In that case, the key challenge is to identify profitable behaviors that optimize borrowers’ borrowing parameters under limited information. Therefore, the purpose of this paper is to propose a comprehensive method that helps borrowers predict and optimize their borrowing limit and interest rate simultaneously.
To build a target model, there are many methods to choose from, for example, traditional statistical regression models and popular machine learning techniques (e.g., association analysis, decision trees, and neural network models) derived from artificial intelligence. The regression model is a widely applied mathematical model in data science, but it has some limitations (e.g., high sensitivity to outliers and poor predictive accuracy) that are particularly pronounced for complex and nonlinear problems [9, 10]. Therefore, we give priority to machine learning methods in our context for two reasons. First, P2P lending is a data-intensive application that contains much nonnumeric content. Second, machine learning is capable and robust in processing unstructured data. Given the situation of information explosion, we propose to adopt the three-layer BP neural network since it performs well in processing complex nonlinear functions.
Before predicting the borrowers’ borrowing limit and interest rate, it is worth mentioning that a confidence interval for the borrowing parameters is more credible and practical than an exact number for both researchers and borrowers. For this reason, we employ the three-layer BP neural network model to fit the complex relationship among different features in P2P lending and to further deduce the required confidence intervals. Specifically, we develop a BP neural network interval estimation (BPIE) method inspired by Duan and Xie’s work, which obtains a two-sided confidence interval for the real estate blessed price based on the BP neural network, to generate one-sided and two-sided confidence intervals for both the maximum borrowing limit and the lowest interest rate.
To optimize the borrowers’ borrowing limit and interest rate, we construct a new parameter optimization algorithm (GBPO) based on the genetic algorithm (GA) and the BP neural network. Notably, the most popular research topic for the BP neural network is sensitivity analysis, which aims to measure the effect of input parameters on the output and to identify the most influential input parameters [12, 13]. However, hardly any research has attempted to identify an appropriate combination of inputs that yields a better output, especially for P2P lending from the perspective of borrowers. So, this paper reports the first work to obtain the best behavior characteristics for borrowers to help them optimize their borrowing limit and interest rate. That is, the target of the optimization analysis is to obtain improvement directions for behavior parameters within their adjustable range and to identify the most important parameters for maximizing the borrowing limit or minimizing the interest rate.
As mentioned previously, our proposed method (GBPO) is a combination of the GA and BP algorithms, which is a common combination in recent research. A classic example of such a combination is the GA-BP algorithm, which is designed to overcome the limitations of the BP neural network, that is, time-consuming training and convergence to undesired local optima. In the GA-BP algorithm, the genetic algorithm performs a global search over weight ranges and finds more reliable initial weights for the BP neural network. In contrast to the GA-BP algorithm, our method is designed to find a favorable combination of input parameters from a large number of available parameters in order to obtain a better output. Specifically, the BP neural network is the basic building block of our optimization model, and the output of the BP neural network is taken as the fitness function in the GA.
Taking advantage of our proposed method, we conduct a systematic analysis of both prediction and optimization for borrowers’ borrowing parameters based on real-world data from http://ppdai.com. Experimental results show that our proposed method performs well and discovers more meaningful characteristics. This paper also provides a different perspective for research on the P2P lending system.
2. Data Preparation
In this section, we first give a brief overview of the data we adopt. We then describe the parameters selected from the target dataset. To make the parameter selection more convincing, we further conduct a relationship analysis for the selected parameters.
2.1. PPdai Lending Platform
Our primary dataset is the publicly available data on the PPdai lending platform. The PPdai lending platform is the first online lending platform in China, and it was built in August 2007. As an unsecured credit lending platform for individuals, its users can be divided into two groups: borrowers and investors. Borrowers post their loan information on the platform to borrow money at lower interest rates, while investors choose which borrowers to lend to by considering the borrower’s loan materials and credit rating in order to earn higher returns. In general, the PPdai platform aims at small loans, and the loan limit usually ranges from 100 to 30,000 yuan [15, 16].
To reduce the risk of borrower default, the PPdai platform has also developed a credit rating system called the Magic-Mirror System. The Magic-Mirror System is a big data analysis system that simultaneously collects and processes more than 2000 kinds of borrower characteristics in order to precisely evaluate their credit score and risk of default. The main characteristics considered by the Magic-Mirror System include personal information, third-party data, repayment history, personal debt, and credit history. Based on this credit rating system, each borrower receives a credit rating ranging from AAA to F, with risk rising in that order.
2.2. Selected Parameters
To prepare our dataset, we carefully observe the information structure of the PPdai lending platform and obtain a total of 17 parameters in five dimensions (as shown in Table 1) as our research materials. The selected data are extracted in the following steps: (1) collecting transactional pages by replacing the id number of the URL (i.e., replacing 37215007 with a specific number in “http://invest.ppdai.com/loan/info?id=37215007”), (2) obtaining the corresponding user pages through the collected transactional pages, and (3) crawling and saving each user’s basic information and borrowing information from the user pages. In addition, considering that user information is constantly updated as transactions are completed, we filter out the outdated information and select only the latest transaction data of each user as our target dataset.
Through this process, we finally obtain our valid dataset containing 10192 documents. To get a general idea about our dataset, we give descriptive statistics of the numeric parameters, binary parameters, and categorical parameter of MMR, and the details can be seen in Table 2, Figure 1, and Table 3, respectively.
Table 2 gives the statistical characteristics of each numeric parameter in our sample, which helps us understand the approximate distribution of each parameter. Figure 1 shows the ratio of each category for each binary parameter. In Table 3, we calculate the proportion of each category for the categorical parameter MMR to clarify the user credit distribution on the PPdai lending platform.
Besides, in order to facilitate the parameter optimization, here we clarify the available adjustable direction for parameters and summarize the applicable adjustment of each parameter in Table 4.
2.3. Relationship Analysis
To establish an accurate predictive and optimized model, it is necessary to clarify the relationships of various parameters in the PPdai platform. There are several official statements about their determined relationships: (1) the Magic-Mirror rating mainly depends on repayment history, credit history, personal debt, personal information, third-party data, and others. (2) The effects of Magic-Mirror rating on users’ borrowing can be seen in the following aspects: (a) whether your borrowing can be approved; (b) interest rates (the higher the rating, the lower the interest rate); and (c) borrowing limit (the higher the rating, the higher the borrowing limit).
Given the official statements, we further explore the relationship of parameters we selected in the PPdai platform as shown in Figure 2. From this figure, we can find that RRI (i.e., Magic-Mirror rating) of borrowers is mainly determined by parameters in dimensions of IDI, CFI, and HTI and in turn affects the borrowers’ BRI (AMT and RAT included). It is worth mentioning that parameter dimension of BRI is not completely determined by RRI. RRI has a real impact on borrowers’ borrowing limit and their interest rate, while IDI, CFI, and HTI also have an effect on them to some extent. Furthermore, borrowers freely choose their AMT (i.e., borrowing amount) within their borrowing limit. Significantly, since the borrowing limit of each borrower is not publicly accessible in the PPdai platform, we assume that each borrower uses the maximum value when borrowing and thus we take the borrowing amount as their borrowing limit in the following analysis.
Even though borrowers can see their own borrowing limit and interest rate, borrowers on the PPdai platform have no access to the specific evaluation approach. It is therefore not possible for borrowers to take targeted actions to optimize their borrowing parameters (i.e., borrowing limit and interest rate). To this end, we propose to develop a reasonable model to obtain and optimize borrowing parameters for borrowers using the available characteristics.
3. Proposed Prediction Model
3.1. BP Neural Network Model
Here, we build a BP neural network model as our basic model to predict borrowers’ borrowing limit and interest rate. Based on the analysis above, we use 14 parameters that belong to four different dimensions as our independent materials and make the LMT (i.e., borrowing limit) and RAT (i.e., interest rate) as our target variables. The constructed architecture of our model is shown in Figure 3.
As shown in Figure 3, we adopt a three-layer neural network model that contains an input layer, a hidden layer, and an output layer. The input layer includes 14 neurons and a bias neuron, the hidden layer includes $h$ neurons and a bias neuron, and the output layer includes 2 neurons. This is a flexible method that can smoothly process arbitrary nonlinear functions. To achieve the desired result, the activation function of the input layer and hidden layer is tansig (i.e., $f(x) = \frac{2}{1 + e^{-2x}} - 1$), and the activation function of the output layer is the pure linear function (i.e., $f(x) = x$). Moreover, the error function is defined as follows:

$$E = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2, \tag{1}$$

where $N$ is the sample size of the dataset, $y_i$ is the confirmatory value of the dependent variable, and $\hat{y}_i$ is the predicted value of the dependent variable. Then, we use the gradient descent training method as the learning algorithm to train the neural network.
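The architecture described above can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the weights are random placeholders rather than trained values, the input layer is assumed to pass features through unchanged (tansig is applied at the hidden layer only), the hidden size of 12 follows the value chosen later in the paper, and the mean squared error is assumed as the error function.

```python
import math
import random

def tansig(x: float) -> float:
    """MATLAB-style tansig: 2 / (1 + exp(-2x)) - 1 (mathematically tanh(x))."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Random placeholder weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

def forward(x, hidden, output):
    """Forward pass: tansig hidden layer followed by a linear output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]

def mse(targets, predictions):
    """Mean squared error between confirmatory and predicted values (assumed form)."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

rng = random.Random(0)
hidden = init_layer(14, 12, rng)   # 14 inputs -> 12 hidden neurons
output = init_layer(12, 2, rng)    # 12 hidden neurons -> 2 outputs (LMT, RAT)
x = [rng.random() for _ in range(14)]  # one hypothetical, already scaled borrower
lmt_hat, rat_hat = forward(x, hidden, output)
```

In practice the weights would be fitted by gradient descent on the training data; the sketch only shows how the two activation functions and the 14-12-2 layout fit together.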
3.2. Predictive Quality
To examine the predictive quality of our algorithm, we randomly assign the collected data to three parts called TRD (train data), TSD1 (test data-1), and TSD2 (test data-2), which account for 80 percent, 10 percent, and 10 percent of the data, respectively. To run the model, we first set the number of hidden-layer neurons $h$ to 12, and the other training parameters are set as shown in Table 5 [18, 19]. On the basis of these preparations, we run the BP neural network and obtain the performance of our predictive model, which is shown in Table 6. Here, we use the correlation coefficient to measure the accuracy of the prediction.
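The evaluation protocol above can be sketched as follows. The function names and the fixed seed are hypothetical, and the Pearson coefficient is assumed to be the correlation coefficient meant in the text.

```python
import math
import random

def split_80_10_10(records, seed=0):
    """Shuffle and split records into TRD (80%), TSD1 (10%), and TSD2 (rest)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_test1 = int(0.1 * len(shuffled))
    trd = shuffled[:n_train]
    tsd1 = shuffled[n_train:n_train + n_test1]
    tsd2 = shuffled[n_train + n_test1:]
    return trd, tsd1, tsd2

def pearson_r(actual, predicted):
    """Pearson correlation between confirmatory and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# 10192 is the dataset size reported in Section 2.2.
trd, tsd1, tsd2 = split_80_10_10(range(10192))
```

With integer truncation, the three parts contain 8153, 1019, and 1020 records, so every record is used exactly once.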
From Table 6, we can find that the predictive accuracy of LMT is not very high, while the predictive accuracy of RAT is especially high (up to 98 percent). Through analysis, we find that the RAT is determined, to a great extent, by the parameters shown in Figure 3 and borrowers are not authorized to change the value. LMT (i.e., AMT) is not just determined by the parameters but also can be freely chosen by borrowers within a specific limit. So, it is not difficult to understand why the predictive accuracy of the borrowing limit is not high.
3.2.1. Parameter Setting of $h$
Here, we want to find a reasonable value for the number of hidden-layer neurons $h$. There is an empirical formula for determining the number of neurons in the hidden layer, which is shown below:

$$h = \sqrt{n + m} + a, \tag{2}$$

where $n$ denotes the number of neurons in the input layer, $m$ denotes the number of neurons in the output layer, and $a$ is a constant bounded between 1 and 10 [20–22]. With $n = 14$ and $m = 2$, we have $\sqrt{n + m} = 4$, so we can set $h$ in the range of 5∼14 in our predictive model. To obtain an efficient value, we also run the model and measure its predictive performance for various values of $h$. The results are shown in Figure 4.
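The candidate range can be checked with a few lines of code. The sketch assumes the usual empirical rule, the square root of the sum of input and output neurons plus a constant from 1 to 10; rounding to the nearest integer is an added assumption for cases where the square root is not whole.

```python
import math

def hidden_neuron_candidates(n_inputs, n_outputs, a_min=1, a_max=10):
    """Candidate hidden-layer sizes from the rule sqrt(n + m) + a."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in range(a_min, a_max + 1)]

candidates = hidden_neuron_candidates(14, 2)
print(candidates)  # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```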
As shown in Figure 4, the predictive accuracy of RAT does not differ significantly across settings, but when the number of hidden-layer neurons $h$ is close to 12, the predictive accuracy of RAT is slightly higher than for other values on all three datasets. Therefore, we use $h = 12$ in the following analysis.
3.3. Robustness Check
We note that different forms of selection bias may affect our results. Thus, we further conduct an analysis using samples from specific credit ratings as a robustness check. In this case, we choose three groups of samples, which come from the credit ratings AA, C, and D, respectively, to validate our proposed model. After randomly assigning the collected data to three parts (i.e., train data, test data-1, and test data-2) for each sample group, we use the train data to train the neural network and test the result using test data-1 and test data-2. The corresponding results can be seen in Table 7, and we again employ the correlation coefficient to measure the accuracy of the prediction.
As shown in Table 7, the prediction accuracy of our predictive model declines slightly in predicting LMT for most of the data samples. One possible explanation is that the selected sample sizes are smaller than the original one (Table 6). For samples from credit rating D, the prediction accuracy of our predictive model approaches or slightly exceeds that of the original setting, which means that our model performs better in predicting LMT for borrowers at the D level. So, basic information plays an important role in predicting LMT. From Table 7, we can also find that the prediction accuracy of our predictive model declines dramatically in predicting RAT for all selected samples compared with the original one. This is not only because of the reduced sample size but mainly results from the significance of the credit rating to RAT. Based on the analysis above, it is necessary to consider both credit rating and basic information, rather than only one of them, in predicting LMT and RAT.
3.4. Prediction of Borrowing Limit
Here, we want to predict the borrowing limit (LMT) for borrowers using our proposed predictive model. As mentioned above, the borrowing limit is the amount of money an individual could borrow from others, and we take the borrowing amount as the borrowing limit here because of data deficiency. Meanwhile, it is worth noting that the expected result of our prediction is a confidence interval for the borrowing limit rather than an exact value, since an interval is more credible and practical.
Given this knowledge, we extend Duan and Xie’s work and introduce the BP neural network interval estimation (BPIE) algorithm to deduce a one-sided confidence interval for LMT, which is the predictive value of the borrowing limit. Since there is no specific provision on the lower limit for the borrowing amount, we only examine the one-sided confidence interval for the borrowing limit. The derivation process is described below:
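Firstly, we define the promotion (generalization) error of LMT as follows: given a test set, the confirmatory value and predicted value of LMT can be represented as $y_L$ and $\hat{y}_L$, respectively. Then, the promotion error of LMT can be represented as follows:

$$\sigma_L = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{L,i} - \hat{y}_{L,i}\right)^2}, \tag{3}$$

where $N$ denotes the sample size of the dataset and $y_{L,i}$ and $\hat{y}_{L,i}$ denote the confirmatory value and predicted value of LMT for the $i$th individual, respectively.
Based on the promotion theory of machine learning introduced by Yan and Zhang, we can infer that $y_L$ approximately obeys the normal distribution $N(\hat{y}_L, \sigma_L^2)$. So, the probability distribution of $y_L$ can be defined as follows:

$$\frac{y_L - \hat{y}_L}{\sigma_L} \sim N(0, 1). \tag{4}$$

Then, we can get the following formula:

$$P\left(\frac{y_L - \hat{y}_L}{\sigma_L} \le z_{\alpha}\right) = 1 - \alpha, \tag{5}$$

where $z_{\alpha}$ denotes the upper $\alpha$ quantile of the standard normal distribution.
Finally, we can get the one-sided confidence interval with the confidence ($1 - \alpha$) for $y_L$, which is as follows:

$$\left(-\infty,\ \hat{y}_L + z_{\alpha}\sigma_L\right]. \tag{6}$$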
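A minimal numerical sketch of this one-sided bound is given below. It assumes that the promotion error is the root-mean-square residual on a held-out set and that the residuals are approximately normal; the function names and sample values are hypothetical, and the quantile comes from Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def promotion_error(actual, predicted):
    """Root-mean-square residual on a held-out set (assumed definition)."""
    n = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2
                         for y, y_hat in zip(actual, predicted)) / n)

def one_sided_upper_bound(y_hat, sigma, alpha=0.05):
    """Upper end of the one-sided (1 - alpha) interval: y_hat + z_alpha * sigma."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    return y_hat + z * sigma

# Hypothetical confirmatory and predicted borrowing amounts (yuan).
sigma_lmt = promotion_error([3000, 5000, 8000], [3500, 4500, 8000])
upper = one_sided_upper_bound(5000, sigma_lmt)
```

For a new borrower with a predicted amount of 5000 yuan, `upper` is the value that the true borrowing limit is expected to stay below with 95 percent confidence under these assumptions.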
3.5. Prediction of Interest Rate
In this section, we conduct a predictive analysis for borrowers’ interest rate. We have already got the predicted value of RAT in the previous analysis. Now, we want to deduce a confidence interval for RAT in order to improve the robustness and reference value of our prediction.
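To obtain the two-sided confidence interval of RAT, we follow the inference process of LMT. Let $y_R$ and $\hat{y}_R$ denote the confirmatory value and predicted value of RAT, and let $\sigma_R$ denote the promotion error of RAT, computed in the same way as for LMT. Firstly, since $y_R$ approximately obeys the normal distribution $N(\hat{y}_R, \sigma_R^2)$, as $y_L$ does, the probability distribution of $y_R$ can be defined as follows:

$$\frac{y_R - \hat{y}_R}{\sigma_R} \sim N(0, 1). \tag{7}$$

Then, we can get the following formula:

$$P\left(\left|\frac{y_R - \hat{y}_R}{\sigma_R}\right| \le z_{\alpha/2}\right) = 1 - \alpha. \tag{8}$$

Finally, we can get the two-sided confidence interval with the confidence ($1 - \alpha$) for $y_R$, which is as follows:

$$\left(\hat{y}_R - z_{\alpha/2}\sigma_R,\ \hat{y}_R + z_{\alpha/2}\sigma_R\right). \tag{9}$$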
3.6. Prediction Results
Based on the analysis above, we use the train dataset TRD to train the neural network and employ the well-trained neural network to get the predictive values of LMT and RAT, $\hat{y}_L$ and $\hat{y}_R$, on dataset TSD1. Then, (6) and (9) derived from the BPIE method are utilized to calculate the confidence intervals for LMT and RAT. Before that, the significance level $\alpha$ needs to be determined so that we can obtain confidence intervals that meet the required credibility. Normally, we set the significance level to 0.05 [24, 25], and we get the confidence intervals with a confidence of 0.95 for both LMT and RAT. So, with $\sigma_L$ and $\sigma_R$ denoting the promotion errors of LMT and RAT, the corresponding intervals are $(-\infty, \hat{y}_L + 1.645\sigma_L]$ and $(\hat{y}_R - 1.96\sigma_R, \hat{y}_R + 1.96\sigma_R)$, respectively.
In order to clearly assess the predictive accuracy of the BPIE method, we define the predictive accuracy rate of the BPIE method as follows:

$$P_L = \frac{n_L}{N}, \qquad P_R = \frac{n_R}{N}, \tag{10}$$

where $n_L$ and $n_R$ denote the number of samples that fall into the confidence interval of LMT and RAT, respectively, and $N$ denotes the sample size of a valid dataset.
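The two-sided interval for RAT and the accuracy rate can be sketched together. The sketch assumes a normal approximation with a given promotion error; the function names and the sample interest rates are hypothetical.

```python
from statistics import NormalDist

def two_sided_interval(y_hat, sigma, alpha=0.05):
    """Two-sided (1 - alpha) interval: y_hat -/+ z_{alpha/2} * sigma."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return y_hat - z * sigma, y_hat + z * sigma

def accuracy_rate(actual, intervals):
    """Share of confirmatory values that fall inside their predicted interval."""
    hits = sum(1 for y, (low, high) in zip(actual, intervals) if low <= y <= high)
    return hits / len(actual)

# Hypothetical predicted interest rates (percent) with a promotion error of 1.0.
predicted_rates = [12.0, 18.0, 20.0]
actual_rates = [12.5, 21.0, 19.0]
intervals = [two_sided_interval(r, 1.0) for r in predicted_rates]
rate = accuracy_rate(actual_rates, intervals)  # two of the three values are covered
```

If the normal approximation holds, the accuracy rate on a fresh test set such as TSD2 should be close to the nominal confidence level.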
By examining the predictive accuracy of the proposed BPIE method using the dataset TSD2, we get the predictive accuracy for LMT and RAT with and , respectively, which indicates that our BPIE method has a high predictive accuracy.
3.7. Comparative Analysis
It is well known that the BP neural network performs better than linear regression in fitting nonlinear relationships. In order to explore the performance of both methods in our context, we conduct a comparative analysis here.
In keeping with our proposed predictive model, we also build a linear regression model to predict LMT and RAT and the corresponding confidence intervals (see Appendix A). To compare the two methods, we use the CCDR (correlation coefficient difference rate) metric and the LDR (length difference rate) metric. As shown in (11), CCDR is used to compare the prediction accuracy of both methods in predicting individual values:

$$\mathrm{CCDR} = \frac{r_{BP} - r_{LR}}{r_{LR}}, \tag{11}$$

where $r_{BP}$ and $r_{LR}$ denote the correlation coefficient (between confirmatory value and predicted value) of the BP neural network model and the linear regression model, respectively. $\mathrm{CCDR} > 0$ means that the prediction accuracy of the BP neural network model is higher than that of the linear regression model in individual value prediction.
LDR is used to compare the prediction precision of both methods in predicting confidence intervals, which can be written as follows:

$$\mathrm{LDR} = \frac{l_{LR} - l_{BP}}{l_{LR}}, \tag{12}$$

where $l_{LR}$ and $l_{BP}$ denote the length of the confidence intervals with a confidence of 0.95 derived from the linear regression model and the BPIE method, respectively. $\mathrm{LDR} > 0$ means that the confidence interval derived from the BPIE method is shorter, and therefore more efficient, than that from the linear regression model.
Based on the work above, we calculate the related indicators and present them in Tables 8 and 9. Since the values of CCDR and LDR are greater than 0, we can conclude that our predictive model outperforms the linear regression model in predicting both individual values and confidence intervals.
4. Proposed Optimization Model
In this section, we propose to use the GBPO algorithm for optimizing the borrowers’ borrowing limit and interest rate based on the previous analysis. We firstly introduce the GBPO algorithm we developed and then describe the single-target optimization method and double-target optimization method to get improvement directions of behavior parameters for borrowers to maximize the borrowing limit and minimize the interest rate, respectively.
4.1.1. Optimization Target
Before introducing the optimization model, the optimization target needs to be clarified so that we can determine what our optimization measures are. Referring back to the prediction stage in Section 3, we predict the confidence interval of the maximum borrowing limit (denoted $L$) and the lowest interest rate (denoted $R$) for borrowers using the prediction system (i.e., the well-trained BP neural network). Based on this work, we propose to explore promising directions in which borrowers can strive to acquire a higher borrowing limit or a lower interest rate, given a set of behavior characteristics. That is, we run the prediction for each generated set of explanatory parameters to obtain the corresponding $L$ and $R$ and find the best collection of explanatory parameters. Therefore, the desired target in our optimization stage can be denoted as follows:

$$\max L, \qquad \min R. \tag{13}$$
4.1.2. Explanatory Parameters
Given the optimization target, finding a reasonable way to optimize them is a natural demand. To reach this, we firstly determine the adjustable range of all explanatory variables. By analyzing the feature of each independent parameter, we obtain the adjustable ranges of them as shown in Table 10.
As shown in Table 10, these ranges are divided into two categories. The first category is applied to all borrowers, including all numeric variables and MMR. This category can be divided into two subtypes. One is used to add to initial values and another is used to replace the initial values. The second category is applied to individual borrowers, mainly including binary variables. Since the value of these independent parameters (e.g., certification information) only goes from 0 to 1, the value adjustment will just be suitable for borrowers whose corresponding parameters are zero. Besides, it is worth mentioning that we do not consider all the binary variables in our optimization algorithm for two reasons. First, some variables, such as gender, cannot be changed in reality for a specific borrower. Second, the sample size of some independent variables varies greatly between categories, while the predictive results are extremely sensitive to such disparities.
4.2. GBPO Algorithm
It is worth mentioning that our optimization system based on the BP neural network is a mixed-integer nonlinear programming (MINLP) problem for two reasons. First, the predictive system (i.e., the BP neural network) obtained in the prediction stage is a complex nonlinear function. Second, several integer variables are included in the target explanatory parameters. A common solution for this problem is the branch-and-bound approach proposed by Land and Doig. However, its running time increases exponentially with problem size. To alleviate this, we propose to use the genetic algorithm, a typical evolutionary algorithm, to solve our discrete and combinatorial optimization problem. Indeed, many works have shown the genetic algorithm to be an effective approach for MINLP problems [30–32], and it is extensible and easy to combine with other algorithms. Therefore, we propose a parameter optimization algorithm (GBPO) based on the BP neural network constructed above and the genetic algorithm to obtain optimal results. The idea of the GBPO algorithm can be listed as follows:
(a) For a specific borrower, the set of his/her characteristics (i.e., independent variables) is denoted as $X = (x_1, x_2, \ldots, x_{14})$ according to the order of independent variables in Figure 3, while the set of adjustable independent variables is represented by $A = (a_1, a_2, \ldots, a_k)$ according to the order in Table 10, where $k$ is the number of adjustable variables.
(b) Encode adjustable variables: we adopt a binary code in the optimization algorithm, while most of our parameters are numeric or categorical, so recoding is necessary for further analysis. The required binary code length $l_j$ of each variable $a_j$ in the set $A$ is determined from the upper limit $u_j$ and the lower limit $d_j$ of the range of $a_j$, and the total length of each chromosome is $l = \sum_{j=1}^{k} l_j$.
(c) Initialize the population: randomly generate $M$ chromosomes, denoted as $c_1, c_2, \ldots, c_M$. The notation $c_{ij}$ represents the part of the $i$th chromosome that corresponds to the variable $a_j$.
(d) Decode the chromosomes: each chromosome is decoded so that $r_{ij}$ denotes the number generated for variable $a_j$ by the $i$th chromosome. After that, we add $r_{ij}$ to the initial variables that are marked as additive in Table 10 and reset the other initial variables to the corresponding $r_{ij}$. Then, we get a new set of independent values corresponding to the $i$th chromosome, denoted as $X_i$.
(e) Evaluate the fitness of each chromosome in the population: for each new set of independent values, we calculate the corresponding target values introduced in Section 4.1.1 using our predictive model. The fitness functions are defined on these target values, where $F_L$ denotes the fitness function for the parameter LMT and $F_R$ denotes the fitness function for the parameter RAT.
(f) Perform the selection operation using the roulette method.
(g) Perform the crossover operation using the 1-point crossover method, where the crossover probability is set as $p_c$.
(h) Perform the mutation operation using a simple mutation method, where the mutation probability is set as $p_m$.
(i) Repeat steps (d) to (h) until the number of iterations reaches the given maximum iteration threshold $T$.
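The steps above can be sketched as a compact genetic algorithm. This is an illustration under several assumptions rather than the authors' GBPO code: the code-length rule (enough bits to index every integer in a range), the rescaling decode, the elitist bookkeeping of the best chromosome, the crossover and mutation probabilities, and the assignment of the values 60 and 100 to population size and iteration limit are all assumed, and `toy_predicted_limit` is a hypothetical stand-in for the trained BP network.

```python
import math
import random

def code_length(lower, upper):
    """Bits needed to index every integer in [lower, upper] (assumed rule)."""
    return max(1, math.ceil(math.log2(upper - lower + 1)))

def decode(chromosome, ranges):
    """Map each bit segment to an integer inside its variable's range."""
    values, pos = [], 0
    for lower, upper in ranges:
        n_bits = code_length(lower, upper)
        raw = int("".join(map(str, chromosome[pos:pos + n_bits])), 2)
        # Rescale so that every bit pattern lands inside [lower, upper].
        values.append(lower + round(raw * (upper - lower) / (2 ** n_bits - 1)))
        pos += n_bits
    return values

def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def one_point_crossover(a, b, rng):
    """Swap the tails of two parents at a random cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in chromosome]

def optimize(fitness_fn, ranges, pop_size=60, generations=100,
             p_cross=0.8, p_mut=0.01, seed=0):
    """Maximize fitness_fn over the integer ranges; fitness must be positive."""
    rng = random.Random(seed)
    total_bits = sum(code_length(lo, hi) for lo, hi in ranges)
    population = [[rng.randint(0, 1) for _ in range(total_bits)]
                  for _ in range(pop_size)]
    score = lambda c: fitness_fn(decode(c, ranges))
    best = max(population, key=score)
    for _ in range(generations):
        fitnesses = [score(c) for c in population]
        children = []
        while len(children) < pop_size:
            a = roulette_select(population, fitnesses, rng)
            b = roulette_select(population, fitnesses, rng)
            if rng.random() < p_cross:
                a, b = one_point_crossover(a, b, rng)
            children += [mutate(a, p_mut, rng), mutate(b, p_mut, rng)]
        population = children[:pop_size]
        best = max(population + [best], key=score)  # keep the best seen so far
    return decode(best, ranges)

def toy_predicted_limit(values):
    """Hypothetical stand-in for the trained network's LMT prediction."""
    on_time, overdue_change, certified = values
    return 1000.0 + 300.0 * on_time - 200.0 * overdue_change + 500.0 * certified

# Hypothetical adjustable ranges: two count adjustments and one certification flag.
ranges = [(0, 4), (-4, 0), (0, 1)]
best_values = optimize(toy_predicted_limit, ranges)
```

In the full method, the fitness call would pass the adjusted borrower features through the trained BP network instead of the toy function, and additive variables would be added to the borrower's initial values rather than used directly.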
4.3. Single-Target Programming GBPO Algorithm
In order to observe the optimization effect, we randomly choose four groups of independent variables from the dataset TSD2, which are No. 107, No. 377, No. 422, and No. 455 individuals, respectively. In the first stage, we optimize the borrowing limit and interest rate separately; that is, we conduct a single-target programming analysis.
We use the fitness functions of LMT and RAT defined above. To perform fair comparisons, we use the same parameter settings for all combinations, as shown in Table 5. The two remaining GA parameters, the population size and the maximum iteration threshold, take the values 60 and 100 (discussed below). By processing the selected data using our model, we obtain the evaluation results shown in Table 11.
From Table 11, we can find that both LMT and RAT have a better optimized result than the initial value. Taking the individual of No. 107 as an example, his/her borrowing limit has increased by 12793 yuan and his/her interest rate has decreased by 4 percent. These results show that our proposed optimization algorithm is effective.
To give more detail about the result, we present the optimized input variables of No. 107 in Table 12. Line 2 denotes the original value of each variable, and lines 3 and 4 denote the optimized value of each variable when aiming at LMT and RAT, respectively. From this table, it is easy to obtain improvement directions for No. 107 to optimize his/her LMT. First, No. 107 should complete the certification of EDC and MPC. Second, No. 107 needs to pay the bills on time four times, have two delayed loans cleared, and adjust the values of TAS and TLS to 4 and −4, respectively. Third, the Magic-Mirror rating of No. 107 should be adjusted to B. However, this credit rating cannot be altered by users directly; it is determined by the P2P lending platform based on the user’s behavior. So, No. 107 should commit to improving his/her credit rating by adjusting his/her most crucial adjustable behaviors (see Appendix B). At this point, readers may object that our proposed model is meaningless, since users could simply provide as much personal information as possible in order to optimize their borrowing parameters. In fact, there is a trade-off between privacy protection and better borrowing parameters. Our study aims to obtain better LMT and RAT while providing as little personal information as possible at the current stage.
4.4. Parameter Setting of the Population Size and Maximum Iteration Threshold
Although larger values of the population size and the maximum iteration threshold might yield better optimization results, they also require more time to converge. In order to make a trade-off between optimization quality and time consumption, we conduct a comparative analysis using different thresholds for these two parameters, as shown in Table 5. We again use the data of No. 107, and the experimental result is shown in Figure 5.
As shown in Figure 5, when , there is barely any major fluctuation for optimal values of the LMT and RAT, which means that degrees of optimization are similar when . Therefore, we choose the parameters combination with minimum time consumption; that is, is set to 60 and is set to 100.
4.5. Double-Target Programming GBPO Algorithm
In this section, we conduct an optimization analysis for both LMT and RAT, that is, a double-target programming problem. Different from the single-target programming method, the double-target programming method is dedicated to optimizing two borrowing parameters simultaneously, even when their directions of optimization differ.
To execute this procedure, we firstly examine the changing trend of one target parameter as another parameter is getting better. We also take No. 107 as an example here and the experiment results are shown in Figure 6.
As shown in Figure 6, as one target parameter is getting better, another parameter does not show a clear changing trend. So, it is hard for us to intuitively acquire the best results for both parameters. Besides, Figure 6(b) shows that when RAT is getting better and better, the value of LMT always stays at a low level. Thus, if we want to optimize one of the parameters solely, another parameter will be uncontrollable and undesirable. A comprehensive analysis about two target parameters is imperative. Besides, this figure also demonstrates that we may just achieve relatively satisfactory results but optimal results for both parameters.
To optimize two target parameters simultaneously, there are many methods that can be chosen. This paper introduces a simple and understandable method named the linear weighting method to solve our double-target programming problem . The linear weighting method is implemented by giving different weights to both goals based on the different importance of them. And our goal is to obtain the maximum weighted sum of two target parameters. In our context, since the importance of the two target parameters varies from borrower to borrower, we will examine a series of weight combinations of two parameters in our analysis. And stakeholders are free to choose the weight combination according to their actual requirement.
To match our proposed GBPO algorithm with the linear weighting method, we need to modify the fitness function. The details are as follows:
(a) Normalize the predicted values of LMT and RAT generated by the GA in the GBPO algorithm. Each predicted value is standardized using the maximum and minimum values of LMT and RAT, respectively.
(b) Modify the fitness function so that it combines the standardized values of LMT and RAT using two weights, which represent the importance of LMT and RAT, respectively.
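The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's exact formulas: it assumes min-max normalization, and it assumes that a lower interest rate is better, so the normalized RAT is flipped before weighting. The function names and all numeric values are hypothetical.

```python
def min_max(value, lo, hi):
    """Scale value into [0, 1] using the observed minimum and maximum."""
    if hi == lo:
        raise ValueError("maximum and minimum must differ")
    return (value - lo) / (hi - lo)


def weighted_fitness(lmt, rat, lmt_range, rat_range, w_lmt, w_rat):
    """Weighted sum of the two normalized targets; larger is better.

    Assumption: a lower interest rate is better, so the normalized RAT
    is flipped (1 - score) before weighting. The paper's own formula may
    handle the direction of RAT differently.
    """
    lmt_score = min_max(lmt, *lmt_range)
    rat_score = 1.0 - min_max(rat, *rat_range)
    return w_lmt * lmt_score + w_rat * rat_score


# Hypothetical predicted values and (min, max) ranges, for illustration only.
score = weighted_fitness(lmt=6000, rat=0.18,
                         lmt_range=(1000, 11000), rat_range=(0.10, 0.26),
                         w_lmt=0.5, w_rat=0.5)
```

With both targets at the midpoint of their ranges and equal weights, the fitness sits at the midpoint of its own range; shifting weight toward one target makes the GA favor candidates that do well on that target.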
Here, we examine the optimization results for different weight combinations of two parameters. We select nine different weight combinations in our case and the details are shown in Table 13.
We again take No. 107 as our example and use the proposed double-target programming GBPO algorithm to check the effects of different weight combinations. The experimental results are shown in Figure 7.
From Figure 7, we can see that as the weight of RAT declines and the weight of LMT rises, the optimal values of LMT and RAT both show a rising trend, which is consistent with our expectations. Meanwhile, all the optimal values of LMT and RAT reach a satisfactory level, which shows that our double-target programming GBPO algorithm is effective. Moreover, borrowers can select different weights for the two parameters to obtain their desired optimization results based on their actual requirements.
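The weight sweep described above can be imitated with a toy example. The sketch below is not the GBPO implementation: `predict` is a hypothetical stand-in for the trained BP network, the two input variables and their ranges are invented, the nine weight pairs (0.1 to 0.9 for LMT, with the remainder for RAT) are an assumption about Table 13, and the genetic algorithm is a deliberately small one (truncation selection, uniform crossover, Gaussian mutation).

```python
import random

LMT_MIN, LMT_MAX = 1000.0, 16000.0   # hypothetical range of the limit
RAT_MIN, RAT_MAX = 0.10, 0.26        # hypothetical range of the rate


def predict(x):
    """Stand-in for the trained predictor: maps two inputs to (LMT, RAT)."""
    lmt = 1000 + 900 * x[0] + 600 * x[1]
    rat = 0.10 + 0.012 * x[0] + 0.004 * (10 - x[1])
    return lmt, rat


def fitness(x, w_lmt):
    """Weighted sum of normalized targets; RAT is flipped (lower is better)."""
    lmt, rat = predict(x)
    lmt_score = (lmt - LMT_MIN) / (LMT_MAX - LMT_MIN)
    rat_score = 1.0 - (rat - RAT_MIN) / (RAT_MAX - RAT_MIN)
    return w_lmt * lmt_score + (1.0 - w_lmt) * rat_score


def run_ga(w_lmt, pop_size=30, generations=40, seed=0):
    """Tiny GA over two inputs, each constrained to [0, 10]."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda x: fitness(x, w_lmt), reverse=True)
        parents = pop[: pop_size // 2]            # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(2)                  # mutate one gene, then clamp
            child[i] = min(10.0, max(0.0, child[i] + rng.gauss(0, 1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda x: fitness(x, w_lmt))


# Sweep nine assumed weight pairs: (w_lmt, 1 - w_lmt).
results = []
for w_lmt in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    lmt, rat = predict(run_ga(w_lmt))
    results.append((w_lmt, lmt, rat))
```

In this toy setting the first input raises both the limit and the rate, so a larger LMT weight pushes both optimized values upward, which mirrors the qualitative trend reported for Figure 7.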
In this paper, we first build a three-layer BP neural network to predict the borrowing parameters LMT (borrowing limit) and RAT (interest rate). Based on the constructed BP neural network, we develop a BPIE method to obtain the confidence intervals of LMT and RAT, which represent the prediction ranges of borrowing limit and interest rate for borrowers. Using real-world data from http://ppdai.com, we conclude that the predictive accuracies of the proposed BPIE method are 94.80% and 97.84% for borrowing limit and interest rate, respectively. After that, we propose to optimize the borrowing parameters, LMT and RAT. Depending on the number of target parameters, we transform our problem into a single-target programming problem and a double-target programming problem. To solve these problems, we introduce a new GBPO algorithm based on the BP neural network predictive model and the genetic algorithm. Using randomly selected data from the valid dataset, the experimental results show that our proposed algorithm is effective in optimizing the target parameters.
Different from prior studies, this paper provides a new perspective, that of borrowers, to predict and optimize the borrowing limit and interest rate given limited information. The proposed method and the findings of our experimental study have practical implications for researchers and borrowers in the P2P system.
A. Building Linear Regression Model
We first build linear regression models for individual value prediction. Each model comprises an intercept vector, four coefficient vectors, and a residual term.
Second, the corresponding confidence intervals are derived from the constructed linear regression models. The process is as follows.
(1) Taking LMT as an example, the statistic formed from the confirmatory value of the individual, the predicted value of the individual, and the estimated standard deviation of the prediction error follows a t-distribution determined by the sample size and the number of independent variables. The estimated standard deviation of the prediction error is calculated from the estimated standard error, the value vector of the individual's independent variables, and the value matrix of the independent variables.
(2) Then, it is easy to get the one-sided confidence interval at a given significance level, using the corresponding quantile of the t-distribution.
(3) Similarly, the two-sided confidence interval is obtained at a given significance level, using the corresponding quantile of the t-distribution.
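For reference, the procedure above matches the standard textbook prediction interval for a linear regression, which can be written as follows. The symbols are assumptions introduced here rather than the paper's notation: $y_0$ is the confirmatory value, $\hat{y}_0$ the predicted value, $n$ the sample size, $k$ the number of independent variables, $\hat{\sigma}$ the estimated standard error of the regression, $\mathbf{x}_0$ the individual's vector of independent variables (with a leading 1 for the intercept), and $\mathbf{X}$ the corresponding $n \times (k+1)$ data matrix. The one-sided interval is shown as a lower bound, which is also an assumption about the direction used.

```latex
\begin{align}
  \frac{y_0 - \hat{y}_0}{S_{e_0}} &\sim t(n - k - 1), \\
  S_{e_0} &= \hat{\sigma}\,
    \sqrt{1 + \mathbf{x}_0^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_0}, \\
  y_0 &\ge \hat{y}_0 - t_{\alpha}(n - k - 1)\, S_{e_0}
    \quad \text{(one-sided, lower bound)}, \\
  \hat{y}_0 - t_{\alpha/2}(n - k - 1)\, S_{e_0}
    &\le y_0 \le
  \hat{y}_0 + t_{\alpha/2}(n - k - 1)\, S_{e_0}
    \quad \text{(two-sided)}.
\end{align}
```

Here $t_{\alpha}(n-k-1)$ denotes the upper $\alpha$ quantile of the t-distribution with $n-k-1$ degrees of freedom.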
B. Ranking the Importance of Parameters on MMR
To compare the importance of each parameter for MMR in a simple way, we establish a multinomial logistic regression. The model expresses the conditional probability of MMR given the parameters in terms of the corresponding coefficient vectors and the intercept of the model.
Fitting this model to our samples with SPSS Statistics yields the results shown in Table 14.
In Table 14, −2LL (i.e., −2 times the log-likelihood) is an index commonly used to measure the goodness of fit of a model; the smaller the value, the better the fit. The p value reflects the significance of each parameter. From Table 14, we can see that all p values except that of MPC are less than 0.05, which means that all parameters except MPC have a significant impact on MMR.
To rank the importance of each parameter for MMR, we calculate the statistical adequacy, which measures the explanatory value of each predictor relative to the entire set. The adequacy is the proportion of the total explained variation in the outcome that is explained by the individual predictor. The adequacy is then calculated for each significant parameter. The results are shown in Figure 8.
From Figure 8, we can obtain the importance ranking of the parameters for MMR. TAS, PRS, and ORC have the greatest influence on MMR.
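One common way to compute such an adequacy value is as a ratio of likelihood-ratio chi-square statistics: the improvement in −2LL achieved by a model containing only one predictor, divided by the improvement achieved by the full model. The sketch below assumes that definition; the function name, the predictor labels, and all −2LL values are hypothetical and are not taken from Table 14.

```python
def adequacy(neg2ll_null, neg2ll_single, neg2ll_full):
    """Share of the full model's likelihood-ratio chi-square that is
    achieved by a model containing a single predictor."""
    lr_single = neg2ll_null - neg2ll_single   # gain of one-predictor model
    lr_full = neg2ll_null - neg2ll_full       # gain of the full model
    if lr_full <= 0:
        raise ValueError("full model does not improve on the null model")
    return lr_single / lr_full


# Hypothetical -2LL values: intercept-only model, full model, and three
# single-predictor models labeled with placeholder names.
NEG2LL_NULL = 1000.0
NEG2LL_FULL = 700.0
single_fits = {"x1": 820.0, "x2": 860.0, "x3": 900.0}

# Rank predictors from most to least adequate.
ranking = sorted(
    ((name, adequacy(NEG2LL_NULL, value, NEG2LL_FULL))
     for name, value in single_fits.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

A value close to 1 means that the single predictor alone recovers almost all of the fit obtained with the entire set, which is why the measure is convenient for ranking.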
Data, as supplemental material, are available at https://pan.baidu.com/s/1BGrENUwFILVkMqDsyGroVQ.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research was supported by the National Science Foundation of China (Project no. 71571073), Guangdong Natural Science Foundation (Project no. 2014A030313243), and the Twelfth Five-year Planning Project for Guangdong Philosophy and Social Science (GD14CGL09).
Y. Li, Y. J. Gao, Z. N. Li et al., “The influence of borrower's description on investors' decision-analysis based on P2P online lending,” Economic Research Journal, no. S1, pp. 143–155, 2014.
Y. Miao, G. Chen, W. Wang, and X. Gong, “Application of gray-tone difference matrix-based features of pavement macrotexture in skid resistance evaluation,” Journal of Southeast University, vol. 31, no. 3, pp. 389–395, 2015.
Y. H. Duan and J. H. Xie, “Price estimation of real estate market-based on BP neural network interval estimation method,” Statistics and Decision, no. 5, pp. 18-19, 2004.
D. J. Montana and L. Davis, “Training feedforward neural networks using genetic algorithms,” in Proceedings of International Joint Conference on Artificial Intelligence, pp. 762–767, Detroit, MI, USA, August 1989.
D. Y. Chen, “Research on credit transaction trust of P2P network based on social cognition theory,” Nankai Business Review, vol. 17, no. 3, pp. 40–48, 2014.
D. Y. Chen, H. Zhu, and H. C. Zheng, “Risk, trust and willingness to lend–an empirical study based on the registered users on the ppdai platform,” Management Review, vol. 26, no. 1, pp. 150–158, 2014.
J. N. Fan, Z. L. Wang, and F. Qian, “Research progress of hidden layer structure design for BP artificial neural network,” Control Engineering, vol. 12, no. S0, pp. 109–113, 2005.
Y. S. Ding, Computational Intelligence: Theory, Technology and Application, Science Press, Beijing, China, 2004.
L. M. Zhang, Artificial Neural Network Model and Its Application, Fudan University Press, Shanghai, China, 1993.
P. F. Yan and T. S. Zhang, Artificial Neural Network and Simulated Evolutionary Computation, Tsinghua University Press, Beijing, China, 2nd edition, 2005.
D. Groebner, Business Statistics: A Decision Making Approach, China Machine Press, Beijing, China, 2007.
X. Y. Li and C. Yuan, Application of Economic Statistics, Peking University Press, Beijing, China, 2015.
S. Raschka, Python Machine Learning, Packt Publishing, Birmingham, UK, 2015.
Z. Sheng, Probability Theory and Mathematical Statistics, Higher Education Press, Beijing, China, 3rd edition, 2001.
F. F. Leimkuhler, Introduction to Operations Research, McGraw-Hill, New York, NY, USA, 2005.
F. E. Harrell Jr., Regression Modeling Strategies, Springer, New York, NY, USA, 2001.
D. Thompson, Ranking Predictors in Logistic Regression, SAS Institute, Cary, NC, USA, 2009.
D. J. Mclernon, E. W. Steyerberg, E. R. T. Velde et al., “Predicting the chances of a live birth after one or more complete cycles of in vitro fertilisation: population based study of linked cycle data from 113 873 women,” BMJ, vol. 355, Article ID i5735, 2016.