Abstract

Recent frequent “thunderstorm incidents” in the P2P lending industry have caused panic among investors. To predict the investment risk of P2P lending, the key influencing factors of that risk must be analyzed scientifically and rationally. Existing methods for selecting key influencing factors mainly involve traditional statistical approaches and artificial intelligence methods. Traditional statistical approaches cannot deal with high-dimensional nonlinear problems, so they cannot identify the key influencing factors of P2P lending investment risk exactly. Artificial intelligence methods cannot recognize and learn the application background; because they lack active thinking and personal perception, the attributes they select may not be the key influencing factors. To address these issues, a novel key influencing factors selection approach for P2P lending investment risk is proposed by combining the proposed fireworks coevolution binary glowworm swarm optimization (FCBGSO), multifractal dimension (MFD), probit regression, and artificial prior knowledge. First, MFD combined with the proposed FCBGSO is used to select the preliminary influencing factors of the investment risk. Second, the attributes in the preliminary set that are not significantly related to the risk are removed using probit regression, and the influencing factors extracted from the original P2P lending dataset using artificial prior knowledge are added one by one to the retained factors, which yields a small and reasonable number of influencing factor subsets. Finally, each subset is evaluated using an extreme learning machine (ELM), and the subset with the best classification accuracy is taken as the key influencing factors of P2P lending investment risk. Experimental results on a real P2P lending dataset from the Renrendai platform demonstrate that the proposed approach performs better than other state-of-the-art methods and that it is valid and effective. It provides a new research idea for the selection of key influencing factors of P2P lending investment risk.

1. Introduction

Peer-to-peer (P2P) lending is a new financial model that integrates Internet platforms and private lending. Lenders and borrowers can complete transactions directly through P2P lending platforms without going through financial intermediaries [1, 2]. P2P lending is one of the most important modes of Internet finance. On the one hand, it can serve the real economy; on the other hand, the recent frequent P2P lending “thunderstorm incidents” have damaged the earnings of investors and hindered the healthy development of the P2P lending industry. According to preliminary statistics, as of January 15, 2019, there were 2,746 transferred or closed platforms in China’s P2P lending industry and 2,663 problematic platforms in total. Since June 2018, risk incidents on P2P lending platforms have been continuously exposed. These large-scale “thunderstorm incidents” have strongly affected the healthy development of the industry and have attracted great attention from the Chinese government. The 2018 report on the work of the Chinese government explicitly called for efforts to “strengthen the overall coordination of financial supervision and improve the supervision of Internet finance.” With this, Internet finance has been written into the report on the work of the government five times in a row. The shift from the initial “promoting development”, “standardizing development”, and “being vigilant of risk” to “improving Internet financial supervision” in 2018 indicates that standardized control of Internet financial investment risk is imperative. Therefore, it is urgent to explore the investment risk of P2P lending. To study this risk, we should analyze the key influencing factors of P2P lending investment risk, which can provide high-quality data for its prediction.

A key influencing factors selection approach for P2P lending investment risk is essential for removing the attributes in an original P2P lending dataset that are irrelevant to the investment risk while retaining the key influencing factors. In fact, real P2P lending datasets contain a large number of noisy features and features irrelevant to investment risk. Existing key influencing factors selection approaches usually use traditional statistical methods or attribute selection algorithms based on artificial intelligence [3, 4]. Statistical methods are commonly used to select the key influencing factors of P2P lending investment risk, whereas, at least to our knowledge, artificial intelligence methods have not yet been applied. Traditional statistical methods are limited to discussing the impact of a single factor on the default risk of a borrower's order, and they ignore the fusion and interaction of multiple sources of information. For instance, Guo et al. showed that the credit rating of each loan has an important impact on investment risk [5]. Larrimore et al. examined the relationship between language use and investor decision-making [6]. The soft information in loan titles has a significant influence on whether a loan is funded successfully, and the results also suggest that investors do not invest blindly based on returns [7]. Xiao et al. proposed a visual analysis method that analyzes and detects risk in P2P lending deals [8]. The perceived age associated with P2P lending orders is a strong signal of ability and experience, and a more mature perceived age is more attractive to investors [9]. Chen et al. used data from Renrendai (one of the largest P2P lending platforms in China) to investigate whether the amount of punctuation used in loan descriptions influences the default risk [10]. To sum up, traditional statistical methods such as regression analysis require little computation and are simple to operate when they are used to analyze the influencing factors of P2P lending investment risk.

For feature selection based on artificial intelligence, there are two main components: the evaluation criterion and the search strategy. With respect to the evaluation criterion, various methods are used to evaluate feature subsets, and the choice of evaluation method strongly affects the optimal subset obtained. Examples include information theory [11–13], distance analysis [4], rough sets [14–17], and fractal dimension [18–20]. Fractal dimension as an evaluation criterion has attracted the attention of many scholars. It has two advantages [21]: on the one hand, the number of features in an optimal subset can be determined by calculating the fractal dimension, which dramatically reduces the amount of computation; on the other hand, fractal dimension performs well on high-dimensional datasets and nonlinear problems. Most existing feature selection approaches based on fractal dimension use only a single fractal dimension, which may not precisely describe the original datasets [20] because of their complicated distribution. In contrast, multifractal dimension (MFD) can describe the distribution of a dataset in different aspects [19], and it is adopted as the evaluation criterion of feature subsets in this work. With regard to the search strategy, finding an optimal feature subset of an original dataset is a combinatorial optimization problem [17]. Heuristic algorithms therefore provide good search strategies for feature selection methods, for example, genetic algorithm (GA) [22, 23], ant colony optimization (ACO) [24–26], particle swarm optimization (PSO) [27, 28], and artificial fish swarm algorithm (AFSA) [29]. However, the complex coding process of GA is hard to implement [30, 31]; ACO suffers from blind search in the early stage, slow convergence, and heavy consumption of computing resources; PSO is easily trapped in local optima [30, 31]; and AFSA lacks population diversity and converges slowly in the later stage. In contrast, glowworm swarm optimization (GSO) has the advantages of simple implementation, strong robustness, and good and fast global convergence [32], so it can be used as a search strategy for feature selection problems [33]. In this work, we propose a fireworks coevolution binary glowworm swarm optimization (FCBGSO) as the search strategy.

Based on the above analysis, traditional statistical methods cannot solve high-dimensional nonlinear problems and their analysis is one-sided, so it is difficult for them to identify the key influencing factors of P2P lending investment risk exactly. Artificial intelligence methods cope well with high-dimensional and nonlinear datasets, but they cannot recognize and learn the application background and they lack active thinking and personal perception, so the attributes they select may not be the key influencing factors of P2P lending investment risk. Therefore, we propose a novel approach to find the key influencing factors of P2P lending investment risk, which combines MFD, FCBGSO, probit regression, and artificial prior knowledge. The task is carried out in four steps. In the first step, we take the proposed FCBGSO as the search strategy and MFD as the evaluation criterion for feature subsets, and the preliminary attribute subset is extracted from the original P2P lending dataset using the combination of FCBGSO and MFD. In the second step, the attributes that are not significantly related to the default risk are removed from the preliminary subset using probit regression. In the third step, a small and reasonable number of attribute subsets are obtained by combining the attributes retained after removal with the attributes obtained from artificial prior knowledge. In the final step, considering the advantages of the extreme learning machine (ELM), such as good generalization ability and extremely fast learning speed, ELM is used to assess the classification accuracies of these subsets, and the attribute subset with the best accuracy is taken as the key influencing factors of P2P lending investment risk.

The contributions of the proposed approach are presented as follows:
(1) A novel approach for key influencing factors selection of P2P lending investment risk is proposed using the combination of FCBGSO, MFD, the probit regression, and the artificial prior knowledge.
(2) The proposed FCBGSO works well with respect to searching for the optimal solution in a binary space.
(3) Experiments on the real dataset of P2P lending from the Renrendai platform demonstrate that the proposed method performs significantly better than traditional statistical approaches and artificial intelligence methods and that it has validity and effectiveness.
(4) It provides a novel research idea for the key influencing factors selection of P2P lending investment risk.

The rest of this paper is organized as follows. In the next section, we briefly review the basic concept of GSO and then propose FCBGSO. The key influencing factors selection method of P2P lending investment risk and how to use it are presented in Section 3. Experimental results are shown in Section 4. In Section 5, the conclusions and the future work are presented.

2. Fireworks Coevolution Binary Glowworm Swarm Optimization (FCBGSO)

Swarm intelligence algorithms combined with MFD can be applied to attribute selection, with the swarm intelligence algorithm serving as the search strategy. GSO has advantages such as simple implementation, strong robustness, and good global convergence, so it can be used as a search strategy. However, it still has drawbacks, e.g., insufficient diversity, low convergence precision, and low search efficiency. To address these drawbacks, FCBGSO is proposed, which significantly improves convergence speed and precision, so that the preliminary influencing factors can be obtained efficiently. The outline of FCBGSO is presented as follows.

2.1. Glowworm Swarm Optimization (GSO)

GSO is a relatively novel swarm intelligence algorithm proposed by Krishnanand and Ghose [34–36]; it is a bionic algorithm that imitates the luminous behavior of glowworms during foraging and courtship in nature [37]. In GSO, each glowworm represents a solution and is randomly distributed in the solution space. The brighter a glowworm is, the more attraction it exerts [38]. Glowworms move toward neighbors with higher luciferin and are updated iteratively, so that the global optimal solution is attained. The basic steps of GSO are listed as follows:

(1) The luciferin of glowworm $i$ at the $t$th iteration is updated by equation (1). The luciferin renewal depends on the objective function value $J(x_i(t))$ of the glowworm:

$$l_i(t) = (1-\rho)\,l_i(t-1) + \gamma J(x_i(t)), \tag{1}$$

where $l_i(t)$ is the luciferin level of glowworm $i$ at the $t$th iteration, $\rho$ represents the luciferin decay constant ($0<\rho<1$), and $\gamma$ indicates the luciferin enhancement constant.

(2) The glowworms in the dynamic decision domain of glowworm $i$ whose luciferin is greater than $l_i(t)$ make up its set of neighbors $N_i(t)$, which is expressed as equation (2). The probability of glowworm $i$ moving to neighbor $j$ in the set of neighbors is described by equation (3):

$$N_i(t) = \{\, j : \lVert x_j(t)-x_i(t)\rVert < r_d^i(t),\; l_i(t) < l_j(t) \,\}, \tag{2}$$

$$p_{ij}(t) = \frac{l_j(t)-l_i(t)}{\sum_{k\in N_i(t)} \bigl(l_k(t)-l_i(t)\bigr)}, \tag{3}$$

where $r_d^i(t)$ is the dynamic radial range ($0 < r_d^i(t) \le r_s$) and $r_s$ is the radial range of the luciferin sensor.

(3) Each glowworm $i$ selects an objective glowworm $j$ with a higher luciferin with probability $p_{ij}(t)$. Then, the position of glowworm $i$ is updated as the following equation:

$$x_i(t+1) = x_i(t) + s\,\frac{x_j(t)-x_i(t)}{\lVert x_j(t)-x_i(t)\rVert}, \tag{4}$$

where $s$ is the moving step, set by the user.

(4) After the positions of all the glowworms have been updated, the dynamic radial range of the local-decision domain is updated using the rule given as the following equation:

$$r_d^i(t+1) = \min\bigl\{ r_s,\ \max\{ 0,\ r_d^i(t) + \beta\,(n_t - |N_i(t)|) \} \bigr\}, \tag{5}$$

where $\beta$ is a constant parameter and $n_t$ is a parameter to control the number of neighbors.
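
The four steps above can be sketched in a few lines of Python. This is a minimal illustration of one iteration of basic GSO for a maximization problem, not the authors' implementation: the function name `gso_step` and all default parameter values are illustrative placeholders rather than the settings used in this paper.

```python
import math
import random


def gso_step(positions, luciferin, radii, objective, rho=0.4, gamma=0.6,
             beta=0.08, n_t=5, step=0.03, r_s=1.0, rng=random):
    """One iteration of basic GSO (maximisation). All defaults are placeholders."""
    n = len(positions)
    # Step (1): luciferin decays and is reinforced by the objective value.
    luciferin = [(1 - rho) * l + gamma * objective(x)
                 for l, x in zip(luciferin, positions)]
    new_positions = [list(x) for x in positions]
    new_radii = list(radii)
    for i in range(n):
        # Step (2): neighbours are brighter glowworms inside the decision range.
        neighbours = [j for j in range(n)
                      if j != i
                      and math.dist(positions[i], positions[j]) < radii[i]
                      and luciferin[j] > luciferin[i]]
        if neighbours:
            # Selection probability is proportional to the luciferin difference.
            weights = [luciferin[j] - luciferin[i] for j in neighbours]
            j = rng.choices(neighbours, weights=weights, k=1)[0]
            d = math.dist(positions[i], positions[j])
            if d > 0:
                # Step (3): move a fixed step along the unit vector toward j.
                new_positions[i] = [xi + step * (xj - xi) / d
                                    for xi, xj in zip(positions[i], positions[j])]
        # Step (4): adapt the decision range toward n_t neighbours, capped at r_s.
        new_radii[i] = min(r_s, max(0.0, radii[i] + beta * (n_t - len(neighbours))))
    return new_positions, luciferin, new_radii
```

Repeated calls to `gso_step` with the returned state give the full loop; the brightest glowworm has no brighter neighbour and therefore stays in place during an iteration.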

2.2. Position Updating Modification Based on Dynamic Inertia Weight

Dynamic inertia weight strategies are categorized into four classes: linear decreasing inertia weight, nonlinear decreasing inertia weight, adaptive inertia weight, and stochastic inertia weight [39–41]. The stochastic inertia weight (SIW) in the position updating equation can balance local and global search, obtain stable optimization results, and help the search jump out of local optima quickly. Therefore, we use SIW to remedy the slow convergence speed of basic GSO. The SIW is defined as follows:

$$\omega = \mu_{\min} + (\mu_{\max}-\mu_{\min})\cdot \mathrm{rand}() + \sigma\cdot \mathrm{randn}(), \tag{6}$$

where $\mu_{\min}$ denotes the lower limit value of SIW, $\mu_{\max}$ indicates the upper limit value of SIW, $\mathrm{randn}()$ is a random number that follows the normal distribution, $\mathrm{rand}()$ is a random number of uniform distribution, and $\sigma$ represents the deviation between inertia weights and their mean value.

The SIW is mainly used to update the positions of glowworms, and it is updated as follows:

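To solve a binary combinatorial optimization problem, the positions of glowworms are mapped into 0 or 1 using a sigmoid function. The mapping process is presented as equations (8) and (9):

$$S\bigl(x_{ik}(t+1)\bigr) = \frac{1}{1+e^{-x_{ik}(t+1)}}, \tag{8}$$

$$x_{ik}(t+1) = \begin{cases} 1, & \mathrm{rand}() < S\bigl(x_{ik}(t+1)\bigr), \\ 0, & \text{otherwise}, \end{cases} \tag{9}$$

where $k = 1, 2, \ldots, D$, $D$ is the dimension of the solution space of the problem, and $S(\cdot)$ is a sigmoid function.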
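
As a hedged illustration of the two ingredients of this subsection, the sketch below draws a stochastic inertia weight and maps a real-valued position to a binary vector through a sigmoid. It assumes the common form of SIW (a uniform draw between the two limits plus Gaussian noise scaled by the deviation) and the usual rule that a bit is set to 1 when a uniform random number falls below the sigmoid value. The function names and default limits are hypothetical.

```python
import math
import random


def stochastic_inertia_weight(mu_min=0.4, mu_max=0.9, sigma=0.2, rng=random):
    """Assumed SIW form: uniform draw in [mu_min, mu_max) plus Gaussian noise."""
    return mu_min + (mu_max - mu_min) * rng.random() + sigma * rng.gauss(0.0, 1.0)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarise(position, rng=random):
    """Map each real coordinate to 1 with probability sigmoid(coordinate)."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]
```

In a feature selection setting, each bit of the binarised position indicates whether the corresponding attribute is kept in the candidate subset.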

2.3. Coevolution Mechanism

To overcome the slow convergence speed of GSO, a coevolution mechanism is introduced, which promotes the process of evolution. To avoid invalid crossover caused by excessive similarity between glowworms, the initial population is divided into three equal subpopulations (proportion 1:1:1) according to fitness value: the elite subpopulation, the excellent subpopulation, and the common subpopulation. Each subpopulation evolves independently and synchronously and is updated dynamically during the search process. The best glowworm individual is selected from the elite subpopulation, and it performs a crossover with the optimal individual of the excellent subpopulation and with that of the common subpopulation. Four new offspring are then generated, which maintains the diversity of the population.

We introduce a competitive factor into this work. The coevolution mechanism can be denoted as follows:

If , thenwhere are randomly generated variables bounded between 0 and 1, are different glowworms in , , and , respectively. , , , and are the four new offspring. will be replaced by the best glowworm selected from , , , and if the best individual performs better than . The architecture of the coevolution mechanism is presented in Figure 1.
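
The 1:1:1 division used by the coevolution mechanism can be illustrated as follows. This sketch covers only the split into elite, excellent, and common subpopulations; the crossover itself is not reproduced. It assumes that a smaller fitness value is better, in line with the minimized objective function of Section 3.2, and the function name is hypothetical.

```python
def split_subpopulations(population, fitness):
    """Rank glowworms by fitness (smaller is better, an assumption) and split 1:1:1."""
    order = sorted(range(len(population)), key=lambda i: fitness[i])
    third = len(order) // 3
    elite = [population[i] for i in order[:third]]
    excellent = [population[i] for i in order[third:2 * third]]
    # Any remainder from an uneven population size falls into the common group.
    common = [population[i] for i in order[2 * third:]]
    return elite, excellent, common
```

The first element of each returned list is the best individual of that subpopulation, which is the one that takes part in the crossover described above.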

2.4. Fireworks Evolution Strategy

To effectively avoid the defects of the premature convergence and the insufficient diversity of population in GSO, a fireworks explosion operation [42] is introduced. The current glowworm produces multiple offspring by explosion with a certain probability. The best individual extracted from the multiple offspring can be retained to the next generation. We introduce a probability factor , and the scale of the individual glowworms produced around is formulized as follows:

If , thenwhere is the number of newly generated glowworms, shows the maximal fitness value of glowworms at the current iteration, denotes a constant to adjust the amount of glowworm offspring, and is a small constant which can avoid zero division error.

The rth dimension in is randomly selected to perform Gaussian mutation operation, namely, it is changed from 0 to 1 or 1 to 0:where and indicates the Gaussian distribution with a mean value of 1 and a variance value of 1.

The glowworm offspring are produced by the fireworks evolution strategy, and their fitness values can be achieved. If the optimal glowworm in the generated offspring performs better than , then is replaced by it.
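
A minimal sketch of the explosion-and-replace idea follows. It treats the mutation as flipping one randomly chosen bit per offspring and takes the number of offspring as a plain argument, because the formula for the offspring count is not reproduced here; both choices, and the function name, are assumptions for illustration. The objective is minimized.

```python
import random


def fireworks_step(parent, objective, n_offspring, rng=random):
    """Explode `parent` into offspring and return the best of parent and offspring."""
    best, best_val = list(parent), objective(parent)
    for _ in range(n_offspring):
        child = list(parent)
        r = rng.randrange(len(child))   # randomly selected dimension
        child[r] = 1 - child[r]         # flip 0 -> 1 or 1 -> 0
        val = objective(child)
        if val < best_val:              # keep the offspring only if it is better
            best, best_val = child, val
    return best, best_val
```

Because the parent is replaced only by a strictly better offspring, the step can never worsen the current solution.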

3. Key Influencing Factors Selection Method

3.1. Multifractal Dimension (MFD)

Mandelbrot first proposed the concept of fractal in 1983 [43], which is used to describe the irregular geometry of nature. A fractal object has two properties: self-similarity and scale invariance, namely, the object has a similar appearance when it is viewed at different scales. Fractal theory is used in a wide variety of fields.

Datasets often have two kinds of dimension, i.e., the embedding dimension and the intrinsic dimension. The embedding dimension is the number of features of the original dataset; the intrinsic dimension is the number of mutually independent features. Generally speaking, the intrinsic dimension is less than the embedding dimension. If all features are independent of each other, the intrinsic dimension is equal to the embedding dimension. The fractal dimension can represent the intrinsic dimension, and the upper bound of the fractal dimension is the number of key features required to characterize the original dataset. Traina et al. [44] showed that most datasets have fractal characteristics and that the fractal dimension can be regarded as an evaluation criterion for feature selection.

Fractal feature selection approaches were first proposed by Traina et al. [44]. The fractal dimension is taken as an evaluation criterion that measures the importance of features. The advantage of fractal feature selection algorithms is that the number of selected features can be determined, but the fractal dimension needs to be recalculated after some features are removed. To improve computational efficiency, GA [22, 23], ACO [24–26], PSO [27, 28], AFSA [29], and so on are employed as search strategies in fractal feature selection methods.

However, most existing fractal feature selection methods only take a single fractal dimension such as information dimension or correlation dimension. A single fractal dimension may not precisely describe a dataset [45]. In contrast, MFD can describe the dataset’s distribution in different aspects, which can be calculated as the following equation:

$$D_q = \begin{cases} \dfrac{1}{q-1}\,\dfrac{\partial \log \sum_i p_i^{\,q}}{\partial \log r}, & q \neq 1, \\[2ex] \dfrac{\partial \sum_i p_i \log p_i}{\partial \log r}, & q = 1, \end{cases} \qquad r \in [r_1, r_2],$$

where $p_i$ stands for the probability of a data point dropped into the $i$th grid, $r$ indicates the grid size, $[r_1, r_2]$ denotes the scale-free interval of a dataset, and $q$ is an integer.

When , shows the void distribution of a fractal dataset; when , indicates the aggregation degree of a fractal dataset. Fractal dimension (FD) can just describe the distribution of a dataset in a single aspect. In contrast, the MFD can describe the distribution in many aspects. Hence, MFD is regarded as an evaluation criterion of feature subsets in this work.
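
The qth-order fractal dimension can be estimated numerically by box counting. The sketch below is one possible estimator, assuming the standard generalized-dimension definition: the data space is covered with grids of several sizes inside the scale-free interval, the cell probabilities are accumulated, and the dimension is taken as the least-squares slope against the logarithm of the grid size. The function names and the choice of grid sizes are illustrative.

```python
import math
from collections import Counter


def partition_sum(points, r, q):
    """Left-hand quantity whose slope against log r gives the qth-order dimension."""
    counts = Counter(tuple(math.floor(v / r) for v in p) for p in points)
    n = len(points)
    probs = [c / n for c in counts.values()]   # occupancy probability per grid cell
    if q == 1:
        return sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** q for p in probs)) / (q - 1)


def generalized_dimension(points, q, grid_sizes):
    """Least-squares slope of partition_sum versus log(grid size)."""
    xs = [math.log(r) for r in grid_sizes]
    ys = [partition_sum(points, r, q) for r in grid_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

For points that fill a square uniformly, every order yields a dimension of 2, whereas datasets with uneven density give different values for different orders, which is the property that MFD exploits.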

3.2. Construct the Objective Function

By comparison with a single fractal dimension, MFD can accurately describe datasets. So, the objective function can be expressed as the following equation:where represents the qth-order fractal dimension of a feature subset, and illustrates the qth-order fractal dimension of the original dataset.

We regard the difference between the MFD of a feature subset and the original dataset as the objective function. According to the definition of the objective function, we can see that the smaller the value of the objective function is, the better the solution is. is specified with five fractal dimensions , respectively [19].

3.3. Extreme Learning Machine (ELM)

ELM was first proposed by Huang et al. [45] and was developed for single hidden layer feedforward networks (SLFNs). Compared with traditional neural networks, which require great effort in hyperparameter tuning [46], ELM provides good generalization ability and extremely fast learning speed. ELM contains input, hidden layer, and output nodes, and only the number of hidden layer nodes needs to be set. For $N$ given distinct samples $(x_j, t_j)$, the model of ELM can be expressed as follows:

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N, \tag{15}$$

where $x_j \in R^n$, $t_j \in R^m$, $L$ denotes the number of hidden nodes, $g(\cdot)$ indicates the hidden layer activation function, $w_i$ is the weight vector connecting the $i$th hidden node and the input nodes, $\beta_i$ is the output weight vector of the $i$th hidden node, and $b_i$ is the threshold of the $i$th hidden node. For all samples, equation (15) can be written as

$$H\beta = T,$$

where $H = [\,g(w_i \cdot x_j + b_i)\,]_{N \times L}$, $\beta = [\beta_1, \ldots, \beta_L]^{T}$, and $T = [t_1, \ldots, t_N]^{T}$; $H$ is the hidden layer output matrix. The ELM theory states that the hidden node learning parameters $w_i$ and $b_i$ can be randomly assigned regardless of the input data.

Therefore, the system equation (15) becomes a linear model. By finding the least squares solution of the linear system (15), the output weights can be analytically determined as follows:

$$\hat{\beta} = H^{\dagger} T,$$

where $H^{\dagger}$ indicates the Moore–Penrose generalized inverse of the hidden layer output matrix $H$ [47].
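
The training procedure described above amounts to one random projection followed by one pseudoinverse. The following NumPy sketch shows this under simple assumptions (sigmoid activation, normally distributed random hidden parameters); the function names are hypothetical, and it is not the MATLAB code used in the experiments.

```python
import numpy as np


def _hidden_output(X, W, b):
    """Hidden layer output matrix with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def elm_fit(X, T, n_hidden, rng):
    """Randomly assign hidden parameters, then solve the output weights analytically."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # input-to-hidden weights
    b = rng.normal(size=n_hidden)                 # hidden thresholds
    H = _hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T                  # Moore-Penrose least squares solution
    return W, b, beta


def elm_predict(X, W, b, beta):
    return _hidden_output(X, W, b) @ beta


# Usage on a tiny XOR-style problem with more hidden nodes than samples.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T_demo = np.array([[0.0], [1.0], [1.0], [0.0]])
W, b, beta = elm_fit(X_demo, T_demo, n_hidden=20, rng=np.random.default_rng(0))
pred_demo = elm_predict(X_demo, W, b, beta)
```

No iterative tuning of the hidden layer takes place, which is the source of the fast learning speed mentioned above; only the number of hidden nodes has to be chosen.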

3.4. Key Influencing Factors Selection Model Construction

FCBGSO, MFD, probit regression, and artificial prior knowledge are integrated for the key influencing factors selection of P2P lending investment risk. Firstly, MFD is treated as the evaluation criterion for a feature subset, and FCBGSO is used as the search strategy. The combination of FCBGSO and MFD (FCBGSO + MFD) is used to remove the redundant attributes in the original dataset, and the preliminary subset is attained. Secondly, we analyze the correlation between the selected attributes and the default risk of P2P lending investment using the probit regression, and the attributes that are not significantly correlated with the investment risk are removed. Finally, the attributes that have a significant impact on the investment risk are selected from the original dataset using the artificial prior knowledge and are added to the retained attributes one by one. A small and reasonable number of attribute subsets are thereby obtained, and we assess their classification accuracies using ELM. The attribute subset with the highest classification accuracy is taken as the key influencing factors of P2P lending investment risk.

Inputs: the initial parameters, the initial data of P2P lending, and MFD computing system.
Outputs: the key influencing factors of P2P lending.
(1)Initialize the parameters.
(2)Generate the glowworm population randomly, and compute the MFD-based objective value of each glowworm using equation (14).
(3), .
(4).
(5)while the maximum number of iterations has not been reached do
(6)for each glowworm in the population do
(7)  Select the objective glowworm in the radial range local-decision domain of the current glowworm.
(8)  Move a step toward the objective glowworm using equations (6)–(9).
(9)  Update the luciferin and the radial range of the local-decision domain.
(10)  if the competitive factor condition is satisfied then
(11)    Divide the glowworms into three subpopulations according to their MFD.
(12)   Perform the coevolution mechanism to create offspring glowworms and update their parent glowworms.
(13)  end if
(14)  if the probability factor condition is satisfied then
(15)   Perform the fireworks evolution strategy to create new glowworms and update the current glowworm.
(16)  end if
(17)end for
(18), .
(19)end while
(20)Obtain the preliminary attribute subset that corresponds to the best glowworm found.
(21)Get the retained attribute subset by eliminating from the preliminary subset those attributes that are not significantly related to the default risk, using the probit regression.
(22)Form a prior-knowledge attribute subset extracted from the original dataset of P2P lending using the artificial prior knowledge.
(23)Generate a small and reasonable number of attribute subsets by adding the attributes of the prior-knowledge subset into the retained subset.
(24)Get the classification accuracies by evaluating each generated subset using ELM.
(25)Take the subset with the highest classification accuracy as the key influencing factors of P2P lending.
(26)return

The pseudocode listed above is Algorithm 1, the proposed key influencing factors selection method.

The main steps of the model construction are as follows:
Step 1: calculate the MFD of the original dataset of P2P lending, obtain the number of attributes in the preliminary subset, and define the objective function.
Step 2: search for the preliminary attribute subset of P2P lending orders with the minimal objective function value using FCBGSO.
Step 3: eliminate the attributes in the preliminary subset that are not significantly related to default risk using the probit regression, and get the retained attribute subset.
Step 4: using the artificial prior knowledge, select the attributes from the original dataset that have a significant influence on the investment risk and do not belong to the retained subset, and form the prior-knowledge attribute subset.
Step 5: add the attributes of the prior-knowledge subset into the retained subset one by one, and get a small and reasonable number of attribute subsets.
Step 6: evaluate each of these attribute subsets using ELM, and obtain their classification accuracies.
Step 7: take the attribute subset with the highest classification accuracy as the key influencing factors of P2P lending investment risk.

4. Experimental Results

In this section, to assess the performance of the proposed approach, the experiments are implemented in MATLAB 2017a. The algorithm is tested on a computer running 64-bit Windows 10 with 2.81 GHz processor and 8 GB memory. Experimental parameters are set as follows: the population size , the maximum number of iterations , luciferin volatile factor , luciferin renewal rate , dynamic decision domain update rate , neighborhood threshold , and the remaining parameters are analyzed in Section 4.4.

4.1. Data Preprocessing and Indicator System Construction

The Renrendai platform is one of the earliest P2P lending information intermediary service platforms in China, and it has been operating steadily since its establishment. It has twice been ranked among the top 100 Internet companies in China. Hence, we used the P2P lending datasets of Renrendai as the empirical data in this work. We obtained more than 400,000 P2P lending transaction orders from the Renrendai platform, of which 396,993 are valid. Then, the outlier orders and the 295,589 orders with unsuccessful fundraising are removed. Finally, 99,469 orders are available for the key influencing factors selection of P2P lending investment risk. The dataset retained after this procedure is imbalanced, so a balanced dataset of P2P lending investment risk is obtained using undersampling and stratified sampling. On the basis of the relevant knowledge of Internet finance and the research results on the key influencing factors of P2P lending investment risk [5, 6], the index system is constructed as shown in Figure 2. We take the default risk of the borrowers as the decision attribute in this work.
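
The balancing step can be illustrated with a simple random undersampling routine. The sketch below is an assumption about how such balancing might be done, not the exact preprocessing used for the Renrendai data: it samples within each class, so that every class keeps as many orders as the rarest one. The function name is hypothetical.

```python
import random
from collections import defaultdict


def undersample(rows, labels, rng):
    """Keep an equal number of randomly chosen rows from every class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_keep = min(len(group) for group in by_class.values())  # size of rarest class
    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_keep):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return balanced
```

The routine returns (row, label) pairs in random order, which can then be split into training and test sets for the classifier.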

4.2. Experimental Results

The proposed key influencing factors selection method first uses the combination of FCBGSO and MFD (FCBGSO + MFD) to select the preliminary attribute subset from the original dataset of P2P lending orders. Four attributes are retained after selection: H1 (interest rate), H4 (number of investors), H7 (age), and H15 (occupation). FCBGSO + MFD thus greatly reduces the redundant attributes in the original dataset. However, it remains to be determined whether the four retained attributes are significantly related to the default risk. We use the probit regression model to assess the significance of the relationship between the four attributes and the default risk.

We take the default state as the explained variable and regard interest rate, number of investors, age, and occupation as the explanatory variables. The probit regression model is established as follows:where denotes default risk, indicates explained variable, and demonstrates control variable.

As reported in Table 1, the regression coefficient of interest rate is 0.0573 and its marginal effect is 0.0221, which reveals a significant positive relationship between the interest rate and the default risk at the 1% significance level. Age and occupation are also significantly positive at the 1% level. In contrast, the number of investors has no significant impact on the default risk. Therefore, when selecting the key influencing factors of P2P lending investment risk, H4 should be removed and H1, H7, and H15 are retained.
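
To make the link between a probit coefficient and its marginal effect (called marginal utility in some texts) concrete, the following sketch evaluates the standard probit formulas: the default probability is the standard normal distribution function of the linear index, and the marginal effect of a variable is its coefficient multiplied by the standard normal density at that index. The function names are hypothetical, and the sketch does not reproduce the estimation behind Table 1.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_index(x, beta, intercept=0.0):
    return intercept + sum(b * v for b, v in zip(beta, x))


def probit_probability(x, beta, intercept=0.0):
    """Predicted probability of default under a probit model."""
    return norm_cdf(linear_index(x, beta, intercept))


def marginal_effects(x, beta, intercept=0.0):
    """Change in default probability per unit change of each explanatory variable."""
    density = norm_pdf(linear_index(x, beta, intercept))
    return [density * b for b in beta]
```

Since the standard normal density never exceeds about 0.399, a marginal effect is always smaller in magnitude than the corresponding coefficient, and it shares the coefficient's sign.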

Because FCBGSO + MFD cannot recognize and learn the application background and lacks active thinking and personal perception, we extract the attributes with a significant impact on default risk using the artificial prior knowledge in this work. Credit rating plays an important role when investors make investment decisions, as illustrated in Table 2. In the P2P lending industry, investors need to decide which borrowers to fund and how much to allocate to each order, so as to maximize the expected investment income and reduce the return risk. Credit rating is an important input for solving such a combinatorial optimization problem, so it has important reference value for the key influencing factors selection of P2P lending investment risk [5, 51]. In addition, the borrower’s historical information is a useful complement to the credit rating. A higher on-time repayment rate for historical borrowings and a lower ratio of historical overdue times to historical borrowing times signal to investors that the borrower is trusted and welcomed by the market. The lower the default risk perceived by investors, the smaller the risk compensation. Therefore, H10 (historical borrowings) and H11 (historical overdue times) of borrowers are of great significance in the key influencing factors selection of P2P lending investment risk [48, 49].

In summary, the results achieved by the key influencing factors selection method of P2P lending investment risk are shown in Table 3. The attributes selected by the artificial prior knowledge are H6, H10, and H11, which are added into the attribute subset (H1, H7, and H15) one by one. A small and reasonable number of attribute subsets are thereby obtained, which are shown in Table 4. We use ELM to calculate the classification accuracy of each attribute subset, and the subset with the highest accuracy is taken as the key influencing factors of P2P lending investment risk, because the higher the classification accuracy of a subset is, the more relevant its attributes are to the default risk.

The maximal and average classification accuracies of combinations 1–10 are displayed in Table 4. In Table 4, combination 1 is the original dataset, combination 2 is the preliminary attribute subset attained by FCBGSO + MFD, combination 3 is the retaining attributes after removing the nonsignificant correlation variable in combination 2 using the probit regression method, and combinations 4–10 are the attribute subsets by adding H6, H10, and H11 into combination 3 one by one.
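
Combinations 4–10 comprise seven subsets, which equals the number of non-empty selections from the three prior-knowledge attributes. Under the assumption that this is how they are formed, the candidates can be enumerated as in the sketch below; the enumeration order and the function name are illustrative.

```python
from itertools import combinations


def candidate_subsets(base, prior):
    """Add every non-empty selection of prior-knowledge attributes to the base subset."""
    subsets = []
    for k in range(1, len(prior) + 1):
        for extra in combinations(prior, k):
            # Sort by attribute number so that, e.g., H7 precedes H10.
            subsets.append(sorted(base + list(extra), key=lambda h: int(h[1:])))
    return subsets


subsets = candidate_subsets(["H1", "H7", "H15"], ["H6", "H10", "H11"])
```

Each candidate would then be scored with the classifier, and the subset with the highest classification accuracy retained.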

The maximal and average classification accuracies of the attribute subsets in combinations 4–10 are markedly higher than those of combination 2. This indicates that the proposed approach achieves a better result than FCBGSO + MFD alone; that is, the combination of the artificial intelligence method, the traditional statistical method, and the artificial prior knowledge performs better than any one of them alone. After the probit regression removes H4 from combination 2, the accuracy of combination 3 is slightly lower than that of combination 2, but the decrease is within the acceptable range, which implies that H4 is not a key influencing factor of P2P lending investment risk. The maximal and average accuracies of combination 9 are higher than those of the other combinations. Therefore, H1, H7, H10, H11, and H15 in combination 9 are the key influencing factors of P2P lending investment risk. The proposed approach thus substantially reduces the redundant attributes and identifies the key influencing factors, which provides high-quality data for the prediction of P2P lending investment risk.

4.3. Comparison Analysis

To verify the effectiveness and credibility of the proposed approach, we compare it with the methods in [19, 29, 50, 52]. The methods in [19, 50] adopt swarm intelligence algorithms combined with MFD for the key influencing factors selection. The method in [29] uses rough set theory combined with the artificial fish swarm algorithm for attribute selection. The method in [52] employs a statistical method and artificial prior knowledge to extract the key influencing factors. As shown in Table 5, the maximal and average classification accuracies of the proposed approach are superior to those of the other algorithms, which demonstrates its validity and effectiveness. Compared with [19, 29, 50, 52], the maximal classification accuracy achieved by the proposed approach is higher by 19, 18, 23, and 4 percentage points, respectively, and the average accuracy is higher by 19, 18, 21, and 2 percentage points, respectively. In short, the key influencing factors selected by the proposed method perform best, followed by those of [52], [29], and [19], while those of [50] perform worst. This also illustrates that the proposed key influencing factors selection approach, which combines qualitative and quantitative analysis, is more reasonable and scientific.
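The ordering of the compared methods follows from the quoted gaps alone: the smaller the gap to the proposed approach, the better the method. The short sketch below makes that arithmetic explicit; the dictionary layout and the function name `rank_by_gap` are illustrative choices, while the numbers are the percentage-point gaps stated above.

```python
# Gaps (in percentage points) by which the proposed approach exceeds each
# compared method, keyed by citation number as quoted in the text.
max_gap = {19: 19, 29: 18, 50: 23, 52: 4}
avg_gap = {19: 19, 29: 18, 50: 21, 52: 2}


def rank_by_gap(gaps):
    """Order methods from best to worst: a smaller gap means a better method."""
    return sorted(gaps, key=gaps.get)


print(rank_by_gap(max_gap))  # [52, 29, 19, 50]
print(rank_by_gap(avg_gap))  # [52, 29, 19, 50]
```

Both the maximal and the average accuracies give the same ordering.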

4.4. Parameter Analysis

In the proposed selection method of key influencing factors of P2P lending investment risk, FCBGSO is employed as the search strategy. To improve the performance of FCBGSO, we analyze its main parameters: the number of iterations, the population size, the initial local-decision range, and the maximal local-decision range.

To verify the performance of FCBGSO, it is compared with GSO [53], IGSO [54], DGSO [55], and BGSO [56], as shown in Figure 3(a). As the number of iterations increases, the curves of the MFD difference between the attribute subset selected by each of the five algorithms and the original dataset of P2P lending first decrease and then level off (the smaller the MFD difference, the better the algorithm performs). Additionally, the convergence speed and precision of FCBGSO are significantly better than those of GSO, IGSO, DGSO, and BGSO. We recommend setting the maximum number of iterations to 20.
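One simple way to read an iteration budget off such a curve is to find the first iteration after which the MFD difference no longer improves by more than a tolerance. The helper below is a sketch of that idea only; the function name, the tolerance, and the curve values are hypothetical and are not data from Figure 3(a).

```python
def plateau_iteration(curve, tol=1e-3):
    """Return the 1-based iteration after which no step improves by more than tol.

    curve[i] is the MFD difference after iteration i + 1 (smaller is better).
    """
    for i in range(len(curve)):
        later_steps = range(i + 1, len(curve))
        if all(curve[j - 1] - curve[j] <= tol for j in later_steps):
            return i + 1
    return len(curve)


# Hypothetical MFD-difference curve that drops and then levels off.
toy_curve = [0.50, 0.30, 0.20, 0.15, 0.15, 0.15]
print(plateau_iteration(toy_curve))  # 4
```

In practice the maximum number of iterations would be set at or somewhat beyond the plateau point so that slower runs still converge.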

In Figure 3(b), the MFD difference decreases as the population size increases. Once the population size reaches 30, the performance of FCBGSO stabilizes, so we recommend setting the population size to 30.

Figure 3(c) analyzes the relationship between the initial local-decision range and the performance of FCBGSO. If the initial local-decision range is too small, the convergence speed may be affected. If it is too large, the algorithm is easily trapped in local optima. As there are 17 attributes in the P2P lending dataset, the radius of the initial local-decision range is varied from 1 to 17. The algorithm performs best when the initial local-decision range is 8, so we recommend setting it to 8.

Figure 3(d) investigates the relationship between the maximal local-decision range and the performance of FCBGSO. The maximal local-decision range should be greater than or equal to the initial local-decision range, so it is varied from 8 to 17. The algorithm achieves the best result when the maximal local-decision range is 12 or 13. Therefore, the maximal local-decision range should be set to 12 or 13.
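The constraint between the two ranges and the recommended settings can be collected in a small sketch. The bounds (1 to 17 attributes, maximal range not below the initial range) and the recommended values come from the text; the function name and the dictionary keys are hypothetical and chosen only for illustration.

```python
N_ATTRIBUTES = 17  # number of attributes in the P2P lending dataset


def valid_range_pairs(n_attributes=N_ATTRIBUTES):
    """All (initial, maximal) local-decision ranges with initial <= maximal."""
    return [(initial, maximal)
            for initial in range(1, n_attributes + 1)
            for maximal in range(initial, n_attributes + 1)]


pairs = valid_range_pairs()

# With the initial range fixed at 8, the maximal range can take 8..17.
maximal_candidates = [maximal for initial, maximal in pairs if initial == 8]

# Settings recommended in the parameter analysis (12 and 13 perform equally
# well for the maximal range; 12 is used here).
RECOMMENDED = {
    "max_iterations": 20,
    "population_size": 30,
    "initial_range": 8,
    "maximal_range": 12,
}

is_valid = (RECOMMENDED["initial_range"], RECOMMENDED["maximal_range"]) in pairs
print(len(pairs), maximal_candidates, is_valid)
```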

5. Conclusion

To predict the investment risk of P2P lending accurately, its key influencing factors must be analyzed scientifically and rationally. However, existing traditional statistical approaches cannot find the exact key influencing factors of the P2P lending investment risk, and the attributes selected by artificial intelligence methods may not be the key influencing factors of P2P lending investment risk. To tackle these issues, we propose a key influencing factors selection approach for P2P lending investment risk that combines FCBGSO, MFD, probit regression, and artificial prior knowledge. First, the proposed FCBGSO, which has a high search efficiency, is combined with MFD to handle the high-dimensional original dataset of P2P lending and obtain the preliminary attribute subset. Second, the attributes in the preliminary subset that are not significantly related to the default risk are removed using probit regression. A small and reasonable number of attribute subsets are then formed by combining the retained attributes with the attributes selected by the artificial prior knowledge. Each subset is evaluated using ELM, and the subset with the best accuracy is taken as the key influencing factors of P2P lending investment risk. Finally, the experimental results on the real P2P lending dataset of Renrendai demonstrate the validity and effectiveness of the proposed approach. In addition, the proposed FCBGSO outperforms GSO, IGSO, DGSO, and BGSO in convergence speed and precision.

In future work, we will attempt to use an ensemble of ELM classifiers with high classification ability to predict the investment risk of P2P lending. We believe that this can achieve promising results and provide new research ideas for the investment risk prediction of P2P lending.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Anhui Provincial Natural Science Foundation under grant nos. 1908085QG298 and 1908085MG232, the National Natural Science Foundation of China under grant nos. 91546108 and 71490725, the National Key Research and Development Plan under grant no. 2016YFF0202604, the Fundamental Research Funds for the Central Universities under grant nos. JZ2019HGTA0053 and JZ2019HGBZ0128, and the Open Research Fund Program of Key Laboratory of Process Optimization and Intelligent Decision-making (Hefei University of Technology), Ministry of Education.