Mathematical Problems in Engineering


Research Article | Open Access

Volume 2019 |Article ID 1902970 | https://doi.org/10.1155/2019/1902970

Guotai Chi, Shijie Ding, Xiankun Peng, "Data-Driven Robust Credit Portfolio Optimization for Investment Decisions in P2P Lending", Mathematical Problems in Engineering, vol. 2019, Article ID 1902970, 10 pages, 2019. https://doi.org/10.1155/2019/1902970

Data-Driven Robust Credit Portfolio Optimization for Investment Decisions in P2P Lending

Academic Editor: Emilio Gómez-Déniz
Received 24 Oct 2018
Accepted 24 Dec 2018
Published 02 Jan 2019

Abstract

Peer-to-peer (P2P) lending has attracted increasing attention recently. As an emerging micro-finance platform, P2P lending removes intermediaries, reduces transaction costs, and increases the benefits of both borrowers and lenders. However, P2P lending investment faces two major challenges: the lack of historical observations for any given borrower and the ambiguity of the estimated loan return distribution. To address these difficulties, this paper proposes a data-driven robust portfolio optimization model with relative entropy constraints, built on an “instance-based” credit risk assessment framework. The model exploits a nonparametric kernel approach to estimate the expected return and risk of P2P loans when historical data on the same borrower are unavailable. Furthermore, we construct a robust mean-variance optimization problem based on the relative entropy method for P2P loan investment decisions. We validate the proposed model on a real-world dataset from a notable P2P lending platform, Prosper. Empirical results show that our model delivers better investment performance than the existing model.

1. Introduction

Peer-to-peer lending, an emerging form of online micro-finance, brings borrowers and lenders together virtually and helps them lend to and borrow from each other directly. P2P lending platforms remove traditional financial intermediaries, reduce transaction costs, and increase the benefits of both borrowers and lenders; they therefore improve the efficiency of the financial market. However, traditional financial intermediaries can use collateral, certified accounts, and other means to enhance the creditworthiness of borrowers. In their absence, information asymmetry between borrowers and lenders is severe and the credit risk of P2P loan investment is very high.

Credit risk in P2P lending refers to the potential monetary loss arising from a borrower’s default on a loan. Efficient and reasonable investment in P2P loans must be based on a reliable assessment of the credit risk distribution. Estimating this distribution is very challenging because historical return (or loss) data for a loan awaiting investment are difficult to obtain. In other words, historical yield data for the same borrower are usually unavailable. Moreover, even if the distribution of loan returns (or losses) is approximated from the limited available data or from expert knowledge, the approximation is usually inaccurate; this is known as the distribution ambiguity (probability measure uncertainty) problem. In this paper, we formulate a data-driven robust portfolio optimization model based on an “instance-based” credit risk assessment method for investment decisions in P2P lending.

To help personal lenders mitigate risk, current online P2P lending platforms have taken some risk-reducing measures, such as filtering out high-risk borrowers whose FICO score is below a threshold, assigning a preliminary rating to each loan, and providing investors with the risk level of each loan. Thus, each loan is assigned a grade, such as AA, A, B, C, D, E, or NR, and loans with the same grade are considered to have the same risk level. These rating-based models are more suitable for traditional banks and lending institutions, since they can grant large numbers of loans to diversify their investments. However, individual investors possess only small amounts of funds; they need more refined risk assessment methods and investment strategies.

Similar to bond investment, P2P investors can fund a portion, rather than the whole, of each loan. Therefore, investors can decide which loans to invest in and, at the same time, determine the amount to invest in each loan. This mechanism allows investors to construct a credit portfolio to mitigate risk.

Markowitz [1] proposes the famous mean-variance model, which is still widely used in portfolio selection and risk management. From then on, researchers propose a variety of mean-risk models, such as mean-downside risk model [2], mean-VaR model [3], mean-CVaR model [4], and so on. In practice, the distribution of the assets needs to be estimated firstly, and then the optimal portfolio can be identified by the optimization model.

For P2P lending investment, as mentioned above, such procedures face at least two major challenges, i.e., the deficiency of loans’ historical observations and the ambiguity problem of estimated loans’ distribution (probability measure uncertainty problem). Thus, this paper proposes a data-driven robust model of portfolio optimization based on relative entropy constraints combined with an instance-based credit risk assessment method.

Specifically, we use the “instance-based” credit risk assessment method proposed by Guo et al. [5] to evaluate the return and risk of each loan without sufficient historical data of loans for each individual borrower. In this instance-based framework, the expected return of each loan is predicted as a weighted average of historical loans of other similar borrowers, where the optimal weights are learnt based on kernel regression. Furthermore, using the moment information (mean and variance) of the new loans, we formulate the robust portfolio optimization model with relative entropy constraints, which could obtain an optimal portfolio under the worst scenario and has the ability of reducing the potential loss caused by the uncertainty of loans distribution.

Our work is related to the papers by Guo et al. [5] and Yam et al. [6]. Guo et al. [5] introduce the instance-based framework into the credit risk assessment of P2P loans and use the classical mean-variance model to obtain the optimal allocation. Yam et al. [6] derive a robust mean-variance optimization model with relative entropy constraints on the uncertainty of the interaction between the returns of different assets and discuss its mathematical and financial properties in portfolio selection. Although other scholars have contributed novel insights into credit risk assessment of P2P lending and into robust optimization, to the best of our knowledge, few have considered both together. The main contribution of this paper is that we propose a data-driven robust portfolio optimization model based on relative entropy constraints, combined with an instance-based risk assessment framework, for P2P loan investment and obtain superior performance in numerical experiments.

The rest of this paper is organized as follows. Section 2 provides the literature review. Section 3 introduces the instance-based model for credit risk assessment, as well as the mathematical framework of the kernel regression approach. In Section 4, we elaborate on the robust optimization model based on the relative entropy method and formulate a robust mean-variance optimization model for P2P lending investment. The empirical results on the effectiveness of our model are reported in Section 5. Finally, Section 6 concludes this work.

2. Literature Review

To assess risk and assist investment decision making in P2P lending, researchers have conducted many studies. Emekter et al. [7] explore the dominant factors that explain funding success and credit risk and, at the same time, measure the performance of P2P loans. They find that credit grade, debt-to-income ratio, FICO score, and revolving line utilization play an important role in loan defaults. Furthermore, loans with a lower credit grade and longer duration are associated with a higher mortality rate, and the higher interest rates charged to low-credit-grade borrowers are not sufficient to compensate for the higher likelihood of default. The authors therefore suggest that investors should invest more in high-grade loans. Similarly, Berkovich [8] finds that high-quality loans offer excess return.

The above studies investigate the factors that determine credit risk and analyze the performance of P2P loans; however, they do not propose a mechanism that assists individual investors in allocating their funds across loans effectively and making optimal investment decisions.

To help personal lenders mitigate risk, popular online P2P platforms, such as Lending Club and Prosper, have developed credit scoring systems that assess the creditworthiness of each borrower using data mining or machine learning techniques. There is a large body of literature on credit rating using data mining techniques, for example, linear discriminant analysis (LDA) [9], k-nearest neighbors [10], logistic regression [11], classification and regression trees (CART) [12], Markov chains [13], survival analysis [14], artificial neural networks (ANN) [15], genetic methods [16], support vector machines (SVM) [17, 18], lasso-probit [19], and so on.

In the portfolio selection problem, full knowledge of the assets’ distribution is usually assumed to determine the optimal portfolio. In most real-life applications, we need to approximate the assets’ distribution. However, the approximations are not necessarily accurate, and it is known as the distribution ambiguity (probability measure uncertainty) problem.

Robust optimization is an attractive way to solve the portfolio selection problem under distribution ambiguity. As the exact parameters are unavailable, Natarajan et al. [20] use a set of parameters (representing different distributions or scenarios) rather than a point estimate of the parameters to formulate the asset allocation problem. Following this idea, there are different ways to model ambiguity by using a set of parameters. Chen et al. [21] take the lower partial moments and CVaR as two risk measures and consider tight bounds that are likely to cover the possible parameters. Epstein [22] considers intervals that may include the actual parameters. Natarajan et al. [23] use a piecewise-linear concave utility function to derive accurate and estimated optimal strategies for the expected utility model in portfolio optimization under worst-case scenarios. Paç and Pinar [24] use an ellipsoidal uncertainty set to represent the distribution ambiguity and identify the optimal portfolio.

Since relative entropy can measure the difference between two probability distributions (probability measures), it can be used to construct the uncertainty set for robust optimization. In the studies of Hansen and Sargent [25] and Calafiore [26], relative entropy is used to model uncertainty and obtain the optimal investment decision. Yam et al. [6] derive a robust mean-variance optimization model with relative entropy constraints on the uncertainty of the interaction between the returns of different assets and discuss its mathematical and financial properties in portfolio selection.

In recent years, data-driven methods have been widely studied. In this framework, it is assumed that investors possess only historical data on asset returns. Bertsimas et al. [27] use the KS test, the χ2 test, the Anderson-Darling test, and some other testing tools to construct uncertainty sets and take the worst case of each set to formulate the robust optimization. They assume that the uncertainty sets are defined by certain structures and sizes based on the available data points. In contrast, the structure of the uncertainty set in our study is not predefined, and we consider the uncertainty of the mean, the covariance, and the distribution jointly. Kang et al. [28] propose a data-driven robust mean-CVaR portfolio selection model under distribution ambiguity and adopt a nonparametric bootstrap approach to calibrate the levels of ambiguity. Their work is based on the mean-CVaR framework with data on stock indices, while our work is based on the mean-variance framework with data on P2P loans.

3. Instance-Based Model for Credit Risk Assessment

Using historical data to evaluate future performance and potential loss is conventional. However, unlike bond or stock investment, historical yield data for the same P2P borrower are usually unavailable. Thus, the risk assessment of a new loan is very challenging. In this section, we briefly introduce the instance-based credit risk assessment model proposed by Guo et al. [5].

3.1. Instance-Based Assessment Framework

In this instance-based assessment framework, the expected return of each loan is estimated as a weighted average of historical observations of other borrowers’ closed loans. Specifically, for a new loan i, using n past loans, each with a historical return $R_j$ (j = 1, 2, ..., n), we can calculate the expected return of loan i, $\mu_i$, as a weighted average of the past loans’ actual returns:

$$\mu_i = \sum_{j=1}^{n} w_{ij} R_j, \tag{1}$$

where $w_{ij}$ denotes the weight of loan j for predicting the expected return of loan i. The weight depends on the similarity between loan i and loan j. Intuitively, the greater the similarity, the greater the weight. The calculation of the weight is introduced in Section 3.2.

The weighted returns of the past loans are treated as historical observations of the new loan. Following this line of thought and taking variance as the risk measure, the weighted variance of the past loans’ returns is used to assess the new loan’s risk, that is,

$$\sigma_i^2 = \sum_{j=1}^{n} w_{ij}\,(R_j - \mu_i)^2, \tag{2}$$

where $w_{ij}$, $R_j$, and $\mu_i$ have the same meanings as in (1).

The absolute deviation between two loans’ default probabilities is used to measure similarity: the smaller the absolute deviation, the greater the similarity and, therefore, the larger the weight. In particular, the absolute deviation of default probabilities between loans i and j is defined as $d_{ij} = |p_i - p_j|$, where $p_i$ and $p_j$ are the default probabilities of loans i and j, respectively. Kernel regression is used to capture the nonlinear relationship between the absolute deviation and the weight. This process is introduced in the next subsection.

3.2. Kernel Regression of Return and Risk

Kernel regression is a nonparametric statistical method, based on kernel density estimation, for investigating the nonlinear relation between random variables. We first introduce the preliminaries of kernel estimation.

Given n realizations $z_j$, j = 1, ..., n, of a random variable z, the kernel estimate of the probability density function p(z) is defined by

$$\hat{p}(z) = \frac{1}{nh}\sum_{j=1}^{n} K\!\left(\frac{z - z_j}{h}\right), \tag{3}$$

where K(·) is a kernel function and h is a smoothing parameter.

The kernel function K(·) is nonnegative and bounded and, in addition, satisfies the following properties:

(a) $\int K(u)\,du = 1$; (b) $\int u\,K(u)\,du = 0$; (c) $\int u^2 K(u)\,du < \infty$.

There is a range of commonly used kernel functions, such as uniform, triangular, biweight, triweight, and Gaussian [29]. Because the kernel estimate is insensitive to the choice of kernel function, we use the Gaussian kernel for its convenient mathematical properties, which is written as $K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$.

The smoothing parameter h = h(n), also called the bandwidth, depends on the sample size n. Specifically, h(n) decreases to 0 and n·h(n) tends to ∞ as n tends to ∞.

Many studies show that the choice of kernel function does not affect the estimation significantly; however, the choice of the bandwidth is a vital issue [30, 31]. The determination of the bandwidth is shown in detail in Section 5.3.
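
As an illustration of the kernel estimate described above, the following minimal Python sketch evaluates a Gaussian kernel density estimate, i.e., the average of K((z - z_j)/h) over the realizations divided by h. The function names and the sample values are hypothetical and chosen only for demonstration; they are not taken from the Prosper dataset.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(z, samples, h):
    """Kernel estimate of the density at z from the realizations in `samples`."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)
    return sum(gaussian_kernel((z - zj) / h) for zj in samples) / (n * h)


# Illustrative realizations only (hypothetical values).
samples = [0.02, 0.05, 0.07, 0.11, 0.15]
density_at_008 = kernel_density(0.08, samples, h=0.03)
```

A smaller h makes the estimate follow individual realizations more closely, while a larger h smooths them out, which is why the bandwidth choice matters more than the kernel choice.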

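In the following, we introduce the kernel regression model proposed by Nadaraya [32]. Theoretically, we assume that each observation is denoted as (X, Y), an $\mathbb{R}^2$-valued random vector. With the sample set $\{(x_j, y_j),\ j = 1, 2, \ldots, n\}$, the kernel estimator of the target y given its predictive observation x is defined as

$$\hat{y}(x) = \frac{\sum_{j=1}^{n} K\!\left(\frac{x - x_j}{h}\right) y_j}{\sum_{j=1}^{n} K\!\left(\frac{x - x_j}{h}\right)}, \tag{4}$$

where K(·) is a kernel function and h is the bandwidth.

For instance-based credit risk modeling, the set of historical observations is represented as $\{(p_j, R_j),\ j = 1, 2, \ldots, n\}$, where $p_j$ and $R_j$ are the default probability and return rate of the jth loan, respectively. The estimate of the ith loan’s return can thereby be written as

$$\mu_i = \frac{\sum_{j=1}^{n} K\!\left(\frac{p_i - p_j}{h}\right) R_j}{\sum_{j=1}^{n} K\!\left(\frac{p_i - p_j}{h}\right)}. \tag{5}$$

Note that the determination of the loans’ default probabilities is introduced in Section 5.1.

Comparing (1) with (5), we can represent the optimal weight as

$$w_{ij} = \frac{K\!\left(\frac{p_i - p_j}{h}\right)}{\sum_{k=1}^{n} K\!\left(\frac{p_i - p_k}{h}\right)}. \tag{6}$$

Using the optimal weight and the expected return derived from (5), (2) can be rewritten as

$$\sigma_i^2 = \frac{\sum_{j=1}^{n} K\!\left(\frac{p_i - p_j}{h}\right)(R_j - \mu_i)^2}{\sum_{j=1}^{n} K\!\left(\frac{p_i - p_j}{h}\right)}. \tag{7}$$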
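
The instance-based estimates of this section can be sketched in a few lines of Python. The sketch assumes a Gaussian kernel and normalized kernel weights, so that a new loan's expected return is the weighted average of past returns and its risk is the weighted variance around that average. The function names and the numbers in the usage lines are hypothetical; the weights follow the kernel-regression form described in the text and are not a reproduction of the authors' implementation.

```python
import math


def kernel_weights(p_new, p_hist, h):
    """Normalized Gaussian kernel weights of past loans for a new loan.

    The 1/sqrt(2*pi) constant of the Gaussian kernel cancels in the
    normalization, so it is omitted here.
    """
    k = [math.exp(-0.5 * ((p_new - pj) / h) ** 2) for pj in p_hist]
    total = sum(k)
    if total == 0.0:
        raise ValueError("all kernel values underflowed; use a larger bandwidth")
    return [kj / total for kj in k]


def instance_based_estimate(p_new, p_hist, r_hist, h):
    """Return (expected return, variance) of a new loan from past loans."""
    w = kernel_weights(p_new, p_hist, h)
    mu = sum(wj * rj for wj, rj in zip(w, r_hist))
    var = sum(wj * (rj - mu) ** 2 for wj, rj in zip(w, r_hist))
    return mu, var


# Hypothetical example: two past loans equally distant from the new loan.
mu_new, var_new = instance_based_estimate(0.2, [0.1, 0.3], [0.0, 0.1], h=0.1)
```

In this symmetric example both past loans receive weight 0.5, so the estimate is simply the plain mean and variance of the two past returns.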

4. Robust Investment Decision Model

Similar to bond investment, P2P lenders can invest a portion of each loan. Thus, P2P loan investment decisions can be transformed into a credit portfolio optimization problem. This section introduces the portfolio optimization model for investment decisions in P2P lending, which accounts for the uncertainty of the distribution of the loans. We start from the classical mean-variance optimization model proposed by Markowitz [1] to its tractable robust counterpart.

4.1. Robust Optimization Model Based on Relative Entropy Constraints

In the classical mean-variance optimization model, the optimal asset allocation strategy is identified by solving the tradeoff between risk and return according to the investor’s risk preference. A portfolio that invests in n assets is represented as a vector of weights, $\lambda \in \mathbb{R}^n$, where each weight denotes the proportion of wealth allocated to an asset. The return and risk of the portfolio are then $\lambda^\top \mu$ and $\lambda^\top V \lambda$, respectively, where $\mu \in \mathbb{R}^n$ and $V \in \mathbb{R}^{n \times n}$ are the expected return vector and the covariance matrix of the assets’ returns under the probability measure (or probability distribution) P. Here, P represents the ideal estimated market condition, in which μ and V, estimated using all available information (historical observations, news, expert knowledge, and so on), are assumed to be the actual expected return and covariance matrix. Thus, the classical mean-variance portfolio selection problem (MV) can be formulated as

$$\min_{\lambda \in \Omega}\ \lambda^\top V \lambda \quad \text{s.t.} \quad \lambda^\top \mu \ge r, \tag{8}$$

where $\Omega \subseteq \mathbb{R}^n$ denotes the set of feasible portfolios and $r$ is the required return rate specified by the investor.

In reality, the assumption that the expected return μ and the covariance matrix V are known with certainty is hardly reasonable. It is quite possible that the estimated parameters differ from the actual ones. Thus, the optimal portfolio identified by using the estimated input parameters μ and V directly may be inappropriate. Robust optimization seeks portfolios that are insensitive to the uncertainty in the parameters, that is, solutions that remain feasible whatever the actual values of the parameters are.

Investors might consider a set of probability measures, i.e., an uncertainty set, to cover a range of scenarios based on their assessments, and then use robust optimization to obtain approximately optimal strategies for the worst scenarios within the uncertainty set. In this paper, we define $\mathcal{Q}$ as the set of probability measures representing the possible scenarios, and $\mu^Q$ and $V^Q$ as the expected return and covariance matrix estimated under a probability measure $Q \in \mathcal{Q}$. Mathematically, the robust counterpart of the classical mean-variance optimization problem (RMV) can be written as

$$\min_{\lambda\in\Omega}\ \max_{Q\in\mathcal{Q}}\ \lambda^\top V^Q\lambda \quad\text{s.t.}\quad \min_{Q\in\mathcal{Q}}\ \lambda^\top\mu^Q \ge r. \tag{9}$$

It is rational to assume that the actual value of the parameters is in the neighborhood of the estimator. Thus, we can generate the uncertainty set under the assumption that the measures in the set are not far from the ideal measure P. Relative entropy, also known as the Kullback–Leibler divergence, can be used to measure the difference between probability measures. The relative entropy of a measure Q in $\mathcal{Q}$ with respect to the measure P is

$$R(Q\,\|\,P) = \int f_Q(x)\,\ln\frac{f_Q(x)}{f_P(x)}\,dx, \tag{10}$$

where $f_P$ and $f_Q$ are the probability density functions (pdf) of the loans’ returns under probability measures P and Q, respectively. In the context of mean-variance analysis, relative entropy can be rewritten as

$$R(Q\,\|\,P) = \frac{1}{2}\left[\operatorname{tr}\!\left(V^{-1}V^Q\right) + (\mu^Q-\mu)^\top V^{-1}(\mu^Q-\mu) - n - \ln\frac{|V^Q|}{|V|}\right], \tag{11}$$

where $\mu$, $V$, $\mu^Q$, and $V^Q$ carry the same meaning as in (8) and (9); $\operatorname{tr}(V)$, $|V|$, and $V^\top$ denote the trace, the determinant, and the transpose of V, respectively; and n is the number of assets in the portfolio.
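
To make the mean-variance form of relative entropy concrete, the following Python sketch evaluates the Kullback–Leibler divergence between two Gaussian return distributions. It is a simplified illustration under two stated assumptions: returns are Gaussian under both measures, and both covariance matrices are diagonal (uncorrelated assets), so no matrix inversion is needed. The function name and arguments are hypothetical.

```python
import math


def relative_entropy_diag(mu_q, var_q, mu_p, var_p):
    """KL divergence of N(mu_q, diag(var_q)) with respect to N(mu_p, diag(var_p)).

    Assumes Gaussian returns and diagonal covariance matrices; each asset
    contributes 0.5 * [vq/vp + (mq - mp)^2 / vp - 1 - ln(vq/vp)].
    """
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += vq / vp + (mq - mp) ** 2 / vp - 1.0 - math.log(vq / vp)
    return 0.5 * total


# Hypothetical two-asset example: Q shifts the first mean and inflates the second variance.
d = relative_entropy_diag([0.05, 0.08], [0.01, 0.06], [0.06, 0.08], [0.01, 0.04])
```

The divergence is zero when the two measures coincide and grows as the mean or variance under Q moves away from the estimate under P, which is what allows a bound on it to define the size of an uncertainty set.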

Let $\mathcal{U} = \{(\mu^Q, V^Q) : Q \in \mathcal{Q},\ R(Q\,\|\,P) \le K\}$ denote the set of parameters ($\mu^Q$, $V^Q$) under the measures Q in $\mathcal{Q}$ that satisfy the relative entropy constraint. Using this constraint, we can rewrite the robust optimization model (9) as

$$\min_{\lambda\in\Omega}\ \max_{(\mu^Q, V^Q)\in\mathcal{U}}\ \lambda^\top V^Q\lambda \quad\text{s.t.}\quad \min_{(\mu^Q, V^Q)\in\mathcal{U}}\ \lambda^\top\mu^Q \ge r, \tag{12}$$

where K is a positive constant that determines the size of the uncertainty set. The parameter K measures the level of uncertainty and reflects the investors’ confidence in μ and V estimated under probability measure P: the greater the value of K, the lower the confidence.

Yam et al. [6] prove that the robust mean-variance portfolio selection model based on the relative entropy method (RMV-RE) can be formulated as a quadratic optimization problem, which is tractable and can be solved efficiently. That is,

$$\min_{\lambda\in\Omega}\ \lambda^\top\tilde{V}\lambda \quad\text{s.t.}\quad \lambda^\top\tilde{\mu} \ge r. \tag{13}$$

Herein, $\tilde{\mu} = \zeta\mu$ and $\tilde{V} = V + \zeta(1-\zeta)\mu\mu^\top$, and $\zeta$ is closely related to K in (12) and reflects the level of confidence in μ and V estimated under measure P. For example, ζ = 1 means that investors believe the estimated μ and V are the true parameters, and the investor’s confidence weakens as ζ decreases. For the details of the proof, see Yam et al. [6].
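
The effect of the confidence parameter ζ can be illustrated with a short Python sketch. It assumes that the adjusted parameters are ζμ for the mean and V + ζ(1-ζ)μμᵀ for the covariance, where the rank-one term μμᵀ is the dimensionally consistent reading of the adjustment described above. The function names and the numerical values are hypothetical.

```python
def robust_parameters(mu, V, zeta):
    """Return the adjusted mean zeta*mu and covariance V + zeta*(1-zeta)*mu*mu^T."""
    n = len(mu)
    mu_tilde = [zeta * m for m in mu]
    V_tilde = [
        [V[i][j] + zeta * (1.0 - zeta) * mu[i] * mu[j] for j in range(n)]
        for i in range(n)
    ]
    return mu_tilde, V_tilde


def portfolio_mean_variance(lam, mu, V):
    """Portfolio return lam^T mu and risk lam^T V lam for weights lam."""
    n = len(lam)
    mean = sum(lam[i] * mu[i] for i in range(n))
    var = sum(lam[i] * V[i][j] * lam[j] for i in range(n) for j in range(n))
    return mean, var


# Hypothetical two-loan example with uncorrelated returns.
mu = [0.1, 0.2]
V = [[0.01, 0.0], [0.0, 0.04]]
mu_t, V_t = robust_parameters(mu, V, zeta=0.5)
mean_t, var_t = portfolio_mean_variance([0.5, 0.5], mu_t, V_t)
```

With ζ = 1 the function returns μ and V unchanged, matching the case of full confidence in the estimates; for ζ < 1 the expected return shrinks and the risk is inflated, so the same required return is harder to meet.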

4.2. Robust Mean-Variance Portfolio Optimization Model in P2P Lending

In Section 3.2, we estimated each loan’s expected return and variance of return, i.e., $\mu_i$ and $\sigma_i^2$, using the instance-based credit risk assessment model. Let $\mu = (\mu_1, \ldots, \mu_n)^\top$ and $V = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$ denote the expected return vector and the covariance matrix of the loans’ returns under the probability measure P. Here, we assume that the correlation between P2P loans is negligible. Now we can rewrite (13) as

$$\min_{\lambda\in\Omega}\ \lambda^\top\left[V + \zeta(1-\zeta)\mu\mu^\top\right]\lambda \quad\text{s.t.}\quad \zeta\,\lambda^\top\mu \ge r.$$

The feasible region Ω of our problem is defined by the following constraints:
(1) The value of the portfolio remains at its initial value, i.e., $\sum_{i=1}^{n}\lambda_i = 1$.
(2) Short-selling is forbidden; thus $\lambda_i \ge 0$.
(3) For each loan, the amount that the lender can invest is no more than the amount requested by the borrower, $m_i$; thereby, $\lambda_i M \le m_i$, where M is the total investment amount the investor has available.
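
The structure of this problem can be illustrated with a deliberately small Python sketch for two uncorrelated loans. It assumes the robust risk is the sum of λ_i²σ_i² plus ζ(1-ζ)(λᵀμ)², that the robust return ζλᵀμ must be at least the required rate, that weights are nonnegative and sum to one, and that the amount placed in each loan cannot exceed the borrower's request. It uses a brute-force grid search instead of a quadratic programming solver, so it is only a way to inspect the constraints, not a practical method for large portfolios. All names and numbers are hypothetical.

```python
def solve_two_loan_robust(mu, var, zeta, r, M, m, steps=1000):
    """Grid search over lambda = (a, 1 - a) for two uncorrelated loans.

    Returns (risk, (lambda_1, lambda_2)) for the feasible grid point with the
    smallest robust variance, or None if no grid point is feasible.
    """
    best = None
    for k in range(steps + 1):
        # Budget and no-short-selling constraints hold by construction.
        lam = (k / steps, 1.0 - k / steps)
        # Per-loan cap: amount invested must not exceed the borrower's request.
        if any(l * M > cap + 1e-9 for l, cap in zip(lam, m)):
            continue
        lin = lam[0] * mu[0] + lam[1] * mu[1]
        # Required return must hold for the zeta-adjusted expected return.
        if zeta * lin < r - 1e-12:
            continue
        risk = (lam[0] ** 2 * var[0] + lam[1] ** 2 * var[1]
                + zeta * (1.0 - zeta) * lin ** 2)
        if best is None or risk < best[0]:
            best = (risk, lam)
    return best


# Hypothetical inputs: expected returns, variances, requests of 100 each, budget 100.
result = solve_two_loan_robust((0.08, 0.12), (0.01, 0.04), zeta=1.0, r=0.0,
                               M=100.0, m=(100.0, 100.0))
```

When neither the return requirement nor the caps bind, the search recovers the familiar minimum-variance split, which places more weight on the lower-variance loan; tightening a cap or raising the required return moves the solution to the boundary of the feasible region.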

5. Empirical Analysis

In this section, we investigate the validity of the robust mean-variance portfolio optimization model in P2P lending using the real-world dataset from a notable P2P lending platform, Prosper. All numerical experiments are performed by using MATLAB on PC.

5.1. Data Description and Preprocess

The dataset for the empirical study is from Prosper, a notable P2P lending platform in the United States. It consists of 17,001 loans, including 3,039 defaulted loans and 13,908 completed loans, with issue dates between November 2005 and March 2014.

Using the data, a credit scoring model is learnt to transform the loan attributes into the default probability. The loan attributes are as follows: the borrower’s FICO score which reflects borrower’s creditworthiness, the borrower’s number of inquiries in the past six months, the monetary amount of the loan, the homeownership status of the borrower, the debt-to-income ratio of the borrower, the borrower’s current delinquencies representing the number of accounts delinquent, and the borrower’s number of public records in the past 10 years (Row 1-7 in Table 1). The target variable is a binary variable (0 represents completed and 1 represents default), as described in Row 8 of Table 1.


Table 1

| Variable | Description |
|---|---|
| X1 | FICO score of the borrower |
| X2 | The number of inquiries of the borrower in the last 6 months |
| X3 | The monetary amount of the loan |
| X4 | The homeownership status of the borrower (0 = rent, 1 = own) |
| X5 | The debt-to-income ratio of the borrower |
| X6 | The number of accounts delinquent |
| X7 | The number of public records in the past 10 years |
| Y | Dependent variable (0 = completed, 1 = default) |

There exist many credit scoring models for predicting the default probability of a loan, such as the XGBoost model [33–35], the hybrid KMV model [36], credit scoring based on genetic algorithms [37, 38], and so on. However, how to choose and construct the optimal credit scoring model is beyond the scope of this study, and we use the most popular model, logistic regression, to make the prediction in this preprocessing step.

We randomly divide the dataset into two parts. The first part contains 40% of all loans and is used to determine the optimal bandwidth h in (5), as described in detail in Section 5.3; the second part contains the remaining 60% of the loans. Using k-fold cross-validation, we then randomly divide the second part into 20 subsets, each containing approximately 510 loans. In each round, one subset is used as the testing set, which consists of loans awaiting investment whose repayment statuses are treated as unknown, and all other subsets form the training set, which consists of historical loans with known yields.
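
The data partition described above can be sketched as follows. The Python snippet shuffles loan indices, reserves 40% for bandwidth selection, and deals the remaining 60% into 20 folds, each of which serves once as the testing set. The function names, the fixed seed, and the round-robin way of forming folds are assumptions made for illustration; the original experiments were run in MATLAB and their exact splitting code is not given.

```python
import random


def split_loans(n_loans, bandwidth_share=0.4, n_folds=20, seed=0):
    """Return (indices for bandwidth selection, list of folds for cross-validation)."""
    rng = random.Random(seed)
    idx = list(range(n_loans))
    rng.shuffle(idx)
    cut = int(bandwidth_share * n_loans)
    bandwidth_part, rest = idx[:cut], idx[cut:]
    # Deal the remaining loans round-robin so fold sizes differ by at most one.
    folds = [rest[k::n_folds] for k in range(n_folds)]
    return bandwidth_part, folds


def train_test_rounds(folds):
    """Yield (training indices, testing indices), using each fold once as the test set."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


bandwidth_part, folds = split_loans(17001)
rounds = list(train_test_rounds(folds))
```

For 17,001 loans this gives 6,800 loans for bandwidth selection and 20 folds of 510 or 511 loans, in line with the approximate fold size stated above.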

5.2. Model Description

In this paper, we propose a robust credit portfolio optimization model for investment decisions in P2P lending. To show its effectiveness, we compare it with a benchmark model proposed by Guo et al. [5]. In the following, we describe the two models in detail.

IOM is the instance-based model proposed by Guo et al. [5]. Each loan is assessed using kernel weights and the historical performance of similar loans. The classical mean-variance model (8) is then used to identify the optimal allocation strategy. As the results of Guo et al. [5] show, this model outperforms some rating-based models.

RIOM is the robust instance-based model in this study. Expected return and risk of each loan are also assessed based on the “instance-based” assessment framework. However, we use the robust model of credit portfolio optimization based on relative entropy method, Equation (15), to obtain the optimal investment decision.

We compare the two models by the following procedure:
(1) Train the credit risk assessment model with the training set, and use the trained model to predict the expected return ($\mu_i$) and variance ($\sigma_i^2$) of each loan in the testing set. Thus, the expected return vector and the covariance matrix, μ and V, can be obtained.
(2) For each model, feed the predicted expected return vector μ and the covariance matrix V of the testing loans into the portfolio optimization algorithm, and compute the investment performance of the optimal portfolio.
(3) Compare the return rates of the two models.

5.3. Analysis of Results

As mentioned before, we select the Gaussian kernel, $K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$, as the kernel function. The key parameter of the kernel regression model, the bandwidth h, is optimized by leave-one-out cross-validation:

$$CV(h) = \frac{1}{n}\sum_{i=1}^{n}\left(R_i - \hat{\mu}_{-i}\right)^2,$$

where $\hat{\mu}_{-i}$ is the leave-one-out estimate of the expected return rate $\mu_i$; specifically,

$$\hat{\mu}_{-i} = \frac{\sum_{j \ne i} K\!\left(\frac{p_i - p_j}{h}\right) R_j}{\sum_{j \ne i} K\!\left(\frac{p_i - p_j}{h}\right)}.$$

The curve of CV(h) is shown in Figure 1. The curve clearly exhibits a minimum, and the h at which the minimum is attained is the optimal bandwidth for the model.

To apply the robust credit portfolio optimization method and obtain the optimal investment strategy in problem (13), we set the parameter ζ = 0.75, the investment amount M = 15 thousand dollars, and the required rate of return r = 0.05. We also set the risk-free return rate to 0.025, which is roughly equal to the average yield of T-Bills over the same period. We use the MATLAB built-in solver “quadprog” to solve the two portfolio optimization problems.
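
The bandwidth selection step can be sketched in Python as follows. For each candidate h, every loan's return is predicted from all other loans with Gaussian kernel weights on the difference in default probabilities, and the mean squared prediction error is recorded; the candidate with the smallest error is selected. Averaging the squared errors, searching over a finite candidate list, and the sample values are assumptions of this sketch rather than details confirmed by the text.

```python
import math


def loo_cv_score(p, r, h):
    """Mean squared leave-one-out prediction error of kernel regression of r on p."""
    n = len(p)
    total = 0.0
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave loan i out of its own prediction
            k = math.exp(-0.5 * ((p[i] - p[j]) / h) ** 2)
            num += k * r[j]
            den += k
        if den == 0.0:
            return float("inf")  # bandwidth too small: no neighbour has weight
        total += (r[i] - num / den) ** 2
    return total / n


def select_bandwidth(p, r, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(p, r, h))


# Hypothetical default probabilities and realized returns.
p = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
r = [0.09, 0.08, 0.07, 0.05, 0.02, -0.03]
candidates = [0.01, 0.05, 0.1, 0.5]
best_h = select_bandwidth(p, r, candidates)
```

On real data the candidate list would typically be a fine grid, and plotting the score against h would give a curve of the kind reported in Figure 1.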

Table 2 summarizes the investment return rate on each test subset and the average performance on the Prosper dataset. It shows that the two portfolios are almost always efficient and feasible, the exception being subset 16. The results also show that the optimal portfolio derived from RIOM achieves a higher actual return than the optimal portfolio from IOM on every subset. The Sharpe ratio shows that the median-based optimal portfolio performs better as well.


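Table 2

| Subset | IOM | RIOM |
|---|---|---|
| 1 | 0.0501 | 0.0566 |
| 2 | 0.0550 | 0.0633 |
| 3 | 0.0540 | 0.0618 |
| 4 | 0.0564 | 0.0696 |
| 5 | 0.0627 | 0.0714 |
| 6 | 0.0543 | 0.0629 |
| 7 | 0.0532 | 0.0688 |
| 8 | 0.0605 | 0.0711 |
| 9 | 0.0593 | 0.0706 |
| 10 | 0.0546 | 0.0664 |
| 11 | 0.0637 | 0.0701 |
| 12 | 0.0567 | 0.0640 |
| 13 | 0.0468 | 0.0569 |
| 14 | 0.0519 | 0.0663 |
| 15 | 0.0544 | 0.0620 |
| 16 | 0.0357 | 0.0472 |
| 17 | 0.0588 | 0.0710 |
| 18 | 0.0607 | 0.0774 |
| 19 | 0.0544 | 0.0655 |
| 20 | 0.0625 | 0.0808 |
| Average | 0.0553 | 0.0662 |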

To verify that the conclusions obtained from the above experiments are stable, we vary the investment amount and the required return used as input parameters for portfolio selection and keep all other conditions unchanged. As summarized in Table 3, we consider nine pairs of required return rate and investment amount M.


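Table 3

| Set | Investment amount M | Required rate |
|---|---|---|
| 1 | $10,000 | 5.0% |
| 2 | $10,000 | 5.5% |
| 3 | $10,000 | 6.0% |
| 4 | $15,000 | 5.0% |
| 5 | $15,000 | 5.5% |
| 6 | $15,000 | 6.0% |
| 7 | $20,000 | 5.0% |
| 8 | $20,000 | 5.5% |
| 9 | $20,000 | 6.0% |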

The computational results for each parameter pair are summarized in Table 4, which compares the two optimal portfolios in terms of the actual return rate of investment. The results are shown more intuitively in Figure 2, which compares the actual return rates of the two models. The first 9 numbers on the horizontal axis of Figure 2 represent the corresponding parameter combinations (sets 1 through 9 from Table 3), and the number 10 shows the average. Although IOM yields a higher return on a few individual subsets, the average return of RIOM exceeds that of IOM for all nine parameter pairs.


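Table 4 (column headings give the model followed by the parameter pair: required return rate r, investment amount M)

| Subset | IOM (5%, 10,000) | RIOM (5%, 10,000) | IOM (5.5%, 10,000) | RIOM (5.5%, 10,000) | IOM (6%, 10,000) | RIOM (6%, 10,000) | IOM (5%, 15,000) | RIOM (5%, 15,000) | IOM (5.5%, 15,000) | RIOM (5.5%, 15,000) | IOM (6%, 15,000) | RIOM (6%, 15,000) | IOM (5%, 20,000) | RIOM (5%, 20,000) | IOM (5.5%, 20,000) | RIOM (5.5%, 20,000) | IOM (6%, 20,000) | RIOM (6%, 20,000) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0598 | 0.0727 | 0.0601 | 0.0762 | 0.0502 | 0.0722 | 0.0501 | 0.0566 | 0.0558 | 0.0839 | 0.0520 | 0.0833 | 0.0594 | 0.0774 | 0.0544 | 0.0689 | 0.0691 | 0.0649 |
| 2 | 0.0500 | 0.0592 | 0.0601 | 0.0779 | 0.0675 | 0.0851 | 0.0550 | 0.0633 | 0.0517 | 0.0620 | 0.0551 | 0.0648 | 0.0504 | 0.0584 | 0.0664 | 0.0895 | 0.0661 | 0.0769 |
| 3 | 0.0441 | 0.0571 | 0.0491 | 0.0698 | 0.0735 | 0.0934 | 0.0540 | 0.0618 | 0.0598 | 0.0778 | 0.0631 | 0.0646 | 0.0503 | 0.0568 | 0.0554 | 0.0737 | 0.0647 | 0.0869 |
| 4 | 0.0525 | 0.0602 | 0.0658 | 0.0903 | 0.0636 | 0.0754 | 0.0564 | 0.0696 | 0.0512 | 0.0629 | 0.0553 | 0.0847 | 0.0566 | 0.0648 | 0.0617 | 0.0856 | 0.0518 | 0.0635 |
| 5 | 0.0532 | 0.0620 | 0.0631 | 0.0829 | 0.0513 | 0.0731 | 0.0627 | 0.0714 | 0.0566 | 0.0721 | 0.0616 | 0.0896 | 0.0576 | 0.0709 | 0.0547 | 0.0692 | 0.0610 | 0.0771 |
| 6 | 0.0634 | 0.0747 | 0.0564 | 0.0762 | 0.0717 | 0.1105 | 0.0543 | 0.0629 | 0.0570 | 0.0772 | 0.0585 | 0.0874 | 0.0584 | 0.0683 | 0.0528 | 0.0701 | 0.0516 | 0.0385 |
| 7 | 0.0613 | 0.0736 | 0.0547 | 0.0754 | 0.0551 | 0.0884 | 0.0532 | 0.0688 | 0.0528 | 0.0727 | 0.0620 | 0.0566 | 0.0481 | 0.0549 | 0.0485 | 0.0631 | 0.0460 | 0.0759 |
| 8 | 0.0529 | 0.0594 | 0.0505 | 0.0616 | 0.0685 | 0.0858 | 0.0605 | 0.0711 | 0.0545 | 0.0768 | 0.0645 | 0.0855 | 0.0545 | 0.0701 | 0.0628 | 0.0823 | 0.0592 | 0.0845 |
| 9 | 0.0548 | 0.0647 | 0.0550 | 0.0736 | 0.0559 | 0.0493 | 0.0593 | 0.0706 | 0.0507 | 0.0576 | 0.0574 | 0.1214 | 0.0535 | 0.0586 | 0.0561 | 0.0764 | 0.0574 | 0.1038 |
| 10 | 0.0474 | 0.0574 | 0.0472 | 0.0634 | 0.0499 | 0.0794 | 0.0546 | 0.0664 | 0.0528 | 0.0634 | 0.0622 | 0.0631 | 0.0514 | 0.0597 | 0.0582 | 0.0683 | 0.0689 | 0.0532 |
| 11 | 0.0597 | 0.0730 | 0.0602 | 0.0795 | 0.0661 | 0.1090 | 0.0637 | 0.0701 | 0.0562 | 0.0756 | 0.0498 | 0.0662 | 0.0531 | 0.0584 | 0.0569 | 0.0657 | 0.0572 | 0.1141 |
| 12 | 0.0644 | 0.0768 | 0.0541 | 0.0673 | 0.0624 | 0.1042 | 0.0567 | 0.0640 | 0.0529 | 0.0677 | 0.0574 | 0.0983 | 0.0551 | 0.0678 | 0.0536 | 0.0734 | 0.0618 | 0.0687 |
| 13 | 0.0635 | 0.0785 | 0.0709 | 0.0895 | 0.0532 | 0.0662 | 0.0468 | 0.0569 | 0.0637 | 0.0880 | 0.0504 | 0.0913 | 0.0555 | 0.0695 | 0.0636 | 0.0824 | 0.0616 | 0.1157 |
| 14 | 0.0593 | 0.0744 | 0.0626 | 0.0751 | 0.0634 | 0.1204 | 0.0519 | 0.0663 | 0.0568 | 0.0716 | 0.0614 | 0.1162 | 0.0577 | 0.0674 | 0.0541 | 0.0636 | 0.0572 | 0.0818 |
| 15 | 0.0523 | 0.0636 | 0.0485 | 0.0609 | 0.0571 | 0.0987 | 0.0544 | 0.0620 | 0.0577 | 0.0764 | 0.0633 | 0.0802 | 0.0597 | 0.0775 | 0.0536 | 0.0706 | 0.0595 | 0.0704 |
| 16 | 0.0549 | 0.0705 | 0.0684 | 0.0893 | 0.0508 | 0.1264 | 0.0357 | 0.0472 | 0.0642 | 0.0851 | 0.0573 | 0.0549 | 0.0593 | 0.0698 | 0.0616 | 0.0807 | 0.0551 | 0.0748 |
| 17 | 0.0549 | 0.0666 | 0.0549 | 0.0757 | 0.0538 | 0.0677 | 0.0588 | 0.0710 | 0.0674 | 0.0867 | 0.0615 | 0.0496 | 0.0535 | 0.0641 | 0.0487 | 0.0636 | 0.0696 | 0.0915 |
| 18 | 0.0546 | 0.0629 | 0.0512 | 0.0615 | 0.0560 | 0.0610 | 0.0607 | 0.0774 | 0.0585 | 0.0732 | 0.0687 | 0.0824 | 0.0599 | 0.0729 | 0.0576 | 0.0746 | 0.0507 | 0.1069 |
| 19 | 0.0492 | 0.0555 | 0.0572 | 0.0685 | 0.0657 | 0.0436 | 0.0544 | 0.0655 | 0.0434 | 0.0633 | 0.0589 | 0.0759 | 0.0581 | 0.0673 | 0.0472 | 0.0638 | 0.0623 | 0.1148 |
| 20 | 0.0554 | 0.0645 | 0.0413 | 0.0504 | 0.0596 | 0.0366 | 0.0625 | 0.0808 | 0.0562 | 0.0687 | 0.0698 | 0.0978 | 0.0518 | 0.0709 | 0.0601 | 0.0734 | 0.0638 | 0.0744 |
| Average | 0.0554 | 0.0664 | 0.0566 | 0.0732 | 0.0598 | 0.0823 | 0.0553 | 0.0662 | 0.0560 | 0.0731 | 0.0595 | 0.0807 | 0.0552 | 0.0663 | 0.0564 | 0.0729 | 0.0597 | 0.0819 |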

In conclusion, the optimal portfolio identified from the robust optimization model in this study is more efficient than the existing model. And the performance of our model is more robust and stable.

6. Conclusions

In this paper, we formulate a data-driven robust portfolio optimization model with relative entropy constraints, built on an instance-based credit risk assessment framework, for investment decisions in P2P lending. The model has at least three advantages. First, it provides a more refined measure of P2P loans’ risk and gives investors an intuitive, quantitative risk estimate instead of merely labelling each loan with a credit grade. Second, it can estimate each loan’s expected return and risk when historical observations of the same borrower are unavailable. Finally, it accounts for the ambiguity of the loans’ distribution (probability measure uncertainty) and uses relative entropy to model this uncertainty, so that the optimal allocation strategy remains efficient and feasible under a variety of realistic scenarios. Numerical experiments indicate that the P2P lending investment decision model using robust optimization with relative entropy constraints performs better than the existing model.
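The two ingredients summarized above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: it assumes a Gaussian kernel over a single similarity feature (here a predicted default probability), a discrete set of past-loan scenarios, and the standard dual form of the worst-case expectation over a relative entropy ball. The function names `kernel_estimate` and `worst_case_mean`, the bandwidth, and the radius `eta` are hypothetical choices made for illustration only.

```python
import math


def kernel_estimate(target_pd, past_pds, past_returns, bandwidth):
    """Kernel-weighted mean and variance of return for a new loan.

    Assumption: similarity between loans is measured only by the distance
    between default probabilities, with a Gaussian kernel.
    """
    raw = [math.exp(-0.5 * ((target_pd - pd) / bandwidth) ** 2) for pd in past_pds]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no past loan is close enough for this bandwidth")
    weights = [w / total for w in raw]
    mean = sum(w * r for w, r in zip(weights, past_returns))
    var = sum(w * (r - mean) ** 2 for w, r in zip(weights, past_returns))
    return mean, var, weights


def worst_case_mean(returns, probs, eta, iters=200):
    """Lowest expected return over all Q with KL(Q || P) <= eta.

    Uses the dual: inf_Q E_Q[r] = -min_{lam > 0} (lam*eta + lam*log E_P[exp(-r/lam)]).
    The one-dimensional minimization is done by ternary search on log(lam).
    """
    pairs = [(p, -r) for p, r in zip(probs, returns) if p > 0.0]
    top = max(loss for _, loss in pairs)

    def dual(log_lam):
        lam = math.exp(log_lam)
        # Stable evaluation of log E_P[exp(loss / lam)].
        s = sum(p * math.exp((loss - top) / lam) for p, loss in pairs)
        return lam * eta + top + lam * math.log(s)

    lo, hi = math.log(1e-4), math.log(1e3)
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if dual(a) < dual(b):
            hi = b
        else:
            lo = a
    return -dual(0.5 * (lo + hi))


# Hypothetical past loans: default probabilities and realized returns.
past_pds = [0.05, 0.06, 0.07, 0.04]
past_returns = [0.08, 0.06, -0.20, 0.07]
mean, var, weights = kernel_estimate(0.055, past_pds, past_returns, bandwidth=0.01)
robust_mean = worst_case_mean(past_returns, weights, eta=0.1)
```

In this sketch the nominal estimate `mean` is the kernel-weighted average return, while `robust_mean` is the lowest expected return over all distributions within relative entropy `eta` of the kernel weights. A larger `eta` represents less trust in the estimated distribution and gives a more conservative value.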

Data Availability

The data used in this paper were downloaded from the Prosper website: https://www.prosper.com/invest/download.aspx.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The research is supported by the National Natural Science Foundation of China (Grants nos. 71471027, 71731003, and 71873103), the National Social Science Foundation of China (Grant no. 16BTJ017), National Natural Science Foundation of China Youth Project (Grant no. 71601041), Liaoning Economic and Social Development Key Issues (Grant no. 2015lslktzdian-05), and Liaoning Provincial Social Science Planning Fund Project (Grant no. L16BJY016). The authors acknowledge the organizations mentioned above.

References

  1. H. Markowitz, “Portfolio selection,” The Journal of Finance, vol. 7, no. 1, pp. 77–91, 1952.
  2. H. M. Markowitz, Portfolio Selection: Efficient Diversification of Investments, Wiley, New York, NY, USA, 1959.
  3. N. Larsen, H. Mausser, and S. Uryasev, “Algorithms for optimization of Value-at-Risk,” in Financial Engineering, E-Commerce and Supply Chain, Applied Optimization, P. M. Pardalos and V. K. Tsitsiringos, Eds., vol. 70, Kluwer Academic Publishers, Dordrecht, 2002.
  4. R. T. Rockafellar and S. Uryasev, “Conditional value-at-risk for general loss distributions,” Journal of Banking & Finance, vol. 26, no. 7, pp. 1443–1471, 2002.
  5. Y. H. Guo, W. J. Zhou, C. Y. Luo, C. R. Liu, and H. Xiong, “Instance-based credit risk assessment for investment decisions in P2P Lending,” European Journal of Operational Research, vol. 249, no. 2, pp. 417–426, 2016.
  6. S. C. P. Yam, H. Yang, and F. L. Yuen, “Optimal asset allocation: Risk and information uncertainty,” European Journal of Operational Research, vol. 251, no. 2, pp. 554–561, 2016.
  7. R. Emekter, Y. Tu, B. Jirasakuldech, and M. Lu, “Evaluating credit risk and loan performance in online Peer-to-Peer (P2P) lending,” Applied Economics, vol. 47, no. 1, pp. 54–70, 2014.
  8. E. Berkovich, “Search and herding effects in peer-to-peer lending: evidence from prosper.com,” Annals of Finance, vol. 7, no. 3, pp. 389–405, 2011.
  9. E. I. Altman, “Financial ratios, discriminant analysis and the prediction of corporate bankruptcy,” The Journal of Finance, vol. 23, no. 4, pp. 589–609, 1968.
  10. S. Chatterjee and S. Barcun, “A nonparametric approach to credit screening,” Journal of the American Statistical Association, vol. 65, no. 329, pp. 150–154, 1970.
  11. J. C. Wiginton, “A note on the comparison of logit and discriminant models of consumer credit behavior,” Journal of Financial and Quantitative Analysis, vol. 15, no. 3, pp. 757–770, 1980.
  12. L. Breiman, J. H. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees, Wadsworth, Belmont, Calif, USA, 1983.
  13. M. M. So and L. C. Thomas, “Modelling the profitability of credit cards by Markov decision processes,” European Journal of Operational Research, vol. 212, no. 1, pp. 123–130, 2011.
  14. G. Andreeva, J. Ansell, and J. Crook, “Modelling profitability using survival combination scores,” European Journal of Operational Research, vol. 183, no. 3, pp. 1537–1549, 2007.
  15. D. West, “Neural network credit scoring models,” Computers & Operations Research, vol. 27, pp. 1131–1152, 2000.
  16. J. J. Huang, G. H. Tzeng, and C. S. Ong, “Two-stage genetic programming (2SGP) for the credit scoring model,” Applied Mathematics and Computation, vol. 174, no. 2, pp. 1039–1053, 2006.
  17. C. L. Huang, M. C. Chen, and C. J. Wang, “Credit scoring with a data mining approach based on support vector machines,” Expert Systems with Applications, vol. 33, no. 4, pp. 847–856, 2007.
  18. P. Danenas and G. Garsva, “Selection of support vector machines based classifiers for credit risk domain,” Expert Systems with Applications, vol. 42, no. 6, pp. 3194–3204, 2015.
  19. G. Sermpinis, S. Tsoukas, and P. Zhang, “Modelling market implied ratings using LASSO variable selection techniques,” Journal of Empirical Finance, vol. 48, pp. 19–35, 2018.
  20. K. Natarajan, D. Pachamanova, and M. Sim, “Constructing risk measures from uncertainty sets,” Operations Research, vol. 57, no. 5, pp. 1129–1141, 2009.
  21. L. Chen, S. He, and S. Zhang, “Tight bounds for some risk measures, with applications to robust portfolio selection,” Operations Research, vol. 59, no. 4, pp. 847–865, 2011.
  22. L. G. Epstein, “A paradox for the ‘smooth ambiguity’ model of preference,” Econometrica, vol. 78, no. 6, pp. 2085–2099, 2010.
  23. K. Natarajan, M. Sim, and J. Uichanco, “Tractable robust expected utility and risk models for portfolio optimization,” Mathematical Finance, vol. 20, no. 4, pp. 695–731, 2010.
  24. A. B. Paç and M. Ç. Pınar, “Robust portfolio choice with CVaR and VaR under distribution and mean return ambiguity,” TOP, vol. 22, no. 3, pp. 875–891, 2014.
  25. L. P. Hansen and T. J. Sargent, “Robust control and model uncertainty,” The American Economic Review, vol. 91, no. 2, pp. 60–66, 2001.
  26. G. C. Calafiore, “Ambiguous risk measures and optimal robust portfolios,” SIAM Journal on Optimization, vol. 18, no. 3, pp. 853–877, 2007.
  27. D. Bertsimas, V. Gupta, and N. Kallus, “Data-driven robust optimization,” Mathematical Programming, vol. 167, no. 2, pp. 235–292, 2018.
  28. Z. Kang, X. Li, Z. Li, and S. Zhu, “Data-driven robust mean-CVaR portfolio selection under distribution ambiguity,” Quantitative Finance, pp. 1–17, 2018.
  29. Q. Li and J. S. Racine, Nonparametric Econometrics: Theory and Practice, Princeton University Press, 2007.
  30. O. Scaillet, “Nonparametric estimation and sensitivity analysis of expected shortfall,” Mathematical Finance, vol. 14, no. 1, pp. 115–129, 2004.
  31. H. Yao, Z. Li, and Y. Lai, “Mean–CVaR portfolio selection: A nonparametric estimation framework,” Computers & Operations Research, vol. 40, no. 4, pp. 1014–1022, 2013.
  32. E. A. Nadaraja, “On non-parametric estimates of density functions and regression,” Theory of Probability & Its Applications, vol. 10, no. 1, pp. 186–190, 1965.
  33. T. Chen and T. He, “Higgs boson discovery with boosted trees,” in Proceedings of the NIPS 2014 Workshop on High-energy Physics and Machine Learning, pp. 69–80, 2015.
  34. Y. Xia, C. Liu, Y. Li, and N. Liu, “A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring,” Expert Systems with Applications, vol. 78, pp. 225–241, 2017.
  35. H. He, W. Zhang, and S. Zhang, “A novel ensemble method for credit scoring: Adaption of different imbalance ratios,” Expert Systems with Applications, vol. 98, pp. 105–117, 2018.
  36. C.-C. Yeh, F. Lin, and C.-Y. Hsu, “A hybrid KMV model, random forests and rough set theory approach for credit rating,” Knowledge-Based Systems, vol. 33, no. 3, pp. 166–172, 2012.
  37. S. Oreski, D. Oreski, and G. Oreski, “Hybrid system with genetic algorithm and artificial neural networks and its application to retail credit risk assessment,” Expert Systems with Applications, vol. 39, no. 16, pp. 12605–12617, 2012.
  38. V. Kozeny, “Genetic algorithms for credit scoring: Alternative fitness function performance comparison,” Expert Systems with Applications, vol. 42, no. 6, pp. 2998–3004, 2015.

Copyright © 2019 Guotai Chi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

