Abstract

With the rapid development of the Internet, traditional Internet financial risk prediction methods can no longer meet the needs of individuals and enterprises, and the concept of cloud computing emerged to fill this gap. Cloud computing has overturned traditional financial risk prediction methods and has been widely studied and applied for its distributed, dynamic, and autonomous characteristics. How to schedule the resources of cloud data centers efficiently and reasonably and improve the accuracy of financial risk prediction is the focus of current research, and how to quantify financial risk and provide early warning of it is one of the urgent problems to be solved. Under the framework of cloud computing, this paper combines feature extraction and data weighting to study users’ basic attribute data and the large number of APP types they download. Linear regression with a penalty term is then used to construct the prediction model, improving the accuracy of insolvency and customer default judgment and achieving local optimization, so as to strengthen the prediction and control of hidden risks in customers’ commercial bank loans and greatly reduce the default risk of bank loans.

1. Introduction

Cloud computing is developed on the basis of parallel computing, distributed computing, grid computing, and other technologies. It is a means of efficient resource utilization that can adapt to future large-scale computing needs. Parallel computing is a computing technique that uses multiple computer resources simultaneously to solve the same computing problem [1]. Different from parallel computing, distributed computing divides a task into independent modules, and the failure of a single module does not affect the others. Grid computing is a special computing model for complex scientific computing [2–5].

With the high-speed development of the Internet, the amount of network data, the number of server clusters, and the elasticity of demand for user software and computing power keep growing. How to manage idle computing resources effectively and turn them into service resources has become a very popular topic in both finance and research. The emergence of cloud computing was not achieved overnight; it arose after several generations of technological revolution to adapt to the requirements of the new information age. Its driving forces can be summarized in two main reasons [6, 7]: (1) the rapid development of various technologies: virtualization and distributed computing form the basis of cloud computing, supplemented by automatic deployment and management, big data storage technology, and the powerful Map/Reduce model; and (2) in terms of site costs, energy consumption, equipment costs, and management, enterprises must spend a great deal of human resources and capital, and extremely dispersed, highly closed machine room construction is a major and stubborn problem faced by thousands of enterprises [8].

Cloud computing provides users with a brand-new mode of resource possession: users can apply for resources according to their needs. This service is highly flexible and can minimize users’ expenses, so it has a strong price advantage. In addition, the cloud system has dedicated disaster recovery and multicopy backup mechanisms to ensure normal operation even when some nodes in the cloud fail. Therefore, the cloud system offers more stable and reliable performance than enterprise and personal computers. According to the types of services provided, the system model of cloud computing can be divided into three layers: the application layer, the platform layer, and the infrastructure layer, each corresponding to a subservice set [9–11]: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), as shown in Figure 1.

As a new branch of computer science and artificial intelligence, cloud computing is attracting scientific attention for its power and its ability to reveal the rules hidden in data [12]. The development of this technology has profoundly affected many fields and achieved success in more and more of them. Among them, financial risk management is an important application scenario. Most financial institutions use traditional machine learning algorithms to predict the development of financial markets, predict the solvency of borrowers, or make credit approval decisions, so as to predict and identify high-risk customers [13–16].

With changes in the world economic environment, the causes of financial crises and financial risks are also changing constantly. Researchers have improved their understanding of financial risks and identified the causes of financial crises from different aspects. In recent years, as financial stability becomes more and more important to a country’s security, constructing a financial risk indicator system has proved an effective way for the state to regulate financial risks, and research in this area has continuously deepened. Both governments and scholars at home and abroad have enriched this research. In addition to governments’ enthusiasm for studying financial risk evaluation indicators, many scholars also pay great attention to this area. D. C. Hardy, a senior economist at the International Monetary Fund, holds that financial risk indicators consist of an index system for the banking sector and an index system for the macroeconomic sector. The authors of [17] constructed a financial risk early-warning indicator system mainly from 34 indicators reflecting the real economy, such as operating soundness indicators. Paper [18] held that a financial risk index system suitable for China should be constructed from the following five aspects: the general operation of the national economy, finance, public finance, the foreign economy, and the bubble economy, with a total of 24 indicators selected. Recently, Chen et al. [19] used factor analysis to reduce the dimension of financial risk indicators, applied the VaR model to measure financial risk on the basis of the common-factor data, and finally studied China’s financial risk early-warning system by establishing early-warning lights. In addition to regression and classification models of overdue loans, some domestic scholars study overdue risks from other perspectives. Guo and Shen [20, 21] combined game theory with information economics methods such as perfect Bayesian equilibrium to establish a moral hazard supervision model that measures and controls default risk, and deduced the optimal probability by optimization methods [4, 22]. In view of the four major categories of factors affecting the overdue risk of housing mortgage loans (borrower characteristics, property characteristics, loan characteristics, and regional characteristics), Wang et al. [23] adopted logistic regression, factor analysis, and discriminant analysis to analyze overdue risk and successfully applied the theoretical results to the prediction of financial risk. Nowadays, research on economic problems emphasizes both qualitative and quantitative analysis, and statistical methods for measuring financial risks are also being constantly improved. Dong et al. [24] used the analytic hierarchy process to construct a financial risk index system and then established a financial risk early-warning system with the support vector machine prediction method.

The above analysis shows that existing methods have studied risk prediction models of Internet finance to some extent, but some problems still remain. Moreover, no scholar has yet applied cloud computing to this field, so research here is still blank, which gives it great theoretical and practical application value [25, 26].

The contributions of this paper are as follows: (1) the proposed algorithm provides a new solution to the risk prediction problem of Internet finance, performing feature extraction and data weighting on the user’s basic attribute data and the large number of downloaded APP types; (2) linear regression with a penalty term is then used to construct the prediction model, improving the accuracy of insolvency and customer default judgment and achieving local optimization, so as to strengthen the prediction and control of hidden risks in customers’ commercial bank loans and greatly reduce the default risk of bank loans.

This paper consists of five parts. Sections 1 and 2 give the research status and background. Section 3 presents the risk prediction model of Internet finance based on cloud computing. Section 4 shows the experimental results and analysis, in which the experimental results are introduced and compared with relevant baseline algorithms. Finally, Section 5 concludes the paper.

3. Risk Prediction Model of Internet Finance Based on Cloud Computing

3.1. Cloud Computing Architecture

Cloud computing is another revolutionary change in information technology following the shift from mainframe computers to the client/server (C/S) model in the 1980s. On August 9, 2006, Eric Schmidt, CEO of Google, first proposed the concept at the Search Engine Strategies conference (SES San Jose). Cloud computing is the fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization, and load balancing. Its purpose is to organize and integrate shared software/hardware resources and information so that they can be used on demand by computers and other systems through web-based computing. A cloud computing architecture may include the following layers [27].

The cloud computing architecture has three horizontal layers, namely the display layer, the middleware layer, and the infrastructure layer, which together provide rich cloud computing capabilities and friendly user interfaces; it also has a vertical layer, the management layer, to better manage and maintain the three horizontal layers. Display layer: in most data center cloud architectures, this layer presents the content and service experience required by users in a friendly manner, leveraging the services provided by the middleware layer below. Middleware layer: the middleware layer links the layers above and below it, providing services such as caching and REST services on top of the resources supplied by the infrastructure layer; these services can support the display layer or be invoked directly by users. Management layer: the management layer serves the three horizontal layers and provides them with a variety of management and maintenance capabilities. Hardware infrastructure layer: the hardware abstraction layer is the interface between the operating system kernel and the hardware circuits, and it aims to abstract the hardware. It hides the hardware interface details of a particular platform and provides a virtual hardware platform for the operating system, giving it hardware independence and allowing it to be ported across multiple platforms. From the point of view of hardware and software testing, the hardware and software test work can each be performed against the hardware abstraction layer, making it possible to carry out hardware and software testing in parallel. The details of the cloud computing architecture are shown in Figure 2.

3.2. Financial Credit Risk Analysis Process of the Internet of Things

Today, Internet of Things (IoT) technology is ever more widely used, and the growing number of IoT devices, from health and medical sensors to factory monitors and from tracking systems to smart grids, can create even more vulnerabilities [28]. Billions of physical devices around the world are now connected to the public Internet. IoT is a central component of many digital transformation plans, but like any technological innovation it brings unique digital risks as well as benefits. In the IoT, connected devices often produce large amounts of data, which are sent to or stored in other areas of the organization’s IT infrastructure. In this way, risk effectively has a domino effect across the whole organization, spanning network security risk, third-party compliance, and business flexibility. Therefore, IoT security is not just a matter of device management. Whether discovering new endpoints, identifying and classifying additional requirements for further compliance checks, or updating authentication, enterprises may need to transform their security approach to manage IoT risks effectively [29].

Based on Internet of Things technology, the financial risk analysis process of the Internet of Things is given in Figure 3.

3.3. Implementation of Risk Prediction of Internet Finance

Linear regression is a statistical method used to study the correlation between independent variables and a dependent variable. Consider $p$ independent variables $x_1, x_2, \ldots, x_p$ and a dependent variable $y$. The following relationship is satisfied:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon,$$

where $\varepsilon$ is the random error of observation and $\beta_0, \beta_1, \ldots, \beta_p$ are the unknown parameters to be determined. For $n$ observations, write it in matrix form:

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

And the corresponding linear regression model is

$$Y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 I_n.$$

According to the observed data, the regression coefficients are calculated by the least squares method, which is characterized by minimizing the squared length of the residual vector, as shown in

$$Q(\beta) = \lVert Y - X\beta \rVert^2 = (Y - X\beta)^T (Y - X\beta) \longrightarrow \min.$$

The sum of the squares of the residuals is

$$\mathrm{RSS} = (Y - X\hat{\beta})^T (Y - X\hat{\beta}).$$

Then, the estimated value of the parameter and the predicted value of the dependent variable are, respectively,

$$\hat{\beta} = (X^T X)^{-1} X^T Y, \qquad \hat{Y} = X\hat{\beta}.$$

The expectation and variance of this estimator are as follows:

$$E(\hat{\beta}) = \beta, \qquad \operatorname{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}.$$

The generalized least squares method considers a random disturbance term with heteroscedasticity:

$$\operatorname{Var}(\varepsilon) = \sigma^2 \Omega, \qquad \Omega \neq I_n.$$

When ordinary least squares is applied in this setting, the expectation and variance of the estimator are

$$E(\hat{\beta}) = \beta, \qquad \operatorname{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1} X^T \Omega X (X^T X)^{-1}.$$
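To make the penalized form of this regression concrete, the following is a minimal sketch in Python, assuming scikit-learn and synthetic data in place of the paper's proprietary bank data set; it contrasts ordinary least squares with the L1 (Lasso) and L2 (Ridge) penalties used later in the experiments.

```python
# Minimal sketch of penalized linear regression, assuming scikit-learn;
# the synthetic data stands in for the real (desensitized) bank data.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))  # sparse truth
y = X @ beta_true + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)    # minimizes ||y - Xb||^2
lasso = Lasso(alpha=0.1).fit(X, y)    # adds lambda * ||b||_1
ridge = Ridge(alpha=1.0).fit(X, y)    # adds lambda * ||b||_2^2

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))  # many exact zeros
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # shrunk, none zero
```

The contrast in the printed coefficients illustrates the trade-off discussed in Section 4.2: the L1 penalty zeroes out weak variables (feature selection), while the L2 penalty only shrinks them.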

4. Experimental Results and Analysis

4.1. Introduction to Experimental Environment and Data Set

The hardware configuration used in the experiment is as follows: an Intel Core i3 quad-core processor, 8 GB DDR memory, and a 500 GB hard disk. The operating system is Windows 10, and the software used is Python 3.6.

In order to better illustrate, verify, and analyze the proposed method for predicting bank loan defaults, this paper uses an open-source real data set to verify the validity of the proposed prediction method [12]. This data set consists of desensitized customer data from a branch of a commercial bank in China [4]. Its statistical characteristics are shown in Table 1.

In the original data available to us, there are many dimensions of basic customer attributes. In the original data set shown in this table, the dimensionality and sample size are relatively large, and the data set does not contain category labels. This means that the raw data cannot be applied directly to bank credit risk analysis and must first be preprocessed. As can also be seen in the table, many attribute values are missing, with the largest missing rate as high as 35%, but directly deleting the records with missing items would significantly damage the overall integrity of the data. Therefore, an appropriate data-filling method must be used for each missing attribute value. In this paper, we use the attribute mean-fill method: the average of the existing values in each attribute column is used to fill the missing values in that column.
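As a hedged illustration of this mean-fill step (a sketch, not the authors' actual preprocessing code), the snippet below fills each column's missing entries with that column's mean, assuming pandas; the column names are hypothetical placeholders for the desensitized attributes.

```python
# Minimal sketch of the attribute mean-fill method described above,
# assuming pandas; column names are hypothetical placeholders.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "monthly_fee":   [58.0, np.nan, 42.5, 61.0, 49.0],
    "calls_per_day": [12.0, 8.0, np.nan, 15.0, 9.0],
})

col_means = df.mean(numeric_only=True)  # computed from existing values; NaNs ignored
df_filled = df.fillna(col_means)        # each gap gets its column's mean
print(df_filled)
```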

4.2. Experimental Results Analysis

The experimental data are all internal open-source data of the bank, and sensitive private information has been desensitized. The data are divided into two parts. One part is the basic attribute data of loan users, including user number, user type, number of calls, duration of mobile phone use, and monthly phone fee. Although the customers are described in many ways, some data fields are still missing. Fields with a large amount of missing data are discarded so as not to affect subsequent prediction accuracy; when only a small amount of data is missing in a field, the missing values are completed manually according to basic complementary principles to ensure data integrity.

The data require feature selection, and their distribution is extremely unbalanced. An L2 penalty can prevent over-fitting of the model, whereas L1 regularization produces a sparse weight vector, that is, a sparse model, and therefore emphasizes feature selection. Through comprehensive analysis, the L2 penalty is the better choice for this experiment; the training and test results are shown in the pie charts of Figure 4.

Because of the large amount of data, R is slow in calculating large matrices. In this experiment, 10% of the data are randomly sampled as the training set and 90% as the test set using R's sample function, and the model.matrix function is used to generate dummy variables from the categorical variables. Considering that the prediction performance of different methods varies, five methods are compared: the logistic regression model, the Lasso penalized regression model, the Elastic-Net (α = 0.5) penalized regression model, the Ridge penalized regression model, and linear discriminant analysis. When solving the regressions with penalty terms, considering that different parameter values have a great influence on the regression coefficients, 10-fold cross-validation is used to determine the respective penalty parameters of the Lasso, Elastic-Net, and Ridge regressions. The 10-fold cross-validation error curves are given in Figure 5.
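The paper performs these steps in R; as a rough Python equivalent (an assumption, not the authors' actual script), the sketch below selects the penalty strength for Lasso, Elastic-Net (l1_ratio = 0.5), and Ridge by 10-fold cross-validation with scikit-learn on synthetic data.

```python
# Sketch: choosing penalty strength by 10-fold cross-validation for three
# penalized regressions, assuming scikit-learn (the paper itself used R).
import numpy as np
from sklearn.linear_model import LassoCV, ElasticNetCV, RidgeCV

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=500)

lasso = LassoCV(cv=10).fit(X, y)
enet = ElasticNetCV(l1_ratio=0.5, cv=10).fit(X, y)  # mixing weight 0.5
ridge = RidgeCV(alphas=np.logspace(-3, 3, 50), cv=10).fit(X, y)

# scikit-learn's "alpha" plays the role of the penalty parameter lambda.
print("Lasso penalty:      ", lasso.alpha_)
print("Elastic-Net penalty:", enet.alpha_)
print("Ridge penalty:      ", ridge.alpha_)
```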

Figure 6 supports the following comparisons. Logistic versus Lasso: when the threshold was between 0.2 and 0.25, there was no statistical difference between the Lasso and logistic classifiers (α = 0.05); when the threshold exceeded 0.3, there was a significant difference between the two methods. Elastic-Net versus Ridge: there was no significant difference between the two methods in the interval from 0.35 to 0.45, but there were significant differences in the remaining intervals. Lasso versus Elastic-Net: when the threshold is near 0.15, there is no significant difference between the two methods; when the threshold exceeds 0.2, there is a significant difference.

In order to test the normality of the risks in each dimension and the overall financial risk more comprehensively, five normality tests were selected in this paper: Shapiro–Wilk, Jarque–Bera, Cramér–von Mises, Lilliefors (KS), and Shapiro–Francia. Figure 7 shows the results of the five normality tests on the comprehensive evaluation scores of the risk in each dimension and the overall financial risk. From the p value of each test, the Shapiro–Francia test is below 0.05 for the comprehensive evaluation score of the debt dimension risk, while the p values of the other test methods are greater than 0.05 for every series, which indicates that the comprehensive evaluation score series of the risk in each dimension and the overall financial risk are basically normal.
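For reference, most of the tests named above are available in SciPy and statsmodels (Shapiro–Francia is not in either library, so it is omitted); the sketch below runs the other four on a synthetic score series standing in for the comprehensive evaluation scores.

```python
# Sketch of four of the normality tests named above, assuming SciPy and
# statsmodels; the score series is synthetic, not the paper's data.
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(2)
scores = rng.normal(loc=0.5, scale=0.1, size=100)  # stand-in risk scores

print("Shapiro-Wilk     p =", stats.shapiro(scores).pvalue)
print("Jarque-Bera      p =", stats.jarque_bera(scores).pvalue)
print("Cramer-von Mises p =", stats.cramervonmises(
    scores, "norm", args=(scores.mean(), scores.std())).pvalue)
print("Lilliefors (KS)  p =", lilliefors(scores)[1])
```

As in the paper's reading of Figure 7, a p value above 0.05 means the test fails to reject normality for that series.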

The experiment compares the prediction accuracy of the random algorithm and the algorithm proposed in this paper, as shown in Figure 8. As can be seen from the figure, the proposed method achieves good prediction accuracy under different sample sizes.

From Figure 9, the VaR value of the macroeconomic dimension risk lies in [0, 3], indicating that its volatility is the smallest. The VaR values of the bank and currency dimension risks lie in [0, 10], indicating that their volatility is large among the five dimensions. The VaR value of the bubble dimension risk lies in [0, 7], indicating that its volatility is also small. The VaR value of the external shock dimension risk lies in [0, 8], indicating that its fluctuation is relatively small. The VaR value of the debt dimension risk lies in [0, 20], indicating that its volatility is the largest.

The minimum VaR value of the macroeconomic dimension risk was 0.185 in 2003, and the maximum was 2.802 in 2010. On the whole, the VaR of the macroeconomic dimension risk fluctuated little from 1993 to 2010. Specifically, the VaR value decreased gradually from 1993 to 1995, rose rapidly to 1.922 between 1995 and 1998, and remained high in the three years up to 2000, indicating that the risk of the macroeconomic environment was relatively high in that period. The VaR values in the six years from 2001 to 2006 were relatively small, and the 2003 value was the minimum, indicating that macroeconomic environmental risk was relatively low in that period. The VaR value was relatively high from 2007 to 2010: although it declined slowly from 2007 to 2009, it rose in 2010 to the highest value of the 18 years, indicating that the macroeconomic dimension was relatively risky in that period.

How can the imbalance of the dichotomous response variable be overcome? How much data is appropriate for the training and test sets? If different sample sets are randomly selected to build the model and different test sets are used to predict the classification results, do the evaluation indicators of the logistic, Lasso, Elastic-Net, Ridge, and linear discriminant analysis models fluctuate greatly? The bootstrap method can be used for testing, and boxplots can be used to plot the quartiles of the indicators, as shown in Figure 10. The training set was randomly resampled 50 times; the quartiles of the Lasso and Ridge coefficients under the specific weighted sampling show that the estimated regression coefficients are relatively stable overall.
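A hedged sketch of this bootstrap check, assuming scikit-learn and synthetic data: the training set is resampled 50 times with replacement, a Lasso model is refit each time, and the quartiles of the coefficients summarize their stability, as the boxplot in Figure 10 does.

```python
# Sketch of the bootstrap stability check described above; 50 resamples,
# as in the paper, but on synthetic data with an assumed Lasso penalty.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 8))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=300)

coefs = []
for _ in range(50):                        # 50 random resamples
    idx = rng.integers(0, len(X), len(X))  # bootstrap sample with replacement
    model = Lasso(alpha=0.05).fit(X[idx], y[idx])
    coefs.append(model.coef_)

coefs = np.array(coefs)
# Quartiles per coefficient -- the spread a boxplot would display.
q1, med, q3 = np.percentile(coefs, [25, 50, 75], axis=0)
print("median coefficients:", np.round(med, 2))
print("interquartile range:", np.round(q3 - q1, 2))
```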

The precision, recall, accuracy, and other indicators are calculated, and each indicator within a certain range reflects the performance of the model. Is it possible to establish a composite indicator that incorporates the information of the above indicators, for example, a weighted average of the evaluation indicators, as shown in Figure 10?
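As an illustration of such a composite indicator (the weights here are hypothetical, not taken from the paper), the sketch below combines accuracy, precision, and recall into one weighted average using scikit-learn's metric functions.

```python
# Sketch of a weighted composite of evaluation indicators; the labels and
# weights are illustrative assumptions, not the paper's values.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = {
    "accuracy":  accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall":    recall_score(y_true, y_pred),
}
weights = {"accuracy": 0.4, "precision": 0.3, "recall": 0.3}  # hypothetical
composite = sum(weights[k] * v for k, v in metrics.items())
print(metrics, "composite =", round(composite, 3))
```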

5. Conclusions

With the development of network technology, data and information in the Internet finance industry have become increasingly abundant. Lenders can effectively reject loan applications that do not meet lending standards, avoiding the concealment of information by borrowers (moral hazard). Credit risk lies in whether a borrower will become overdue again; when risk early-warning measurement is combined with the Internet information contained in lending data, the variables are of high dimension and strong and weak variable information is not clearly distinguished.

After empirical analysis, the VaR values of the macroeconomic dimension risk, the bank dimension risk, the currency dimension risk, and the external shock dimension risk are still in the higher region, which indicates that China will still face high risks in the four dimensions of the macroeconomy, banking, currency, and external shocks in the next five years.

In the constructed financial risk index system, the bad debt ratio and the return on equity are not the most representative indicators.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant Nos. U1636107 and 61972297), the Science and Technology Project of Henan Province (China) (Grant Nos. 182102210215 and 192102210288), and the Soft Science Project of Henan Province (China) (Grant no. 182400410482).