A Financial Risk Early Warning of Listed Companies Based on PCA and BP Neural Network
Financial risk is among the most influential and destructive risks in business; without timely warning and prevention, it can drive an enterprise into bankruptcy. In this paper, we study financial risk early warning for listed companies. A total of 250 companies were randomly selected from the Chinese A-share market from 2019 to 2021. We built 26 financial indicators for listed companies, constructed a PCA-BP neural network (PCA-BPNN), and compared the early warning performance of the PCA-BPNN, SVM, and logistic models. We find that financial data processed by PCA are better suited to the financial risk early warning model. The PCA-BPNN model improved prediction accuracy and showed strong generalization ability in predicting financial risk. The findings offer a reference for judging the financial risk of companies more precisely.
1. Introduction
The sound operation of the capital market depends on open, transparent, and truthful information disclosure. Financial information is an important basis for the business activities and future development of enterprises, and its quality strongly influences their operation and development. Enterprises should therefore pay attention to the quality of financial information and strengthen its management through effective means. In a complex market economy, every decision a manager makes can become a turning point in the operation of the enterprise. At present, the downturn in the global economy and intensified competition among countries have made market volatility particularly severe. The uncertainty of the macro environment has reduced the financial growth ability of enterprises. In addition, under the combined influence of factors such as the epidemic, the business and financial risks faced by enterprises are increasing simultaneously. As a result, a company with financial problems may whitewash its financial data, and the accumulated financial risk can eventually bankrupt it. According to a report from the China Securities Regulatory Commission, financial fraud was a “hardest hit area” among the 20 typical illegal cases of 2021. The companies involved inflated their income substantially through fictitious purchases and sales, compiled false sets of financial books, and provided users with false financial statements. False market information not only deceived investors but also led users of financial information to misjudge the national macroeconomic situation, which seriously affected the normal social and economic order.
Therefore, reducing the financial risk in the operation of enterprises is a crucial basis for their stable development. However, enterprise financial risk management is a continuous, cyclical, and dynamic process that runs through every link of enterprise decision-making and management. Figure 1 shows the enterprise financial risk management framework, which includes the source of enterprise financial risk, financial risk early warning, financial risk assessment, financial risk response, supervision, and inspection. If a risk is identified wrongly, the subsequent assessment and response plans are wasted, so the early warning and identification of financial risk is the most important link.
The purpose of this paper is to build an effective financial risk early warning system, extract valuable financial information, and use the model proposed in this paper to identify enterprises with potential financial risks. Through empirical analysis and comparison with existing models, the effectiveness of our proposed model is proven.
2. Related Work
Financial risk mainly refers to the losses an enterprise suffers from various risk factors, or to the uncertainty of its operating efficiency. Serious financial risk deeply affects the development direction of an enterprise and the realization of its development goals. Enterprises now generate a large amount of finance-related information and data every day, which makes financial risk prediction possible. In practice, however, enterprises in a financial crisis have an incentive to whitewash their financial data in order to avoid delisting after continuous losses, so the financial statements of most listed companies with potential financial risks are often suspected of being false [5–7]. The more seriously the financial condition of a listed company deteriorates, the stronger its motivation to manipulate financial data. Judging the quality of such financial data requires effective mechanisms. Financial indicators are the core basis of an enterprise’s financial risk early warning indicator system, and the authenticity of financial data is a key factor in determining the predictive ability of an early warning model. Therefore, the construction of an index system and an early warning model should consider both the financial indicators and the quality of their data.
As machine learning methods have been studied in greater depth, more and more scholars have applied them to financial risk management. Qi et al. used the random forest method to analyze Internet finance and to find the factors that affect its risk. Liu et al. built a data-driven calibration neural network that allows artificial neural networks to calibrate financial asset price models. Wu and Wu used genetic algorithms, neural networks, and PCA to process data and built a risk assessment model that extracts relevant hidden information from financial enterprises. Wang built a supply chain financial system with blockchain technology and used fuzzy neural network algorithms for financial data processing and risk assessment. Tavana et al. used an ANN and a Bayesian network to measure liquidity risk. Du et al. used the BP neural network algorithm for Internet credit risk early warning.
In the field of financial early warning, Huang et al. used the logistic regression method to predict and supervise the systemic risks of China’s financial system, seeking ways to improve the financial status indicators that measure the degree of fiscal austerity. The financial risk of a business develops through a complex and gradual process, and its causes may be manifold. As more businesses face financial risks or difficulties, bankruptcy liquidations are becoming more frequent, and financial risks have seriously affected businesses and society.
The era of big data has spawned various algorithms for financial data. Using deep learning algorithms, Cao et al. established a financial early warning model and studied how to build a financial risk early warning mechanism for e-commerce companies. Wei took bond-issuing manufacturing companies as samples and built a financial early warning model based on decision tree integration, which effectively improved the rate of correctly identifying companies in a financial crisis. Zhou and Zhou et al. proposed a big data mining method based on PSO-BPNN for financial risk management of IoT deployment in commercial banks [19, 20]; the method uses Apache Spark and Hadoop HDFS technology to build a nonlinear parallel optimization model. Huang et al. compared the performance of several common neural network models on a Chinese SME data set to explore enterprise credit risk factors. Shen et al. used an ensemble model based on the synthetic minority oversampling technique (SMOTE) and a classifier optimization technique for firms’ credit risk assessment.
Given the logical relationships among the financial activities of an enterprise, an early warning model can give managers more support for financial decisions and data, locate front-end business problems in advance, and keep a financial crisis from spreading and causing greater losses to the company [24, 25]. The existing literature has studied enterprise financial risk, but research on corporate financial risk early warning remains insufficient. Because the financial data of listed enterprises may be manipulated and are multicollinear, using them directly inevitably adds noise, which brings the curse of dimensionality to the computation and lowers the accuracy of financial prediction. This paper introduces the PCA method into the evaluation of financial data quality, constructs financial indicators, extracts the principal components of the financial data, and adopts a hybrid PCA-BP neural network model to study financial early warning for listed companies.
3. Model Design and Data Processing
3.1. Risk Indicator Selection
Building a financial risk early warning system makes it possible to monitor the operation of the enterprise and the activities of its business processes. Analyzing the early warning indicators gives an in-depth understanding of the enterprise’s current business situation. Therefore, a financial risk early warning model should take as explanatory variables the financial indicators that reflect the running status of the enterprise. In the preliminary selection, the indicators should comprehensively reflect both the operational state and the financial state of the enterprise, and the corresponding financial data must be accessible for every selected indicator. In addition, much of the corporate financial risk early warning literature uses financial ratio indicators as explanatory variables. Referring to the existing research literature [27–29], this paper builds 26 secondary financial indicators covering five aspects (profitability, solvency, development capability, operating ability, and cash flow) to establish an enterprise financial risk early warning indicator system.
The analysis is carried out on the basis of these five main aspects, for the following reasons:
(1) Profitability: Profitability refers to the ability of an enterprise to obtain profits. Profit is a long-term and stable source of funds for an enterprise, and good profitability can help an enterprise better defend against financial risks. Indicators such as return on equity, return on assets, and operating profit ratio are all representative indicators of profitability.
(2) Solvency: Solvency refers to the ability of an enterprise to repay its debts. Low solvency can easily make an enterprise fall into financial crisis or even go bankrupt. Therefore, the measure of solvency directly reveals the size of the financial risk of the enterprise. Indicators such as current ratio and quick ratio reflect the company’s ability to repay debts through the realization of assets.
(3) Development capability: Development capability refers to the development trend and development potential of an enterprise’s future production and operation activities. It is usually evaluated from the perspective of the company’s financial status and operating results. The main representative indicators are the net profit margin ratio, net asset growth rate, and so on.
(4) Operational ability: Operational ability refers to the ability of an enterprise to manage and use inventory, accounts receivable, fixed assets, total assets, and other assets for turnover operations to earn profits. An unbalanced financial status caused by poor operational management will increase the financial risks faced by enterprises.
(5) Cash flow: Cash is the most liquid asset among all the assets of an enterprise, and it is also the basis for an enterprise to maintain its daily operating and production activities. A shortage of cash flow will directly affect the production and operation of the enterprise and easily lead to a financial crisis.
The selected detailed financial indicators are defined as shown in Table 1.
3.2. Data Processing
Because the indicators differ in nature, the indicator data are normalized before being input into the BP neural network model, which unifies the parameter range for subsequent modeling and analysis. The specific formula is as follows:
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$
In the formula, $x_i$ is the financial indicator of the listed company; $x_{\min}$ is the minimum value of the company’s risk indicator data; $x_{\max}$ is the maximum value of the company’s risk indicator data; and $x_i'$ is the normalized value.
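The min-max scaling described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `min_max_normalize`, the column-wise layout (one column per indicator), and the handling of constant columns are all assumptions.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column (one financial indicator) of `data` to [0, 1]."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    span = col_max - col_min
    # Assumption: a constant column is mapped to 0 instead of dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    return (data - col_min) / safe_span

# Hypothetical values: 3 companies x 2 indicators.
sample = np.array([[0.10, 1.5],
                   [0.30, 2.5],
                   [0.20, 3.5]])
normalized = min_max_normalize(sample)
print(normalized)  # each column now runs from 0 to 1
```

Scaling column by column keeps indicators with large raw ranges from dominating the network inputs.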
3.3. Principal Component Analysis
The financial indicators include the items in Table 1. Although the variable indicators are strictly selected, the independence of the input variables cannot be guaranteed. PCA is therefore used to reduce the dimension of the input variables: the new financial variables obtained after dimension reduction replace the original variables while retaining most of their information. The process is as follows.
Step 1: Calculate the covariance matrix of the original input financial variable data set. Assume that the data set $X = \{x_1, x_2, \ldots, x_n\}$ contains $n$ historical data samples and that each sample has $p$ influencing factors, that is, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$, where $i = 1, 2, \ldots, n$. Then the covariance matrix of the input variable data set can be expressed as
$$C = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T,$$
where $n$ is a natural number and $\bar{x}$ is the average of the $x_i$, that is, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Step 2: Calculate the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of the covariance matrix and the corresponding orthonormalized eigenvectors $u_1, u_2, \ldots, u_p$. The $j$-th principal component of the original variable $x$ is
$$y_j = u_j^T x,$$
where $j = 1, 2, \ldots, p$.
Step 3: Determine the number of principal components $m$. The variance contribution rate $\eta_j$ and the cumulative variance contribution rate $\eta_{\Sigma}(m)$ are, respectively,
$$\eta_j = \frac{\lambda_j}{\sum_{k=1}^{p}\lambda_k}, \qquad \eta_{\Sigma}(m) = \frac{\sum_{j=1}^{m}\lambda_j}{\sum_{k=1}^{p}\lambda_k}.$$
Calculate the principal component load factor as $l_{ij} = \sqrt{\lambda_j}\,u_{ij}$, where $u_{ij}$ is the $i$-th element of $u_j$, and then calculate the score of each influencing factor on the principal components as
$$F = XU,$$
where $F$ is the principal component matrix, $X$ is read as the $n \times p$ data matrix, and $U = (u_1, u_2, \ldots, u_m)$ collects the eigenvectors corresponding to the eigenvalues of the covariance matrix. Usually, when the cumulative variance contribution rate reaches 90%, the corresponding first $m$ principal components contain most of the information in the original data set; the number of principal components is therefore taken as the smallest $m$ that reaches this threshold.
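The three PCA steps can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `pca_by_covariance`, the $1/n$ scaling of the covariance matrix, the centering of the data before projection, and the synthetic demo data are all choices made here.

```python
import numpy as np

def pca_by_covariance(X, threshold=0.90):
    """Return (scores, contribution rates, m) for the first m components
    whose cumulative variance contribution reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)                 # subtract the sample mean
    cov = centered.T @ centered / X.shape[0]      # Step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # Step 2: eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort, largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    eigvecs = eigvecs[:, order]
    contribution = eigvals / eigvals.sum()        # Step 3: variance contribution rates
    cumulative = np.cumsum(contribution)
    m = int(np.searchsorted(cumulative, threshold)) + 1
    m = min(m, eigvals.size)
    scores = centered @ eigvecs[:, :m]            # project onto the first m components
    return scores, contribution, m

# Synthetic demo data (not from the paper): 4 indicators, 2 of them near-duplicates.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X_demo = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + 0.01 * rng.normal(size=50),
    base[:, 1] + 0.01 * rng.normal(size=50),
])
scores, contribution, m = pca_by_covariance(X_demo)
print(m, scores.shape)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the explicit re-sort.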
3.4. BP Neural Network
The BP neural network is trained by the error backpropagation algorithm and is among the most widely used neural network models. The structure diagram of the BPNN is shown in Figure 2.
A single hidden layer BP neural network consists of an input layer, a hidden layer, and an output layer. Let the training data set be $D = \{(x(t), d(t)) : t = 1, 2, \ldots, N\}$.
The input layer has $n$ neuron nodes, and the input vector is an $(n+1)$-dimensional vector once the bias is considered, that is, $x = (x_0, x_1, \ldots, x_n)^T$, where $x_0 = 1$. The hidden layer contains $m$ nodes; $w_{ij}$ represents the weight between input neuron $i$ and hidden layer neuron $j$, with $i = 0, 1, \ldots, n$ and $j = 1, 2, \ldots, m$; $f(\cdot)$ is the activation function of the hidden layer neurons; and $h_j$ is the output of hidden layer neuron $j$. The output layer contains $l$ nodes, and the input vector of the output layer is $h = (h_0, h_1, \ldots, h_m)^T$ with $h_0 = 1$; $v_{jk}$ represents the weight between hidden layer neuron $j$ and output layer neuron $k$, with $j = 0, 1, \ldots, m$ and $k = 1, 2, \ldots, l$; and $g(\cdot)$ represents the activation function of the output layer.
The error backpropagation algorithm used to train the above model consists of two processes: forward propagation of the input signal and backpropagation of the error signal. Let the training sample at step $t$ be $(x(t), d(t))$, where $x(t)$ is the input vector and $d(t)$ is the expected output vector.
(1) Signal forward propagation process. The input signal is first fed to the input layer, then passes through the hidden layer, and finally reaches the output layer. The input and output signals of the hidden layer and the output layer are calculated in turn, with $f(\cdot)$ as the activation function of the hidden layer neurons and $g(\cdot)$ as the activation function of the output layer.
The input vector of the hidden layer:
$$x = (x_0, x_1, \ldots, x_n)^T$$
The input signal of hidden layer neuron $j$:
$$net_j = \sum_{i=0}^{n} w_{ij} x_i$$
The output signal of hidden layer neuron $j$:
$$h_j = f(net_j)$$
The input vector of the output layer:
$$h = (h_0, h_1, \ldots, h_m)^T$$
The input signal of output layer neuron $k$:
$$net_k = \sum_{j=0}^{m} v_{jk} h_j$$
The output signal of output layer neuron $k$:
$$o_k = g(net_k)$$
(2) Error backpropagation process. Starting from the output layer, the error of each layer is calculated in reverse, layer by layer, and the weights of each layer are updated by gradient descent so that the actual output of the network is as close to the expected output as possible. First, let $d_k$ represent the expected output of neuron $k$ in the output layer and $o_k$ its actual output; then the difference for each training sample is
$$e_k = d_k - o_k, \quad k = 1, 2, \ldots, l.$$
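The text stops at the per-sample output error. As a hedged sketch of the step that usually follows, and assuming a squared-error loss $E$ and a learning rate $\eta$ (neither is specified in the text), the gradient-descent weight updates for a single hidden layer network can be written as below. Here $x_i$ is an input, $h_j = f(net_j)$ a hidden layer output, $o_k = g(net_k)$ an actual output, $d_k$ the expected output, and $w_{ij}$, $v_{jk}$ the input-to-hidden and hidden-to-output weights; these symbols are illustrative choices.

```latex
\begin{align}
E &= \frac{1}{2}\sum_{k=1}^{l} (d_k - o_k)^2 \\
\delta_k &= (d_k - o_k)\, g'(net_k), &
\Delta v_{jk} &= -\eta\,\frac{\partial E}{\partial v_{jk}} = \eta\,\delta_k\, h_j \\
\delta_j &= f'(net_j) \sum_{k=1}^{l} \delta_k\, v_{jk}, &
\Delta w_{ij} &= -\eta\,\frac{\partial E}{\partial w_{ij}} = \eta\,\delta_j\, x_i
\end{align}
```

The hidden layer term $\delta_j$ reuses the output layer terms $\delta_k$, which is why the error is said to propagate backward through the network.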
We can summarize the training process of the error backpropagation algorithm as follows.
Step 1: Initialize the network weights to small random numbers.
Step 2: Enter the training data for the current step:
(a) Forward propagation of the signal, calculating the output of each neuron layer by layer.
(b) Error backpropagation, adjusting the weights of the output layer and the hidden layer.
Step 3: Repeat Step 2 until the error on the training data set is less than the preset threshold or the maximum number of iterations is reached.
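The three training steps can be sketched as a small NumPy implementation. This is a minimal sketch, not the authors' model: sigmoid activations in both layers, full-batch updates, a mean squared error stopping criterion, and every name and hyperparameter (`train_bp`, `hidden`, `lr`, `epochs`, `tol`) are assumptions, and the toy data are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, d, hidden=4, lr=0.5, epochs=2000, tol=1e-3, seed=0):
    """Train a single-hidden-layer network with batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    Xb = np.hstack([np.ones((n_samples, 1)), X])            # prepend bias input
    # Step 1: initialize the weights to small random numbers.
    W = rng.normal(scale=0.1, size=(n_inputs + 1, hidden))  # input -> hidden
    V = rng.normal(scale=0.1, size=(hidden + 1, 1))         # hidden -> output
    history = []
    for _ in range(epochs):
        # Step 2(a): forward propagation, layer by layer.
        H = sigmoid(Xb @ W)
        Hb = np.hstack([np.ones((n_samples, 1)), H])        # prepend hidden bias
        O = sigmoid(Hb @ V)
        err = d - O
        mse = float(np.mean(err ** 2))
        history.append(mse)
        # Step 3: stop once the error falls below the preset threshold.
        if mse < tol:
            break
        # Step 2(b): backpropagate the error and adjust both weight layers.
        delta_out = err * O * (1.0 - O)
        delta_hid = (delta_out @ V[1:].T) * H * (1.0 - H)
        V += lr * Hb.T @ delta_out / n_samples
        W += lr * Xb.T @ delta_hid / n_samples
    return W, V, history

def predict(X, W, V):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    Hb = np.hstack([np.ones((X.shape[0], 1)), sigmoid(Xb @ W)])
    return sigmoid(Hb @ V)

# Invented toy data: two inputs, one binary label.
X_toy = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.2],
                  [0.8, 0.9], [0.9, 0.8], [1.0, 1.0]])
d_toy = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
W, V, history = train_bp(X_toy, d_toy)
print(history[0], history[-1])
```

Because gradient descent can settle in a local optimum, rerunning with different `seed` values and keeping the best run is one simple safeguard.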
3.5. Evaluation Indicators
To evaluate the prediction accuracy of the model proposed in this paper, the root mean squared error (RMSE) and the confusion matrix are selected as evaluation indicators. The RMSE is calculated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2},$$
where $\hat{y}_i$ is the predicted value, $y_i$ is the real value, and $n$ is the number of samples.
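As a small worked example of the RMSE, the sketch below computes it for four invented predictions; the function name `rmse` and the sample values are assumptions made for illustration.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and real values."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Invented example: one of four predictions is wrong by 1.
value = rmse([1.0, 0.0, 1.0, 1.0], [1.0, 0.0, 0.0, 1.0])
print(value)  # sqrt(1/4) = 0.5
```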
A confusion matrix is a two-way matrix in which one axis shows the distribution of the actual classes and the other axis shows the predicted classes. The schematic diagram of the confusion matrix is shown in Table 2.
The accuracy of the model is calculated from the diagonal elements of the confusion matrix, because they represent the correct classifications made by the model. The following formula shows the calculation of model accuracy:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$ and $TN$ are the numbers of correctly classified positive and negative samples, and $FP$ and $FN$ are the numbers of misclassified ones.
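The accuracy calculation can be illustrated with a short sketch that tallies the four confusion matrix cells and divides the diagonal by the total. The function names, the use of "Yes" as the positive class, and the five invented predictions are assumptions for illustration only.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Share of samples on the diagonal of the confusion matrix."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

# Invented labels: "Yes" marks a company flagged as financially risky.
actual = ["Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "Yes", "No", "No"]
counts = confusion_counts(actual, predicted)
print(counts, accuracy(*counts))  # (1, 1, 2, 1) 0.6
```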
The flow chart of the financial early warning mechanism is shown in Figure 3. It is divided into three phases.
Phase 1: Randomly select the financial data of China A-share listed companies, both with and without the ST mark, for the three years from 2019 to 2021; all of the financial data come from EPS China Data and cover a total of 250 companies. When a financial variable of a company has missing values, replace them with that company’s average value over the three years. If there is no data for the three years, delete the variable together with the other financial indicators of the company. Companies whose shares are marked with ST are identified as having potential financial problems. After the data are standardized, PCA dimensionality reduction is performed to obtain the principal component financial data with a cumulative variance contribution rate of 90%; 2/3 of the data are randomly selected as training data, and the remaining 1/3 are used as test data.
Phase 2: Input the training data into the BP neural network for modeling and analysis. The parameters of the BPNN are obtained by selecting different numbers of hidden layers, and the optimal number of neurons in the hidden layer is selected by minimizing the error. The BPNN model should be trained on the data set repeatedly to obtain the optimal parameters, in case the parameters obtained in a single run are not globally optimal.
Phase 3: Observe the performance of the financial data with and without PCA dimension reduction in the SVM, logistic, and BP neural network models, and record the respective predictions in confusion matrices. Calculate and compare the classification accuracy of the predicted corporate financial warnings, and finally draw conclusions.
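The missing-value rule and the 2/3 to 1/3 split from Phase 1 can be sketched as follows. This is an illustration under assumptions: the array layout (one row per company, one column per year, for a single indicator), the function names, the fixed random seed, and the tiny example matrix are not from the paper.

```python
import numpy as np

def impute_with_company_mean(values):
    """Fill NaNs with the company's own mean; drop companies with no data.

    `values` has shape (companies, years) for one financial indicator.
    Returns the filled rows that were kept and a boolean mask of kept rows.
    """
    values = np.asarray(values, dtype=float)
    all_missing = np.isnan(values).all(axis=1)
    row_mean = np.full(values.shape[0], np.nan)
    row_mean[~all_missing] = np.nanmean(values[~all_missing], axis=1)
    filled = values.copy()
    rows, cols = np.where(np.isnan(values))
    filled[rows, cols] = row_mean[rows]
    keep = ~all_missing
    return filled[keep], keep

def split_train_test(n_samples, train_fraction=2 / 3, seed=0):
    """Randomly split sample indices into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    cut = int(round(n_samples * train_fraction))
    return order[:cut], order[cut:]

# Invented example: company 0 misses one year, company 1 has no data at all.
raw = np.array([[1.0, np.nan, 3.0],
                [np.nan, np.nan, np.nan],
                [2.0, 2.0, 2.0]])
filled, keep = impute_with_company_mean(raw)
train_idx, test_idx = split_train_test(250)
print(filled, keep, len(train_idx), len(test_idx))
```

With 250 samples this split yields 167 training and 83 test indices.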
4. Empirical and Result Analysis
Data processing is carried out as described in Phase 1. With the ST mark as the dependent variable, a listed company marked with ST is assigned the label “Yes,” and a normal company without potential financial risk is assigned the label “No.” To improve data quality and the prediction accuracy of the financial risk early warning model, PCA is used to extract principal components from the data. After the financial data are normalized, the information of each principal component and its cumulative variance contribution rate are obtained, as shown in Table 3.
Table 3 shows that the cumulative variance contribution rate of the first 13 principal components reaches 90%, indicating that the first 13 principal components contain most of the information in the original data and that the last 13 can be ignored as noise. We therefore selected the first 13 principal components to replace the original input data for neural network training. Before PCA processing there were 26 input variables; after PCA processing there are only 13. The dimension of the data is thus reduced while as much information as possible is retained, which improves the efficiency of the algorithm. The 13 principal components were then analyzed by the BP neural network. Setting different numbers of neurons gives different measurement results, as shown in Table 4.
According to Table 4 and Figure 4, when the number of hidden neurons is 15, both the training error and the test error are relatively low and the neural network fits well, so this configuration is selected as the final fitting result. The prediction results of the SVM, logistic, and BP neural network models, recorded as confusion matrices, are shown in Table 5.
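Choosing the hidden layer size by minimizing the error can be expressed as a simple selection rule. The sketch below is hypothetical: the function name, the tie-breaking rule (lowest test error first, then lowest training error), and the error values are placeholders invented for illustration and are not the values reported in Table 4.

```python
def select_hidden_size(errors):
    """Pick the hidden layer size with the lowest test error.

    `errors` maps hidden size -> (training error, test error);
    ties on test error are broken by the lower training error.
    """
    return min(errors, key=lambda size: (errors[size][1], errors[size][0]))

# Placeholder numbers for illustration only; NOT the values in Table 4.
placeholder_errors = {
    5: (0.30, 0.35),
    10: (0.22, 0.29),
    15: (0.18, 0.24),
    20: (0.15, 0.27),
}
best = select_hidden_size(placeholder_errors)
print(best)  # 15 for these placeholder values
```

Ranking by test error rather than training error avoids favoring a larger network that only fits the training data more closely.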
From the results in Table 5, the training and testing accuracy of each model with and without PCA can be calculated through formula (13); the results are shown in Table 6. Tables 5 and 6 show that, without PCA preprocessing, SVM fits the training data very well and has the highest training accuracy, which indicates very serious overfitting; the neural network model follows. On the test data, however, the SVM predictions are not as good as those of the logistic model, and the neural network model is the most accurate. This indicates that the serious overfitting leaves SVM with poor predictive ability for financial risk. After the data are processed by PCA, the SVM model shows no substantial improvement in financial risk warning: its accuracy on the training data is as high as 99.7%, while its accuracy on the test data remains low, which further shows that SVM is prone to overfitting. The performance of the logistic model is not as good as that of the SVM, but overall the PCA-BPNN performs best: its training accuracy reached 95.1% and its test accuracy was as high as 86.6%. This shows that its generalization ability is good and that PCA-BPNN can accurately capture the financial risk problems of listed companies.
5. Conclusion
Financial data manipulation exists in listed companies, and the distortion of financial information reduces the prediction accuracy of financial risk early warning models. PCA is a data dimensionality reduction method that can filter out excess noise. In this paper, PCA is applied to financial information data and a PCA-BPNN model is constructed to study financial risk early warning for listed companies. Taking 250 Chinese A-share listed companies from 2019 to 2021 as research samples, this paper empirically analyzes the performance of the PCA-BPNN model in corporate financial risk early warning and compares its fitting effect and prediction accuracy with those of the SVM and logistic models. The conclusions of this paper are as follows. First, preprocessing financial data with the PCA method can significantly improve the prediction accuracy of the BPNN financial early warning model. Second, the PCA-BPNN model generalizes well in financial risk prediction: in identifying enterprises with financial problems, its prediction accuracy is significantly higher than that of the SVM and logistic models. This study provides new ideas for financial risk early warning, and the findings offer a reference for regulators, financial institutions, and market investors: when predicting corporate financial risks, PCA can be combined with a BPNN, and the resulting PCA-BPNN model can be used to make more accurate judgments. Although our model performs much better than the other models, there is still room for improvement. First, the establishment of indicators is a dynamic process, and how to describe risk indicators more accurately deserves further thought. Second, the selection of principal components in PCA is often based on personal experience; how to reduce the dimensionality of the data while retaining as much of the original characteristics as possible is also a direction for future research.
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
This research was supported by the National Social Science Fund General Project of China (21BJY002), the Humanities and Social Sciences Research Project of the Ministry of Education of China (20YJA790097), and Philosophy and Social Sciences Major Tender Project in Zhejiang Province of China (20XXJC03ZD).
The Securities Regulatory Commission Inspected 20 Typical Illegal Cases in 2021, China Securities Regulatory Commission, Beijing, China, 2021, http://www.csrc.gov.cn/csrc/c100028/c2265190/content.shtml.
A. R. Razali and I. M. Tahir, “Review of the literature on enterprise risk management,” Business Management Dynamics, vol. 1, no. 5, p. 8, 2011.
W. Weng, X. Zhu, N. Wang, and B. Peng, “Analysis on the whitewashing motivation of financial statement of listed real estate companies in China,” in Proceedings of the 2018 International Conference on Education Science and Social Development (ESSD 2018), pp. 193–195, Atlantis Press, Paris, July 2018.
E. Okoye and E. N. Ndah, “Forensic accounting and fraud prevention in manufacturing companies in Nigeria,” International Journal of Innovative Finance and Economics Research, vol. 7, no. 1, pp. 107–116, 2019.
S. Liu, A. Borovykh, L. A. Grzelak, and C. W. Oosterlee, “A neural network-based framework for financial model calibration,” Journal of Mathematics in Industry, vol. 9, no. 1, pp. 1–28, 2019.
P. Prasetyawan, I. Ahmad, R. I. Borman, and A. P. Yogi, “Classification of the period undergraduate study using back-propagation neural network,” in Proceedings of the 2018 International Conference on Applied Engineering (ICAE), pp. 1–5, IEEE, Batam, Indonesia, October 2018.
F. Shen, X. Zhao, Z. Li, K. Li, and Z. Meng, “A novel ensemble classification model based on neural networks and a classifier optimisation technique for imbalanced credit risk evaluation,” Physica A: Statistical Mechanics and Its Applications, vol. 526, Article ID 121073, 2019.
S. K. Onsongo, S. M. A. Muathe, and L. W. Mwangi, “Financial risk and financial performance: evidence and insights from commercial and services listed companies in Nairobi Securities Exchange, Kenya,” International Journal of Financial Studies, vol. 8, no. 3, p. 51, 2020.
V. Verbraak-Kolevska, Internal Audit Effectiveness: Factors Influencing Management to Listen to the Internal Auditor’s Risk Warnings: A Multi-Method Study from a Behavioural Decision Making Perspective, Erasmus University Rotterdam, Rotterdam, Netherlands, 2018.