Abstract

Redundant financial indicators are a central obstacle in financial early warning, and the key to removing them is to reduce the dimensionality of the original indicator set. This paper proposes a model that combines a whale optimization algorithm improved with a mixed strategy (IWOA) and a support vector machine (SVM), namely, the IWOA-SVM early warning model, which performs indicator selection, dimensionality reduction, and financial risk identification simultaneously. The model is designed on a research sample of 302 enterprises, consisting of enterprises specially treated on the Shanghai and Shenzhen stock exchanges and normal enterprises of comparable size. The results show that the improved whale optimization algorithm has better optimization speed and accuracy and improves the original algorithm's ability to search for the optimal solution. Compared with other dimensionality reduction methods, the IWOA-SVM model yields the lowest indicator dimension after reduction and better recognition performance. The dimensionality reduction results also hold to a certain extent across different classifiers, which provides a new idea for the selection of indicators for financial early warning.

1. Introduction

Since their inception in 1932, financial early warning models have evolved from univariate models to multivariate models, then to logistic models [1], and finally to modern models based on machine learning and ensemble learning. Because financial indicators are rich and complex, early warning models are prone to including redundant indicators. Redundant financial indicators not only increase the computational cost of the model but also reduce its identification accuracy. Selecting the original financial indicators and reducing their dimensionality to find the optimal indicator combination is therefore very important in constructing a financial early warning model.

Fang and Yang [2] proposed the SGL-SVM method and applied it to the prediction of financial distress, reducing the original 90-dimensional indicator variables to 24 dimensions and eliminating a large amount of noisy data while obtaining good identification results. Chen [3] and Fang [4] et al. used PCA to reduce the dimensionality of the indicator data and selected the top principal components with high variance contribution in place of the original indicators. Faced with high-dimensional indicators, Huang et al. [5] first performed independence tests to eliminate insignificant early warning indicators, then used random forest and XGBoost to calculate indicator importance and eliminate unimportant indicators, and finally used KPCA to reduce the 14-dimensional financial indicators to 7 dimensions, constructing a combined KPCA-WLSSVM model with high predictive power and good generalization. Li et al. [6] selected the ten most important features based on the feature importance evaluation of the random forest algorithm, and the results showed that the overall accuracy, sensitivity, specificity, and AUC of the early warning model all improved after feature optimization. Zhou et al. [7] used grey clustering to select valid variables in their study on early warning of credit risk for listed companies, followed by logistic regression models for prediction. Luo and Wang [8] used an improved MRMR algorithm in constructing a financial early warning model, taking into account both feature relevance and redundancy in feature selection. Xiaoyan et al. [9] used significance tests and normality tests to select features with significant differences as a way to improve the accuracy of financial early warning models.

Feature selection is an important tool for feature dimensionality reduction. When dimensionality reduction relies on the correlation information of indicators, features that are favorable to the identification results may be excluded, and when the financial early warning model is constructed using factor analysis, the interpretability of the model is relatively poor. Feature selection using metaheuristic algorithms has been widely used in areas such as behavioral recognition [10], network intrusion detection [11], and performance prediction [12]. In the field of financial early warning, however, metaheuristic algorithms are mostly used for hyperparameter optimization of classifiers [13–15] and are less frequently applied to the selection and dimensionality reduction of financial indicators. The whale optimization algorithm (WOA) is a novel metaheuristic algorithm proposed by Mirjalili and Lewis [16]. Like other metaheuristics, it suffers from low accuracy in finding the optimal solution, and how to improve its ability to search for the optimal solution has received much attention from researchers [17, 18]. Kaur and Arora [19] introduced a chaos mechanism to optimize the initial position of the population. Introducing adaptive weights [20] or improving the convergence factor [21] can balance the search ability between the early and late stages of the algorithm. Therefore, this paper uses an improved whale optimization algorithm with mixed strategy (IWOA) and a support vector machine (SVM) to select and reduce the dimensionality of the original financial indicators and construct an IWOA-SVM financial warning model.

2. IWOA-SVM Financial Early Warning Model Construction

2.1. Whale Optimization Algorithm

The whale optimization algorithm is an intelligent optimization algorithm based on the special feeding behavior of humpback whales. It consists of three main parts: encircling prey, bubble-net attack, and searching for prey.

2.1.1. Encircling Prey

During the whale feeding process, the location of the prey is first observed and searched so that it can be surrounded. The optimal solution to the problem is taken to be the location of the prey, and during the iteration of the algorithm, the fitness value calculated by the fitness function is used to evaluate the merit of each group of financial indicators. Therefore, the design of the fitness function is crucial. In order to keep the dimension of the financial indicators as small as possible while maintaining a high accuracy rate, the fitness function in this paper combines three quantities: the accuracy of five-fold cross-validation of each financial indicator feature combination in the SVM classifier, the dimension of the financial indicator features included in a randomly selected individual, and the total dimension of financial indicator features.
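The trade-off between accuracy and indicator dimension can be illustrated with a minimal sketch. The weighted-sum form, the weight `alpha`, and the function name `fitness` are assumptions made for illustration only; the exact expression used in this paper is not reproduced here. The sketch treats lower values as better, which matches the later step in which the individual with the lowest fitness value is taken as the current optimum.

```python
def fitness(accuracy: float, n_selected: int, n_total: int, alpha: float = 0.99) -> float:
    """Hypothetical fitness for a feature subset; lower is better.

    accuracy   -- five-fold cross-validation accuracy in [0, 1]
    n_selected -- number of indicators kept in this individual
    n_total    -- total number of candidate indicators
    alpha      -- assumed weight of the error term (not taken from the paper)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must lie in (0, n_total]")
    error_term = 1.0 - accuracy          # penalises misclassification
    size_term = n_selected / n_total     # penalises large indicator sets
    return alpha * error_term + (1.0 - alpha) * size_term
```

Under this assumed form, two subsets with equal accuracy are ranked by size, so an 11-indicator subset scores better than a 13-indicator subset out of 34 candidates.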

In the iteration process of the algorithm, the position of the individual with the optimal fitness value in the current population is taken as the optimal position, and the other individuals move toward it. The mathematical expression is

$$D = \left| C \cdot X^{*}(t) - X(t) \right|, \qquad X(t+1) = X^{*}(t) - A \cdot D,$$

where $X^{*}(t)$ denotes the position vector of the optimal individual of the current population, $X(t)$ denotes the position vector of each individual in the current population, and $t$ denotes the current number of iterations. The expressions of $A$ and $C$ are as follows:

$$A = 2a \cdot r_1 - a, \qquad C = 2 r_2, \qquad a = 2 - \frac{2t}{T_{\max}},$$

where $a$ is the convergence factor, $r_1$ and $r_2$ are random numbers within [0, 1], and $T_{\max}$ denotes the maximum number of iterations.
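As a rough illustration, the encircling step can be sketched in a few lines of Python, assuming the standard WOA form in which the convergence factor decreases linearly from 2 to 0. The function names are hypothetical, and drawing one scalar `A` and `C` per individual (rather than per dimension) is a simplifying assumption.

```python
import random


def convergence_factor(t: int, t_max: int) -> float:
    """Convergence factor a, decreasing linearly from 2 to 0 over the run."""
    return 2.0 - 2.0 * t / t_max


def coefficients(a: float, rng: random.Random) -> tuple:
    """Draw the coefficients A = 2*a*r1 - a and C = 2*r2 with r1, r2 in [0, 1]."""
    A = 2.0 * a * rng.random() - a
    C = 2.0 * rng.random()
    return A, C


def encircle(position: list, best: list, A: float, C: float) -> list:
    """Move an individual toward the best position: X(t+1) = X* - A * |C * X* - X|."""
    return [b - A * abs(C * b - x) for x, b in zip(position, best)]
```

When `a` reaches 0 at the final iteration, `A` is 0 and the update returns the best position itself, which is why the late stage of the search behaves as pure exploitation.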

2.1.2. Bubble-Net Attack

This phase consists of two types of position update: contraction encirclement and spiral update. In contraction encirclement, the position is updated by adjusting the convergence factor in equation (3). The spiral position update simulates a whale spiraling upward toward its prey, with the mathematical expression

$$X(t+1) = D' \cdot e^{bl} \cos(2\pi l) + X^{*}(t), \qquad D' = \left| X^{*}(t) - X(t) \right|,$$

where $X^{*}(t)$ is the position of the current optimal individual, $D'$ denotes the distance between an individual whale and the current optimal solution, $b$ denotes a constant that defines the shape of the spiral, and $l$ denotes a random number in the range [−1, 1]. At this stage, the individual position of the whale is updated by randomly selecting between contraction encirclement and spiral update. The mathematical expression is

$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5, \\ D' \cdot e^{bl} \cos(2\pi l) + X^{*}(t), & p \ge 0.5, \end{cases}$$

where $X^{*}(t) - A \cdot D$ is the encircling update of Section 2.1.1 and $p$ represents a random number in the range [0, 1].
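The spiral update can be sketched as follows, assuming the standard logarithmic-spiral form of WOA. The name `spiral_update` and the default `b = 1.0` are assumptions for illustration; the spiral constant used in this paper is not stated.

```python
import math
import random


def spiral_update(position: list, best: list, b: float = 1.0, rng=random) -> list:
    """Spiral move toward the best position.

    Each coordinate becomes |X* - X| * exp(b * l) * cos(2 * pi * l) + X*,
    with a single l drawn uniformly from [-1, 1] for the whole individual.
    """
    l = rng.uniform(-1.0, 1.0)
    factor = math.exp(b * l) * math.cos(2.0 * math.pi * l)
    return [abs(bst - x) * factor + bst for x, bst in zip(position, best)]
```

Because the distance term multiplies the spiral factor, an individual that already sits on the best position stays there, and the step size shrinks as the population converges.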

2.1.3. Searching for Prey

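Individual whales also search for prey by swimming randomly away from the current optimum. This random search strategy gives the algorithm good global search performance: when $|A| \ge 1$, a whale individual is randomly selected from the population and used as the reference for the position update. The mathematical expression is

$$D = \left| C \cdot X_{\mathrm{rand}}(t) - X(t) \right|, \qquad X(t+1) = X_{\mathrm{rand}}(t) - A \cdot D,$$

where $X_{\mathrm{rand}}(t)$ denotes a randomly selected individual whale.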
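A minimal sketch of the random search step and of the rule that chooses among the three updates is given below, assuming the standard WOA branching on `p` and `|A|`. The function names and the string labels returned by `select_branch` are hypothetical.

```python
import random


def search_for_prey(position: list, population: list, A: float, C: float,
                    rng: random.Random) -> list:
    """Exploration step: move relative to a randomly chosen individual.

    X(t+1) = X_rand - A * |C * X_rand - X|
    """
    reference = rng.choice(population)
    return [ref - A * abs(C * ref - x) for x, ref in zip(position, reference)]


def select_branch(p: float, A: float) -> str:
    """Pick the update rule from the random number p and the coefficient A."""
    if p >= 0.5:
        return "spiral"
    return "encircle" if abs(A) < 1.0 else "search"
```

Since `|A|` can exceed 1 only while the convergence factor is above 1, the exploration branch is reachable mainly in the first half of the iterations.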

2.2. Improvements to the Whale Optimization Algorithm

The whale optimization algorithm suffers from poor global search ability, a tendency to fall into local optima, and poor convergence accuracy. This paper therefore improves it with a hybrid strategy: Gauss mapping initializes the population to improve population diversity, adaptive weights balance global and local search ability, and a stochastic dimension-by-dimension variation strategy based on Cauchy mutation and reverse learning improves the ability to jump out of local optima.

2.2.1. Gauss Mapping to Initialize the Population

The original whale optimization algorithm initializes the population randomly, that is, it randomly generates combinations of financial indicators. In order to expand the population search range and improve population diversity, this paper adopts Gauss mapping to initialize the population, which traverses the search space more thoroughly and is more uniformly distributed than random initialization. The mathematical expression of Gauss mapping is

$$x_{n+1} = \begin{cases} 0, & x_n = 0, \\ \dfrac{1}{x_n} \bmod 1, & x_n \ne 0, \end{cases} \qquad \frac{1}{x_n} \bmod 1 = \frac{1}{x_n} - \left[ \frac{1}{x_n} \right],$$

where mod represents the remainder function, [ ] represents rounding down to an integer, and $x_n$ represents the chaotic sequence generated by Gauss mapping.
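The initialization can be sketched as follows, assuming the common form of the Gauss map in which the next value is the fractional part of the reciprocal. Turning each chaotic value into a keep/drop decision with a 0.5 threshold, the seed range, and the function names are assumptions for illustration; the paper does not state how positions are mapped to indicator subsets.

```python
import random


def gauss_map_sequence(x0: float, length: int) -> list:
    """Generate a chaotic sequence with the Gauss map x -> (1 / x) mod 1."""
    sequence = []
    x = x0
    for _ in range(length):
        x = 0.0 if x == 0.0 else (1.0 / x) % 1.0
        sequence.append(x)
    return sequence


def init_population(n_individuals: int, n_features: int, seed: int = 0,
                    threshold: float = 0.5) -> list:
    """Build binary indicator masks (1 = keep the indicator) from chaotic sequences."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_individuals):
        x0 = rng.uniform(0.01, 0.99)  # assumed seed range for each individual
        chaos = gauss_map_sequence(x0, n_features)
        population.append([1 if value > threshold else 0 for value in chaos])
    return population


population = init_population(n_individuals=20, n_features=34)
```

The call at the end uses the population size of 20 and the 34 candidate indicators reported in this paper.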

2.2.2. Adaptive Weights

Adaptive weights can effectively balance global and local search capabilities. An adaptive weighting factor is therefore introduced, whose mathematical expression depends on the current number of iterations and the maximum number of iterations.

The equation for updating the location of individual whales after the introduction of adaptive weights is as follows.

2.2.3. Stochastic Dimension-by-Dimension Variation Strategy Based on Cauchy Mutation and Reverse Learning

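The Cauchy distribution, like the normal distribution, is one of the common distributions in probability theory. Its density declines slowly from the peak toward zero, so it has heavier tails than the normal distribution and a mutation drawn from it is more likely to produce a large step. The density of the standard Cauchy distribution is

$$f(x) = \frac{1}{\pi \left( 1 + x^{2} \right)}, \qquad -\infty < x < \infty.$$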

Reverse learning improves the search performance of an algorithm by evaluating the opposite of the current solution within the same search space. It is widely used in various optimization algorithms, with the mathematical formulation of

If only one variation strategy is used, the algorithm can easily become trapped in a local optimum. Traditional variation approaches mostly use random variation or variation in all dimensions at once, whereas dimension-by-dimension variation avoids interference between dimensions and explores the optimal solution more fully. This paper therefore uses a random dimension-by-dimension variation strategy based on Cauchy mutation and reverse learning to perturb the optimal individual.

Since the new solution generated is not necessarily better than the optimal position, a greedy rule is used to decide whether to adopt the new solution.
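The perturbation and the greedy rule can be sketched as below for a continuous position vector. Several details are assumptions rather than the method of this paper: the opposite point is taken as `lower + upper - x`, the Cauchy step is multiplicative, the two strategies are chosen with equal probability in each dimension, and candidates are clipped to the bounds. All function names are hypothetical.

```python
import math
import random


def cauchy_sample(rng: random.Random) -> float:
    """Draw from the standard Cauchy distribution by inverting its CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def perturb_best(best: list, fitness_fn, lower: float, upper: float,
                 rng: random.Random) -> tuple:
    """Perturb the best individual one dimension at a time (lower fitness is better)."""
    current = list(best)
    current_fit = fitness_fn(current)
    for j in range(len(current)):
        candidate = list(current)
        if rng.random() < 0.5:
            # Cauchy mutation: heavy-tailed step scaled by the current value
            candidate[j] = current[j] + cauchy_sample(rng) * current[j]
        else:
            # Reverse learning: reflect the coordinate inside the search bounds
            candidate[j] = lower + upper - current[j]
        candidate[j] = min(max(candidate[j], lower), upper)
        candidate_fit = fitness_fn(candidate)
        # Greedy rule: keep the candidate only if it improves the fitness
        if candidate_fit < current_fit:
            current, current_fit = candidate, candidate_fit
    return current, current_fit
```

Because every accepted change must lower the fitness, the returned solution is never worse than the input, which is the guarantee the greedy rule is meant to provide.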

2.3. IWOA-SVM Financial Early Warning Model Construction

Because the financial indicators of an enterprise are linked by a large number of complex linear and nonlinear relationships, low-quality financial indicator data can lead to feature confounding and a heavy computational load when a financial early warning model is constructed, reducing the early warning capability of the model. Therefore, it is necessary to reduce the dimensionality of financial indicators and select the optimal combination of financial indicators with good predictive power. The IWOA-SVM financial early warning model constructed in this paper improves the whale optimization algorithm through mixed strategies, strengthening its ability to search for the optimal solution, that is, the combination of financial indicators with the best predictive ability, and thereby improving the recognition performance of the model. The flow chart of the IWOA-SVM model is shown in Figure 1.

The key steps in the IWOA-SVM financial early warning model are as follows:
(1) Data standardization. Because the financial indicators differ in magnitude and SVM is a classification algorithm based on a distance metric, the financial indicator data are standardized to avoid distorting model performance, using the formula $x' = (x - \mu)/\sigma$, where $\mu$ is the mean and $\sigma$ is the standard deviation of the data.
(2) Initializing the population. The population size of the IWOA algorithm is set to 20 and the maximum number of iterations to 50, i.e., 20 sets of financial indicator combinations are generated at each iteration using the Gauss mapping strategy.
(3) Calculating the fitness value. Based on the fitness function, the fitness values of all current combinations of financial indicators are calculated, and the individual with the lowest fitness value is taken as the current optimal solution.
(4) Location update. When the probability p < 0.5 and |A| < 1, the position is updated according to equation (9); when p < 0.5 and |A| ≥ 1, the position is updated according to equation (10); when p ≥ 0.5, the position is updated according to equation (11).
(5) Optimal position perturbation. The optimal solution is perturbed using the stochastic dimension-by-dimension variation strategy based on Cauchy mutation and reverse learning. If the fitness value of the new solution is better than that of the optimal solution before perturbation, the new combination of financial indicators and its fitness value are retained; otherwise, the new solution is discarded.
(6) Iterative operations. Determine whether the maximum number of iterations has been reached. If so, output the optimal combination of financial indicators and the optimal fitness value; if not, repeat steps 3–5 to continue the search.
(7) Output the SVM classification results based on the optimal combination of financial indicators.
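The standardization in step (1) can be sketched with the Python standard library alone. Using the population standard deviation and mapping constant columns to zero are assumptions; the paper does not say which estimator it uses, and the function name is hypothetical.

```python
from statistics import fmean, pstdev


def standardize_columns(rows: list) -> list:
    """Z-score each indicator column: x' = (x - mean) / standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(column) for column in columns]
    stds = [pstdev(column) for column in columns]
    return [
        # A constant column carries no information, so it is mapped to 0.0
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

After this step every indicator has zero mean and unit variance, so ratios measured in days, percentages, or multiples contribute comparably to the distances the SVM relies on.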

3. Empirical Analysis

3.1. Sample Selection and Data Sources

The article selects A-share listed companies in Shanghai and Shenzhen as the empirical sample and classifies them into normal companies and ST companies according to whether the listed company was specially treated (ST) during its existence. For normal companies, continuous financial data are obtained for the years 2019–2021; for ST companies, the year in which the company was specially treated is recorded as year t, and continuous financial data are obtained for the previous two years. To eliminate the possible adverse effects of class imbalance on the model, companies with serious data deficiencies were excluded, and the positive and negative samples were then matched at a 1 : 1 ratio by pairing companies from the same or similar industries and of comparable overall size. The final selection of 151 ST companies and 151 normal companies yielded a total of 302 valid samples and 906 sets of observations. All data were sourced from the CSMAR database.

3.2. Selection of Indicators

When constructing financial early warning models for listed companies, it is common to start with financial indicators that reflect the operating status of the company. In this paper, a total of 32 financial indicators are selected from four aspects: solvency, profitability, operational capability, and development capability. Five indicators are selected for solvency: current ratio (X1), quick ratio (X2), gearing ratio (X3), equity multiplier (X4), and equity ratio (X5). Twelve indicators are selected for profitability: return on assets (X6), net profit margin on total assets (X7), net profit margin on current assets (X8), net profit margin on fixed assets (X9), EBITDA (X10), earnings before interest, tax, depreciation and amortization (X11), gross operating margin (X12), operating profit margin (X13), net operating margin (X14), management expense ratio (X15), financial expense ratio (X16), and cost margin (X17). Eleven indicators are selected for operational capability: accounts receivable to revenue ratio (X18), inventory to revenue ratio (X19), operating cycle (X20), accounts payable turnover (%) (X21), current assets to revenue (X22), current assets turnover (X23), fixed assets to revenue (X24), fixed assets turnover (X25), noncurrent assets turnover (X26), capital intensity (X27), and total assets turnover (X28). Four indicators are selected for development capability: growth rate of total operating revenue (X29), growth rate of total operating costs (X30), growth rate of administrative expenses (X31), and sustainable growth rate (X32). Considering that nonfinancial indicators are also important in risk identification and early warning, this paper selects two nonfinancial indicators at the level of management's governance capacity, namely, the proportion of men in management (X33) and the average age of management (X34).
The finalized indicator system of the corporate financial early warning model therefore contains a total of 34 indicators, a large number of features that should be subject to dimensionality reduction.

3.3. IWOA-SVM Model Identification Results

The article divides the data into training and test sets in the ratio of 6 : 4 and sets the initial population of the whale optimization algorithm to 20 and the number of iterations to 50. Under the same hardware conditions, the WOA-SVM model before improvement and the IWOA-SVM model after improvement are each run twenty times. Figure 2 shows the number of financial indicators, the number of convergence generations, and the accuracy of the runs, ranked by accuracy from smallest to largest, and Table 1 shows their distribution. The comparison shows that, relative to the WOA-SVM model, the IWOA-SVM model improves to varying degrees in convergence speed, convergence accuracy, and reduction of the financial indicator dimension. As can be seen from Figure 2, the IWOA-SVM model largely outperforms the WOA-SVM model in terms of the number of convergence generations and the number of financial indicators when the runs are ranked by accuracy. As can be seen from Table 1, after the mixed-strategy improvement, the average number of convergence generations fell from 12 to 5.15, a reduction of 6.85; the average number of financial indicator dimensions fell from 14.90 to 10.95, a reduction of 3.95; the average identification accuracy rose from 84.50% to 86.74%, an increase of 2.24 percentage points; and the highest accuracy rose from 86.23% to 87.60%, an increase of 1.37 percentage points. The convergence speed and optimization-seeking ability of the mixed-strategy improved whale optimization algorithm proposed in this paper are therefore significantly better than those of the original algorithm. The improved algorithm eliminates redundant indicators while maintaining good accuracy and can meet the financial warning needs of this paper.

The IWOA-SVM model achieved a highest accuracy of 87.60%, and two groups of indicators reached this accuracy. The first group includes current ratio (X1), quick ratio (X2), equity multiplier (X4), equity ratio (X5), total assets net profit margin (X7), EBITDA (X11), average age of management (X34), current assets to revenue ratio (X22), fixed assets to revenue ratio (X24), noncurrent assets turnover ratio (X26), and capital intensity (X27), a total of 11 indicators. The second group includes current ratio (X1), equity multiplier (X4), total assets net profit margin (X7), EBITDA (X10), EBITDA (X11), cost margin (X17), average age of management (X34), accounts receivable to revenue (X18), inventories to revenue (X19), fixed assets to revenue (X24), noncurrent asset turnover (X26), total asset turnover (X28), and sustainable growth rate (X32), a total of 13 indicators.

The identification accuracy, the first type of error rate, and the second type of error rate of the two sets of financial indicators are shown in Table 2. The first type of error rate (error I) is the proportion of ST samples that the model identifies as normal samples, and the second type of error rate (error II) is the proportion of normal samples that the model identifies as ST samples. In line with the principle of accounting prudence, error I should be avoided as far as possible, because a model that misclassifies many ST samples loses its practical significance for financial early warning. Comparing the identification results of the two groups, the second group predicted 27 ST samples as non-ST and the first group predicted 25, so the first group has a lower error I and a better early warning effect. Therefore, the combination of financial indicators of the first group is selected as the optimal combination for financial early warning in this paper. With this combination, 156 ST samples and 162 non-ST samples were correctly identified, 25 ST samples were predicted as non-ST, and 20 non-ST samples were predicted as ST, giving an error rate of 13.81% for error I and 10.99% for error II. The meanings of the financial indicators in the first group are shown in Table 3.
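The reported rates follow directly from the four counts given for the first group, as the short check below shows. The function and argument names are hypothetical; the counts are those stated in the text.

```python
def error_rates(st_correct: int, st_missed: int,
                normal_correct: int, normal_flagged: int) -> tuple:
    """Return (accuracy, error I, error II) from the four confusion-matrix counts."""
    total = st_correct + st_missed + normal_correct + normal_flagged
    accuracy = (st_correct + normal_correct) / total
    error_i = st_missed / (st_correct + st_missed)                  # ST predicted as normal
    error_ii = normal_flagged / (normal_correct + normal_flagged)   # normal predicted as ST
    return accuracy, error_i, error_ii


accuracy, error_i, error_ii = error_rates(156, 25, 162, 20)
print(f"{accuracy:.2%} {error_i:.2%} {error_ii:.2%}")  # prints "87.60% 13.81% 10.99%"
```

The four counts sum to 363 test observations, which is consistent with a 6 : 4 split of the 906 observations.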

3.4. Comparison of Different Dimensionality Reduction Methods

To compare dimensionality reduction performance, this paper selected four methods: principal component analysis (PCA), mutual information and maximal information coefficient (MIC), recursive feature elimination (RFE), and XGBoost importance ranking. Figure 3 shows how the SVM recognition accuracy of each method changes as the selected dimensionality increases. For each method, the combination of indicators with the highest recognition rate was selected as its optimal combination, and the recognition results are shown in Table 4. It can be seen from Table 4 that, among the four dimensionality reduction methods, the XGBoost importance ranking method selects the smallest combination of financial indicators and has the lowest type II error rate, but its type I error rate is also the highest. Compared with it, the IWOA-SVM model constructed in this paper uses one fewer indicator dimension while improving the accuracy by 4.68 percentage points, reducing the type I error rate by 7.18 percentage points, and reducing the type II error rate by 2.2 percentage points. Among the four methods, the MIC dimensionality reduction model has the highest accuracy and the lowest type I error rate, but its accuracy is still 4.13 percentage points lower than that of the IWOA-SVM model, its type I error rate is 4.97 percentage points higher, and its type II error rate is 3.3 percentage points higher. Moreover, its feature dimension is as high as 25, so its dimensionality reduction effect is not obvious. In summary, the IWOA-SVM model constructed in this paper can achieve optimal identification results with a minimum number of financial indicators.

3.5. Algorithm Performance Comparison

To compare the performance of the IWOA-SVM algorithm, the salp swarm algorithm (SSA), particle swarm optimization (PSO), the flower pollination algorithm (FPA), and the grey wolf optimizer (GWO) were selected. Each algorithm was run under the same hardware conditions and with the same parameter settings as IWOA-SVM. After 20 runs, the maximum and average values of the recognition accuracy are shown in Figure 4. The IWOA-SVM model in this paper has the highest maximum and average recognition accuracy, and its performance is better than that of the other metaheuristic algorithms.

3.6. Comparison of Universality of Dimension Reduction Effect

In order to verify the universality and identification performance of the dimensionality reduction of the IWOA-SVM financial warning model, five financial warning models based on logistic regression (LG), random forest (RF), K-nearest neighbor (KNN), decision tree (DT), and SVM were constructed in this paper. Each model was used to identify both the original financial indicator combination and the optimal financial indicator combination obtained in this paper. The identification results are shown in Table 5. It can be seen from the table that, after the logistic regression, random forest, K-nearest neighbor, and decision tree models use the optimal combination of financial indicators obtained in this paper, the model recognition accuracy increases by 4.41, 2.75, 4.13, and 2.2 percentage points, respectively, and the type I error rate decreases by 1.1, 2.21, 6.08, and 1.66 percentage points, respectively. Compared with the SVM model identifying the original data set, the IWOA-SVM model improves the accuracy by 6.88 percentage points to 87.60%, reduces the type I error rate by 5.53 percentage points to 13.81%, and reduces the type II error rate by 8.24 percentage points to 10.99%. Overall, the IWOA-SVM financial warning model constructed in this paper has the highest accuracy rate, the lowest type I error rate, and the best model identification effect.

4. Discussion and Conclusions

To address the issue of indicator selection in the construction of financial early warning models, this paper proposes the IWOA-SVM financial early warning model. A sample of 151 ST companies and 151 normal companies listed on A-shares in Shanghai and Shenzhen in 2019–2021 was used for the empirical analysis, and a mixed strategy was used to improve the original whale optimization algorithm, which suffers from poor optimization-seeking ability and low convergence accuracy. The empirical analysis shows the following. First, the IWOA-SVM algorithm reduced the original 34-dimensional indicators to 11 dimensions, with a model recognition accuracy of 87.60%, an improvement of 6.88 percentage points, and a type I error rate of 13.81%, a reduction of 5.53 percentage points; its dimensionality reduction and recognition are better than those of the four dimensionality reduction methods compared, such as PCA. Second, compared with the original whale algorithm, the improved whale optimization algorithm with mixed strategy reduced the average financial indicator dimension by 3.95 and the average number of convergence generations by 6.85 and improved the highest accuracy by 1.37 percentage points, indicating that the algorithm's convergence speed and optimization-seeking ability were enhanced. Third, when logistic regression, random forest, K-nearest neighbor, and decision tree are used to identify the combinations of indicators before and after dimensionality reduction and optimization, the accuracy rate improves to different degrees, indicating that the method has a certain universality across different classifiers. Taken together, the IWOA-SVM financial early warning model proposed in this paper has the highest accuracy rate and the lowest type I error rate among the models compared. It is an accurate, efficient, and scientific financial early warning model and provides a new way of thinking about the selection of indicators for the construction of such models.

Data Availability

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

B. L., Q. H., and Z. W. conceptualized the study; B. L., M. D., and X. L. contributed to methodology; B. L., M. D., and Z. W. provided software; B. L. and M. D. contributed to data curation. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This research was funded by the Hebei Province Social Science Fund Project (grant no. HB20GL039).