Abstract

To extract more information that affects customer arrears behavior, feature extraction is used to extend low-dimensional features to high-dimensional features for the arrears risk warning model of electric charge recovery (ECR). However, the data contain many irrelevant or redundant features, which degrade prediction accuracy. To reduce the feature dimension and improve the prediction result, an improved hybrid feature selection algorithm is proposed that integrates nonlinear inertia weight binary particle swarm optimization with shrinking encircling and exploration mechanism (NBPSOSEE) with sequential backward selection (SBS), namely, NBPSOSEE-SBS, to select the optimal feature subset. NBPSOSEE-SBS not only effectively removes redundant or irrelevant features from the subset selected by NBPSOSEE but also improves classification accuracy. The experimental results show that, compared with one state-of-the-art optimization algorithm and seven well-known wrapper-based feature selection approaches, the proposed NBPSOSEE-SBS effectively removes a large number of redundant features and stably improves the prediction results with low execution time for the risk prediction of ECR for power customers.

1. Introduction

With the rapid development of the global energy market, the smart grid [1] in the power industry has been built continuously, and the scale of information accumulated by the power system is becoming larger and larger. As the main income of power companies, the electric tariff plays a decisive role in the development of power enterprises. However, throughout the power marketing process, the risk of customers falling into arrears has always existed, which hinders the development of power enterprises. Electric charge recovery (ECR) has always been a difficult problem that power supply enterprises urgently need to solve, and it is also the most important management part of power meter reading, verification, and checking. Once the processing of ECR is delayed, it may adversely affect the charging results. It also causes power customers to occupy the funds of power enterprises, which is not conducive to the fund management of power companies. The main operating profit of power grid enterprises comes from ECR. Through the analysis of power clients' payment behavior and customer characteristics, the risk factors and risk levels of ECR are predicted and evaluated, which contributes to collecting electric charges, formulating effective preventive measures in time, reducing management risks, and safeguarding the economic benefits of power enterprises. Therefore, accurate and reliable forecasting of arrears risk is an important reference for the management of ECR.

In the past, researchers established power arrears risk models using feature extraction [2–4], information entropy theory [5], support vector machine (SVM) combined with the VIKOR method [6], the artificial immune algorithm [7], and trust evaluation cloud combined with a decision algorithm [8], but the prediction results of these methods were not satisfactory. In recent years, some scholars have used quantitative analysis [9], classification [10], clustering [11], ensemble methods [12], and improved algorithms [13, 14] to analyse the arrears behavior of electricity users and have improved the prediction results. However, with the rapid growth of data volume and the increasing difficulty of capital operation, power companies urgently need faster and more accurate data processing methods to predict the future arrears risk of power users.

In order to extract more information that affects customer arrears behavior, low-dimensional features are extended to high-dimensional features via feature extraction. However, many irrelevant and redundant features reduce classification accuracy and raise the dimensional complexity. Feature selection is therefore a very effective solution [15]. Feature selection and classification methods are widely used on high-dimensional and multiclass data sets [16, 17] and can improve the accuracy of model prediction by removing irrelevant and redundant features. Generally, the feature selection process includes the following stages: selecting feature subsets, evaluating feature subsets, and verifying results. The purpose of this process is to remove unrelated or redundant features and generate a smaller optimal feature subset. Feature selection methods are commonly divided into filter, wrapper, and embedded approaches [18]. Filter algorithms use the inherent distribution characteristics of the data as the basis for feature selection, and the process of selecting features is independent of the learner. Filter approaches can be further divided into single-variable and multivariate algorithms [19]. Single-variable algorithms evaluate each feature individually and therefore ignore interactions between features, which can reduce classification accuracy, whereas multivariate algorithms have the advantage of evaluating the correlation between features. Filter algorithms are independent of the classifier and are fast to compute, but there is no interaction between the classification algorithm and the features during feature selection. Wrapper approaches rely on classification algorithms to evaluate the selected feature subsets and can achieve higher classification accuracy than filter methods [20]; however, they have high computational complexity when selecting features from high-dimensional data sets. In embedded approaches, feature selection is directly integrated into the training process of the learner [21], but embedded methods are conceptually more complex because it is not easy to obtain better classification results by modifying the classification model. In contrast, wrapper approaches can use the performance of a machine learning algorithm as the evaluation criterion for selecting features, making them more flexible and more effective for analyzing high-dimensional data. In recent years, wrapper approaches have attracted much attention for solving feature selection problems by seeking global optimal solutions through heuristic algorithms.

In recent years, many researchers have successfully applied wrapper approaches to feature selection. Li and Yang [22] integrated the OS-extreme learning machine (EOS-ELM) with binary Jaya-based feature selection for real-time transient stability assessment using phasor measurement unit (PMU) data. Yang et al. [23] proposed a novel binary Jaya optimization algorithm, integrated with the lambda iteration method, to transform the dual objectives of economy and emission commitment into a single-objective problem. Aslan et al. [24] proposed the JayaX binary optimization algorithm, which replaces Jaya's solution update rule with an XOR operator, and compared it with state-of-the-art algorithms, showing that it can produce better-quality results on binary optimization problems. The whale optimization algorithm (WOA) [25, 26] was applied in a wrapper-based method to reach the optimal subset of features and effectively improve classification accuracy. Houssein et al. [27] proposed an S-shaped binary whale optimization algorithm for feature selection. Tawhid and Ibrahim [28] used feature selection based on the binary whale optimization algorithm (BWOA) to solve the feature selection problem. Rao et al. [29] used the artificial bee colony (ABC) approach based on a boosting decision tree model to improve the quality of the selected features; the method accelerates convergence and balances exploration and exploitation efficiently. Furthermore, the binary artificial bee colony algorithm (BABC) [30, 31] has been used to select feature subsets. S. Oreski and G. Oreski [32] used a genetic algorithm-based heuristic for feature selection in credit risk assessment, in which the genetic algorithm (GA) applies a neural network model to select the optimal subset of features and improves classification results. Chen et al. [33] introduced a heuristic feature selection approach into text categorization using chaos optimization and GA. Shukla et al. [34] proposed a new hybrid feature subset selection framework based on a binary genetic algorithm and information theory, which accelerates the search for important feature subsets. Emary et al. [35] used binary grey wolf optimization (BGWO) approaches to select feature subsets. Al-Tashi et al. [36] introduced a feature selection method based on GWO for coronary artery disease classification. Several studies have used feature selection methods based on the particle swarm optimization (PSO) algorithm to search the feature space for optimal solutions [37–39]. PSO performs equally well or better than WOA, ABC, GA, and GWO in solving global optimization problems [40]. Therefore, PSO is a promising approach for many tasks, including feature selection.

Kennedy and Eberhart proposed the particle swarm optimization (PSO) algorithm in 1995 [41]. Owing to its superior performance, PSO has been widely applied in many fields, such as neural network training [42], classifier design [43], clustering analysis [44], and network community detection [45]. Since many practical problems are discrete, Kennedy and Eberhart proposed the binary particle swarm optimization (BPSO) algorithm in 1997 [46], which uses a binary encoding to solve discrete optimal combination problems. BPSO has also been widely used in many fields, such as knapsack problems [46–48], power systems [49, 50], data mining [51–55], and image processing [56]. Because the search range of particles cannot be dynamically adjusted, BPSO easily falls into local optima and premature convergence as population diversity declines. In view of these shortcomings, many scholars have proposed improved methods. Chuang and other scholars applied the chaotic binary particle swarm optimization (CBPSO) algorithm [57–59], which uses chaotic maps on the inertia weight to improve the performance of BPSO; however, the behavior of a chaotic map does not settle at fixed points or in periodic or quasiperiodic orbits. Liu et al. [60] proposed BPSO with a linear adaptive inertia weight, which improved the search performance of particles, but the linear transformation of the inertia weight often causes nearby optimal solutions to be overlooked. Wu et al. [61] proposed a hybrid improved binary quantum particle swarm optimization (HI-BQPSO) algorithm for feature selection, which introduces crossover and mutation strategies and effectively and efficiently improves classification accuracy compared with other feature selection approaches on nine gene expression datasets and 36 UCI datasets. The sequential backward selection (SBS) [62] is a heuristic search algorithm that is simple to implement, but its computational cost is greatly affected by the size of the initial feature set.

The traditional BWOA, BABC, BGA, BGWO, BPSO, and CBPSO algorithms have simple structures and few parameters. These algorithms have proved effective in selecting feature subsets via a binary mechanism, but they find it difficult to jump out of local optima. Among state-of-the-art metaheuristic optimization algorithms, the HI-BQPSO algorithm enhances diversity in the search process and can effectively search for the optimal feature subset, but it lacks stability in balancing exploration and exploitation when searching for the global optimal solution. In order to effectively balance exploration and exploitation in the search process, a hybrid nonlinear inertia weight binary particle swarm optimization with shrinking encircling and exploration mechanism is proposed, and the SBS method is then introduced, yielding NBPSOSEE-SBS, for solving feature selection tasks. NBPSOSEE-SBS can effectively reduce the number of features while maintaining the best classification effect. In order to prove the effectiveness and superiority of NBPSOSEE-SBS, two groups of comparative experiments are set up, using the logistic regression method, to realize the risk prediction of ECR for power customers in June, July, and August 2018. NBPSOSEE-SBS can significantly reduce the feature dimension, improve classification accuracy, and effectively enhance the global search capability. The main contributions of this study are summarized as follows. Firstly, the proposed algorithm achieves a balance between local search and global search by nonlinearly updating the inertia weight and enhances the diversity of particles when searching for optimal solutions. Secondly, two dynamic contraction factors are introduced into the velocity and position updates, which not only effectively enhance the inheritance ability, self-recognition ability, and social cognitive ability of the particles but also improve the quality of the particle positions. Furthermore, a novel position updating approach, which introduces the shrinking encircling and exploration mechanisms, is proposed to escape local optima. Finally, the SBS algorithm is used to remove individual redundant features separately, helping to find potentially better solutions.

2. Methodology

2.1. Binary Particle Swarm Optimization

PSO is a random search algorithm based on group cooperation, developed by simulating the foraging behavior of birds [41]. Assume that the dimension of the target search space is $D$ and the size of the population is $m$. The $i$th particle is represented by a $D$-dimensional vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, $i = 1, 2, \ldots, m$, which is the position of particle $i$ in the target search space. $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$ is the velocity determining the direction and distance of the particle's flight in each dimension $d$. $P_i$ and $P_g$ are the optimal position found so far by the $i$th particle and by the whole population, respectively. The velocity and position of particles are updated by the following equations, respectively:

$$v_{id}^{t+1} = \omega v_{id}^{t} + c_{1} r_{1} \left(p_{id} - x_{id}^{t}\right) + c_{2} r_{2} \left(p_{gd} - x_{id}^{t}\right), \quad (1)$$

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}, \quad (2)$$

where $i = 1, 2, \ldots, m$ and $d = 1, 2, \ldots, D$; $m$ is the number of particles; $c_{1}$ and $c_{2}$ are learning factors; $r_{1}$ and $r_{2}$ are random numbers in the range [0, 1]; and $t$ is the number of iterations.

The PSO is only suitable for continuous function solving, so Kennedy and Eberhart proposed the BPSO [46] based on a binary encoding. In BPSO, the position of each particle is represented by a binary string, while the velocity vector remains continuous. The positions of particles are updated according to the following equation [63]:

$$x_{id}^{t+1} = \begin{cases} 1, & \text{if } rand() < S\left(v_{id}^{t+1}\right), \\ 0, & \text{otherwise}, \end{cases} \qquad S\left(v_{id}^{t+1}\right) = \frac{1}{1 + e^{-v_{id}^{t+1}}}, \quad (3)$$

where $rand()$ is a uniform random number in [0, 1] and $S(\cdot)$ is the sigmoid function.
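For illustration, the following Python sketch implements the velocity update of equation (1) and the sigmoid-based binary position update of equation (3); the variable names and the velocity clamp are illustrative choices rather than settings taken from the paper.

```python
import numpy as np

def bpso_step(X, V, pbest, gbest, w=0.9, c1=2.0, c2=2.0, v_max=6.0):
    """One BPSO iteration: continuous velocity update, sigmoid-mapped binary position.

    X, V, pbest are (m, D) arrays; gbest is a (D,) array.
    """
    m, D = X.shape
    r1, r2 = np.random.rand(m, D), np.random.rand(m, D)
    # Velocity update (equation (1)); clipping avoids sigmoid saturation (an added safeguard).
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
    V = np.clip(V, -v_max, v_max)
    # Binary position update (equation (3)): set a bit with probability sigmoid(V).
    S = 1.0 / (1.0 + np.exp(-V))
    X = (np.random.rand(m, D) < S).astype(int)
    return X, V
```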

2.2. Fitness Function

The purpose of feature selection is to get the best classification results with the fewest features. The fitness function is shown in the following equation:

$$fitness = \frac{F1_{high} + F1_{med} + F1_{low}}{3}. \quad (4)$$

The fitness value is the average F1-measure of predicting the high-risk, medium-risk, and low-risk levels of customer ECR, and its range is [0, 1]. In equation (4), $F1_{high}$, $F1_{med}$, and $F1_{low}$ represent the F1-measure values of predicting the high-risk, medium-risk, and low-risk levels of customer ECR, respectively. In order to make an objective evaluation of the performance of the model, this paper introduces four evaluation criteria: accuracy, precision, recall, and F1-measure. The definitions are shown in the following equations:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}, \quad (5)$$

$$Precision = \frac{TP}{TP + FP}, \quad (6)$$

$$Recall = \frac{TP}{TP + FN}, \quad (7)$$

$$F1\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}. \quad (8)$$

TP, FP, FN, and TN represent true positive, false positive, false negative, and true negative, respectively, in equations (5)–(8). In theory, the higher the values of accuracy, precision, recall, and F1-measure, the higher the fitness value, and the better the predictive performance of the model.
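As a concrete reading of equations (4)–(8), the following sketch computes the fitness as the mean of the three per-class F1-measures using scikit-learn; the label names are assumptions used only for illustration.

```python
from sklearn.metrics import f1_score

def ecr_fitness(y_true, y_pred, labels=("high", "medium", "low")):
    """Fitness of equation (4): mean of the per-class F1-measures for the three ECR risk levels."""
    per_class_f1 = f1_score(y_true, y_pred, labels=list(labels), average=None)
    return per_class_f1.mean()
```

The returned value lies in [0, 1] and is the quantity maximized by the feature selection search.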

3. Improved Hybrid Feature Selection Algorithm Based on NBPSOSEE-SBS

The framework for the risk prediction of ECR includes three processes: data preprocessing, selecting the optimal feature subset based on NBPSOSEE-SBS, and classification and evaluation using the logistic regression method. The framework for feature selection of ECR risk based on NBPSOSEE-SBS algorithm is shown in Figure 1.

3.1. Data Preprocessing

The characteristic dimension of the original power data is low, which cannot adequately express the arrears behavior of power users. In order to solve this problem, the data are expanded from low-dimensional features to high-dimensional features by feature extraction. Based on 21 consecutive months of electricity data, the features of the training set and the test set are extracted, respectively: the training set contains the power data of 20 months, and the test set contains the power data of one month. For the data of each month, the categorical features are first transformed by one-hot encoding; then the numeric features of the preceding six months are appended to the encoded data, and their maximum, minimum, average, median, variance, and standard deviation are calculated; finally, the original 34-dimensional features are extended to 748 dimensions.

The processing of feature extraction is shown in Algorithm 1, where m is the total number of months, data_k is the power data of the kth month, and dataSet is the data set after the features have been extracted.

(1)Input: the original data set
(2)Output: the data set after feature extraction
(3)Get training set;
(4)Get test set;
(5)Create an empty new data table dataSet;
(6)for k = 7 to m //m is the total number of months
(7) data_k ← Get power data for the kth month;
(8) data_k ← One-hot encoding of categorical features in data_k;
(9)for j = 1 to 6 //(k−6)th to (k−1)th month
(10)   data_{k−j} ← Get power data for the (k−j)th month;
(11)   data_k ← Add the numeric features in data_{k−j} to data_k;
(12)end for
(13) data_k ← Calculate the maximum, minimum, mean, median, variance, and standard deviation of the appended historical features;
(14) dataSet ← dataSet ∪ data_k //Merge the extended data for month k into dataSet
(15)end for

After all the features have been calculated, the min-max standardization method is used to transform the features and map the values to [0, 1]. The min-max standardization function is shown in the following equation:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}. \quad (9)$$

In equation (9), $x$ represents the feature value to be transformed, $x'$ is the transformed value, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum of the corresponding feature, respectively.
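For reference, a minimal sketch of the transformation of equation (9) (scikit-learn's MinMaxScaler is an equivalent alternative); the guard for constant columns is an added safeguard, not part of the paper.

```python
import numpy as np

def min_max_scale(X):
    """Map every feature column of X to [0, 1] as in equation (9)."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero (assumption)
    return (X - col_min) / span
```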

The expansion of features reflects the information of users' historical electricity consumption behavior, as shown in Table 1 (only one of the features is shown as an example because there are too many features).

These 26 features of electricity consumption behavior in the current month are expanded by this method. Then, the statistical features such as maximum value, minimum value, mean value, variance, and standard deviation of these historical electricity consumption features are calculated. Take the expansion of “jfjsl” (payment timeliness rate) as an example, and the specific statistical analysis is shown in Table 2 (only one of the features is taken as an example).

In addition, one-hot coding method is used to transform the category features into numerical features. Finally, the original 34-dimensional features are extended to 748 dimensions.
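The following pandas sketch illustrates this expansion for one target month: one-hot encoding of the categorical columns, appending the numeric features of the preceding six months, and computing the six statistics over that history. The DataFrame layout and column names (including "jfjsl") are assumptions made for illustration only.

```python
import pandas as pd

def expand_month(frames, k, cat_cols, num_cols):
    """Build the expanded feature table for month k from a list of monthly DataFrames.

    frames   : list of per-month DataFrames indexed by customer id (illustrative layout)
    k        : index of the target month (requires at least six preceding months)
    cat_cols : categorical columns to one-hot encode
    num_cols : numeric behaviour columns (e.g. 'jfjsl') to extend with history
    """
    out = pd.get_dummies(frames[k], columns=cat_cols)            # one-hot encode categories
    for j in range(1, 7):                                        # append months k-1 ... k-6
        hist = frames[k - j][num_cols].add_suffix(f"_m{j}")
        out = out.join(hist)
    for col in num_cols:                                         # statistics over the six-month history
        hist = out[[f"{col}_m{j}" for j in range(1, 7)]]
        out[f"{col}_max"] = hist.max(axis=1)
        out[f"{col}_min"] = hist.min(axis=1)
        out[f"{col}_mean"] = hist.mean(axis=1)
        out[f"{col}_median"] = hist.median(axis=1)
        out[f"{col}_var"] = hist.var(axis=1)
        out[f"{col}_std"] = hist.std(axis=1)
    return out
```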

4. Improved NBPSOSEE Algorithm

The inertia weight $\omega$ is one of the most important adjustable parameters of the BPSO algorithm, and its value plays an important role in the performance of the algorithm. A small $\omega$ is convenient for local search of the current search area, and a more accurate solution can be obtained to facilitate convergence, but it is not easy to jump out of local extremum points; a large $\omega$ is convenient for global search, but it is not easy to obtain an accurate solution. Literature [60] shows that linear optimization of the inertia weight can improve the performance of the algorithm, but this strategy cannot effectively match the optimization process of the algorithm. Therefore, in order to follow the actual evolutionary state of the algorithm more closely, this paper performs nonlinear incremental optimization of the inertia weight. In each iteration, the inertia weight is calculated by equation (10), where $t$ and $T_{\max}$ represent the current iteration number and the maximum number of iterations, respectively. As the number of iterations increases, the inertia weight exhibits a nonlinear incremental trend. Thus, the improved algorithm has a smaller $\omega$ in the early stage of searching for the optimal solution, so the particles have a stronger local search capability; in the later stage, $\omega$ is large and the particles have strong global search ability.
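Since the exact form of equation (10) is not reproduced here, the sketch below shows one plausible nonlinearly increasing schedule between the minimum (0.4) and maximum (0.9) inertia weights listed in Table 3; the quadratic exponent is an assumption, not the paper's formula.

```python
def inertia_weight(t, t_max, w_min=0.4, w_max=0.9, power=2.0):
    """Nonlinearly increasing inertia weight: small w early (local search),
    large w late (global search). The quadratic exponent is an assumption."""
    return w_min + (w_max - w_min) * (t / t_max) ** power
```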

Furthermore, in order to enhance the optimization performance of the PSO, two new contraction factors are introduced into the updating equations of the NBPSOSEE algorithm. Clerc and Kennedy [64] proposed a particle swarm optimization algorithm with a contraction factor (CFPSO) in 2002, which is intended to improve the convergence speed while escaping local optimal values. The algorithm flow of CFPSO is similar to that of the original PSO, but the velocity updating formula of the particles is different: CFPSO uses the contraction factor to compress the particle velocity after each update, which changes both the influence of a particle's own historical velocity and the influence of the historical optimal positions on the particle velocity, so as to improve the convergence speed of the population. However, CFPSO has some drawbacks. If the value of the contraction factor is too large, convergence performance is poor and the PSO approaches random search; if it is too small, the PSO converges prematurely and classification accuracy decreases. In order to solve this problem, two dynamic contraction factors (denoted here as $\varphi_1$ and $\varphi_2$) are used to compress the velocity and the position at each update, respectively, in equations (11) and (12). By adding a contraction factor to the velocity, NBPSOSEE improves the inheritance ability, self-recognition ability, and social cognitive ability of the particles; by introducing a contraction factor into the position update, NBPSOSEE improves the quality of the particle positions. In NBPSOSEE, the two dynamic contraction factors enhance the performance of exploration and exploitation and improve the convergence speed:

These two dynamic contraction factors apply a nonlinear transformation to the particle velocity and position. The parameters $\varphi_1$ and $\varphi_2$ are calculated by equations (13) and (14), where $x$ is the position value of the particle, $t$ is the current iteration, and $k$ is a constant.

Then, in NBPSOSEE, the two mechanisms of shrinking encircling and exploration are used to improve the search ability of the population. Firstly, the moving position of the particle is determined between the current position and the target position via the shrinking encircling operation, which shortens the search range of the particles and thereby enhances the local search ability of the population. In addition to the shrinking encircling strategy, NBPSOSEE uses a random search mechanism to improve the diversity of particles. When updating the particle position, the update depends on the change of the coefficient $A$: if $A$ exceeds the range of [−1, 1], the distance coefficient $D$ is updated with respect to a randomly selected target. In order to find the target, the particles move beyond the original target direction, giving the population global search capability. In NBPSOSEE, the particle position is updated by equation (18). NBPSOSEE adopts the dynamic contraction strategy, the shrinking encircling operation, and the exploration mechanism with some probability, which not only allows the algorithm to escape local optimal solutions but also accelerates the convergence speed:

In equations (16) and (17), $D$ represents a distance coefficient variable and $X_{rand}$ is a target that is randomly searched. In equation (14), $t$ represents the current number of iterations, $A$ represents a coefficient variable, $X^{*}$ is defined as the optimal position of the current target, and $r$ represents a random value in [0, 1].

The coefficient variables, namely, $A$ and $C$, are calculated separately as follows:

$$A = 2a \cdot r - a, \quad (19)$$

$$C = 2r. \quad (20)$$

In the above formulas, $a$ is a variable with a range of values between [0, 2] that presents a linearly decreasing trend; it is updated in the form $a = 2\left(1 - t/T_{\max}\right)$, where $t$ is the current iteration number, $T_{\max}$ represents the maximum iteration number, and $r$ is a random value between [0, 1].
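The following sketch gives one possible reading of the shrinking encircling and exploration move for a single binary particle, using the WOA-style coefficients $A$, $C$, and $a$ described above. How this move is combined with the sigmoid binarization and the dynamic contraction factors in equation (18) is simplified here and should be treated as an assumption.

```python
import numpy as np

def see_position_update(X_i, gbest, X_rand, t, t_max):
    """Shrinking-encircling / exploration move for one binary particle (a WOA-style
    reading of the mechanism above; the coupling with the contraction factors and
    the sigmoid binarization is an assumption)."""
    a = 2.0 * (1.0 - t / t_max)             # linearly decreasing control variable in [0, 2]
    A = 2.0 * a * np.random.rand() - a      # coefficient variable A (equation (19))
    C = 2.0 * np.random.rand(X_i.size)      # coefficient variable C (equation (20))
    if abs(A) <= 1.0:                       # shrinking encircling around the global best
        target = gbest
    else:                                   # exploration towards a randomly selected particle
        target = X_rand
    D_vec = np.abs(C * target - X_i)        # distance coefficient D
    X_cont = target - A * D_vec             # continuous candidate position
    # Binarize the continuous position with the sigmoid rule (assumption).
    prob = 1.0 / (1.0 + np.exp(-X_cont))
    return (np.random.rand(X_i.size) < prob).astype(int)
```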

4.1. SBS Algorithm

Since the feature subset selected by NBPSOSEE still contains redundant features, and the features differ in importance with no meaningful ordering, this paper first uses the random forest feature selection method to rank the importance of the features selected by NBPSOSEE and then uses the SBS algorithm to delete the features with low importance in turn.

For each node $q$ of a decision tree in the random forest, a number of features are randomly extracted from the $d$-dimensional feature set. Then, according to the Gini gain maximization principle [65], a feature is selected to divide the data on the node into left and right child nodes, which means that the data on the parent node $q$ are divided into its child nodes $l$ and $r$. Gini gain maximization is to maximize the following quantity:

$$\Delta Gini(q) = Gini(q) - w_{l}\,Gini(l) - w_{r}\,Gini(r).$$

Here, $Gini(q) = 1 - \sum_{c} p_{qc}^{2}$ is the Gini index of node $q$, $p_{qc}$ is the proportion of class-$c$ samples in node $q$, and $w_{l}$ and $w_{r}$ are the proportions of the data divided into the left and right child nodes $l$ and $r$ by the parent node $q$, respectively. The importance of feature $j$ at node $q$ is measured by the Gini gain it produces:

$$VIM_{jq} = \Delta Gini(q).$$

If $Q_{jk}$ is the set of nodes in which feature $j$ appears as the node-splitting attribute in the $k$th decision tree, the importance of feature $j$ in that tree is calculated as follows:

$$VIM_{jk} = \sum_{q \in Q_{jk}} VIM_{jq}.$$

Assuming that there are $n$ trees in the random forest, the importance of feature $j$ in the random forest can be calculated as follows:

$$VIM_{j} = \frac{1}{n}\sum_{k=1}^{n} VIM_{jk}.$$

Here, $n$ is the number of decision trees in the random forest. After NBPSOSEE obtains the optimal feature subset, the SBS starts searching: it calculates the fitness value obtained when each feature is deleted separately and then selects the feature subset with the best fitness value to enter the next iteration. The iterative steps of the algorithm in the SBS stage are as follows:

Step 1: determine the optimal feature subset $S_t$ of the SBS stage.

Step 2: delete the feature $f^{*}$ in the current feature subset that satisfies the following equation:

$$f^{*} = \arg\max_{f \in S_t} fitness\left(S_t \setminus \{f\}\right), \quad (26)$$

where $S_t \setminus \{f\}$ denotes the feature subset after removing feature $f$ from $S_t$, $t$ is the number of iterations, and $fitness$ is the fitness value. The larger the fitness value in this paper, the better the selected feature subset.

Step 3: update the optimal feature subset and the number of iterations:

$$S_{t+1} = S_t \setminus \{f^{*}\}, \quad t = t + 1. \quad (27)$$

Step 4: repeat steps 2 and 3 until the termination condition is met.
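A compact sketch of this SBS stage is given below: the features selected by NBPSOSEE are first ranked by random-forest (Gini) importance and then deleted one at a time following equations (26) and (27). The stopping rule (stop as soon as every possible deletion lowers the fitness) and the forest size are assumptions; `fitness` stands for any callable implementing equation (4).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sbs_refine(X_train, y_train, selected, fitness, min_features=1):
    """Refine the NBPSOSEE-selected subset with sequential backward selection.

    selected : list of column indices chosen by NBPSOSEE
    fitness  : callable taking a list of column indices and returning the score of equation (4)
    """
    # Rank the selected features by random-forest Gini importance (least important first).
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train[:, selected], y_train)
    order = np.argsort(rf.feature_importances_)
    subset = [selected[i] for i in order]
    best_fit = fitness(subset)
    while len(subset) > min_features:
        # Equation (26): evaluate the fitness of every single-feature deletion.
        scores = [(fitness(subset[:i] + subset[i + 1:]), i) for i in range(len(subset))]
        fit, i = max(scores)
        if fit < best_fit:                        # stop once any deletion hurts (assumed stopping rule)
            break
        best_fit = fit
        subset = subset[:i] + subset[i + 1:]      # Equation (27): remove the chosen feature
    return subset, best_fit
```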

The feature selection processing of NBPSOSEE-SBS is shown in Algorithm 2, where maxIterations is the maximum number of iterations, swarmSize is the number of particles in the population, dimension is the dimension of each particle, and fitness is the fitness value.

(1)Input: the data set after feature extraction
(2)Output: best feature subset
(3)Get training set and test set after the features have been extracted;
(4)Initialize the population, including the initial positions, velocities, and fitness values of the particles, as well as the local optimal pbest and global optimal gbest of the particles;
(5)for k = 1 ⟶ maxIterations do
(6)for i = 1 ⟶ swarmSize do
(7)  Update inertia weight using equation (10);
(8)  for j = 1 ⟶ dimension do
(9)   Update random numbers r1 and r2;
(10)   Update the coefficient variables A and C, respectively, using equations (19) and (20);
(11)   Update the contraction factors φ1 and φ2, respectively, using equations (13) and (14);
(12)   Update velocity of particle using equation (11);
(13)   Update position of particle using equation (18);
(14)  end for
(15)end for
(16)for i = 1 ⟶ swarmSize do
(17)  Update fitness value of particle i;
(18)  Update local optimal pbest;
(19)end for
(20) Update global optimal gbest;
(21)end for
(22)Get the optimal feature subset selected by NBPSOSEE;
(23)Delete a feature in the current feature subset using equation (26);
(24)Update the optimal feature subset using equation (27);
(25)Repeat steps 23 and 24 until the termination condition is met;

5. Computational Complexity of NBPSOSEE-SBS

The time computational complexity of the BPSO algorithm can be expressed as $O\left(T \times N \times C_{LR}\right)$, where $T$ indicates the number of iterations, $N$ is the population size, and $C_{LR}$ represents the computing time for training and predicting with the logistic regression model. The time computational complexity of the CBPSO algorithm can be expressed as $O\left(T \times N \times \left(C_{LR} + C_{map}\right)\right)$, where $C_{map}$ is the computing time of the logistic map sequence. The time computational complexity of the HI-BQPSO algorithm can be expressed as $O\left(T \times N \times \left(C_{LR} + C_{cf}\right)\right)$, where $C_{cf}$ is the computing time for the correction factors.

The time computational complexity of the proposed NBPSOSEE algorithm can be expressed as $O\left(T \times N \times \left(C_{LR} + C_{\omega} + C_{AC} + C_{\varphi}\right)\right)$, where $C_{\omega}$ is the computing time for calculating the inertia weight, $C_{AC}$ is the computing time for updating the coefficient variables $A$ and $C$, and $C_{\varphi}$ is the computing time for the dynamic contraction factors $\varphi_1$ and $\varphi_2$. The time computational complexity of the proposed NBPSOSEE-SBS algorithm can be expressed as $O\left(T \times N \times \left(C_{LR} + C_{\omega} + C_{AC} + C_{\varphi}\right) + C_{SBS}\right)$, where $C_{SBS}$ is the computing time of the SBS method. It can be seen that the computational complexity of the NBPSOSEE-SBS algorithm is obviously higher than that of the BPSO and CBPSO algorithms. However, $C_{\omega}$, $C_{AC}$, and $C_{\varphi}$ involve only simple numerical operations according to equations (10), (13), (14), (19), and (20). Furthermore, a large number of redundant or irrelevant features are already deleted by the NBPSOSEE algorithm, so the SBS method spends little time deleting the remaining unimportant features. Therefore, the proposed NBPSOSEE-SBS algorithm does not significantly increase the computational complexity.

6. Results

6.1. Data

In this paper, the data set from January 2017 to September 2018 is provided by a power enterprise and includes the electricity consumption data of 11,860 high-voltage users who have been in arrears. According to the users' past payment behavior, the power enterprise classifies the users into high-risk, medium-risk, and low-risk arrears levels. A total of 34 features are used for data processing and model training: 8 categorical features representing basic user information and 26 statistical features describing the users' electricity consumption.

Experimental environment: operating system CentOS 7.0, CPU Intel Xeon E5-2620 V3, 128 GB memory, and development language Python 3.5.

6.2. Experimental Results

In order to prove the effectiveness and superiority of the proposed algorithm, two groups of comparative experiments are set up, using the logistic regression model [66, 67], to realize the risk prediction of ECR for power customers in June, July, and August 2018. The first group of experiments verifies the effectiveness of NBPSOSEE; the second group proves the superiority of the proposed hybrid feature selection algorithm based on NBPSOSEE with SBS, called NBPSOSEE-SBS.

The relevant parameters selected for NBPSOSEE-SBS are listed in Table 3. A population size of 20 to 40 is considered optimal for optimization problems [68], and the population size is generally set to 20 [46, 58, 60]; hence, 20 particles form the particle swarm in this paper. The maximum number of iterations is 100; the maximum and minimum values of the inertia weight are 0.9 and 0.4, respectively; and the learning factors are set as listed in Table 3.

The logistic regression model is trained on the training set with the optimal feature subset selected by the NBPSOSEE-SBS algorithm and outputs the default probability of each user on the test set. Appropriate threshold values are then set, and according to these thresholds, users are divided into three levels: high risk, medium risk, and low risk. The specific division principles are shown in Table 4.

Users whose default probability exceeds the upper threshold are assigned to the high-risk level; users whose arrears probability lies between the two thresholds are assigned to the medium-risk level; and users whose arrears probability is below the lower threshold are assigned to the low-risk level.
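For illustration, the sketch below maps the logistic regression default probabilities to the three risk levels in the spirit of Table 4; the threshold values used here are placeholders, since the paper's actual thresholds are given in Table 4.

```python
import numpy as np

def assign_risk_levels(default_prob, high_thr=0.7, low_thr=0.3):
    """Map default probabilities to risk levels; high_thr and low_thr are placeholder thresholds."""
    return np.where(default_prob >= high_thr, "high",
           np.where(default_prob >= low_thr, "medium", "low"))
```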

6.3. Experimental Results of NBPSOSEE

Figures 2–4 show the fitness values and the numbers of features obtained by the BWOA, BABC, BGA, BGWO, BPSO, CBPSO, HI-BQPSO, LBPSOSEE, and proposed NBPSOSEE algorithms for the risk prediction of ECR in June, July, and August 2018. In these figures, panel (a) plots the number of iterations against the fitness value, and panel (b) plots the number of iterations against the number of selected features.

Figure 2 shows the test results of the improved NBPSOSEE for ECR risk in June 2018. Compared with the other eight feature selection algorithms, NBPSOSEE obtains the maximum fitness value with the fewest features. In terms of the number of features and the fitness value, the performance of HI-BQPSO, LBPSOSEE, and NBPSOSEE is significantly better than that of the other six algorithms, with NBPSOSEE the best, LBPSOSEE second, and HI-BQPSO third. In the process of calculating fitness, BABC obtains the lowest fitness value; the fitness values calculated by HI-BQPSO, LBPSOSEE, and NBPSOSEE are 5.77%, 7.34%, and 11.61% higher than that of BABC, respectively, and the fitness value of NBPSOSEE is 5.52% and 3.98% higher than those of HI-BQPSO and LBPSOSEE, respectively. Furthermore, in the process of selecting the feature subset, BABC selects the most features, while HI-BQPSO, LBPSOSEE, and NBPSOSEE select 242, 216, and 205 features, respectively. NBPSOSEE selects the fewest features, selecting 190, 179, and 153 fewer features than HI-BQPSO, LBPSOSEE, and BABC, respectively. NBPSOSEE selects approximately one-third of the features from the original feature set, removing a total of 543 redundant or unrelated features. In addition, BWOA, BABC, BGA, BGWO, BPSO, CBPSO, HI-BQPSO, and LBPSOSEE reach their best fitness values after 97, 37, 61, 84, 68, 35, 48, and 43 iterations, respectively. NBPSOSEE obtains the optimal fitness value after only 15 iterations and can still find better fitness values after 17, 20, and 98 iterations. In the initial search, NBPSOSEE quickly raises the initial fitness value to a very high level, which reflects the convergence performance of the proposed algorithm. The search results show that the proposed NBPSOSEE retains effective global search ability after falling into a local optimum and can continue to search for a better feature subset in the later stages.

Figure 3 shows the test results of the proposed NBPSOSEE and the other feature selection algorithms for ECR risk in July. The fitness values calculated by the proposed NBPSOSEE and the comparison algorithms show an increasing trend as the number of iterations increases, while the number of features selected by NBPSOSEE decreases. LBPSOSEE and NBPSOSEE obtain higher fitness values and select fewer features than the other seven algorithms. The fitness values calculated by LBPSOSEE are 2.21%, 4.17%, 6.94%, 4.41%, 5.60%, 4.41%, and 1.09% higher than those of BWOA, BABC, BGA, BGWO, BPSO, CBPSO, and HI-BQPSO, respectively, and the number of features selected by LBPSOSEE is 122, 46, 106, 51, 80, 105, and 59 fewer than those of these seven algorithms. These results show that LBPSOSEE is better at avoiding local optimal solutions than these seven comparison algorithms. After 58 iterations, the fitness value of NBPSOSEE is 0.939, and the selected optimal feature subset contains 213 features. Moreover, the fitness value calculated by NBPSOSEE is 1.62% higher than that of LBPSOSEE, and the number of features selected is 69% less than that of LBPSOSEE. This proves that the global search ability of NBPSOSEE is stronger than that of LBPSOSEE and that the performance of NBPSOSEE is significantly better than that of the other feature selection algorithms.

Figure 4 shows the test results of the improved NBPSOSEE and the other eight comparison algorithms for ECR risk in August. The search capabilities of LBPSOSEE and NBPSOSEE are significantly higher than those of the other algorithms: the optimal fitness values obtained by LBPSOSEE and NBPSOSEE keep increasing while the numbers of selected features keep decreasing. The optimal fitness value calculated by LBPSOSEE is 0.957, which is 2.46%, 3.12%, 1.27%, 1.70%, 7.71%, 2.79%, and 0.95% higher than those of BWOA, BABC, BGA, BGWO, BPSO, CBPSO, and HI-BQPSO, respectively. LBPSOSEE selects 172 features, 169, 144, 208, 137, 175, 184, and 111 fewer than these seven algorithms, respectively. The optimal feature subset selected by NBPSOSEE contains 102 features, with an optimal fitness value of 0.977; NBPSOSEE thus deletes 70 more redundant or unrelated features than LBPSOSEE, and its optimal fitness value is 2.09% higher than that of LBPSOSEE. The experimental results show that NBPSOSEE has the strongest ability to jump out of local optimal solutions and obtains the highest fitness value while selecting the fewest features. In conclusion, NBPSOSEE has a higher convergence speed and better global search ability than BWOA, BABC, BGA, BGWO, BPSO, CBPSO, HI-BQPSO, and LBPSOSEE.

As shown in Figures 2–4, the performance of NBPSOSEE is significantly better than that of BWOA, BABC, BGA, BGWO, BPSO, CBPSO, HI-BQPSO, and LBPSOSEE. The proposed NBPSOSEE selects the fewest features while obtaining the highest fitness value in the tests of ECR risk for June, July, and August 2018. This verifies that the convergence speed and global search ability of NBPSOSEE are higher than those of the other algorithms and also demonstrates the effectiveness and stability of the proposed NBPSOSEE.

6.4. Experimental Results of NBPSOSEE-SBS

Although the results obtained by NBPSOSEE are improved, the feature subset selected by NBPSOSEE still contains many redundant features. Therefore, a hybrid feature selection algorithm, NBPSOSEE-SBS, is proposed in this paper to select the optimal feature subset, which can not only effectively reduce the number of features but also improve accuracy. Figures 5–7 show the test results calculated by the BWOA-SBS, BABC-SBS, BGA-SBS, BGWO-SBS, BPSO-SBS, CBPSO-SBS, HI-BQPSO-SBS, LBPSOSEE-SBS, and proposed NBPSOSEE-SBS algorithms for the risk prediction of ECR in June, July, and August 2018. The x-axis represents the number of iterations, and the maximum number of iterations corresponds to the number of features selected by NBPSOSEE.

Figure 5 shows the test results of the improved NBPSOSEE-SBS for ECR risk in June 2018. As redundant or irrelevant features are continuously deleted, the fitness values calculated by NBPSOSEE-SBS and the other eight feature selection algorithms show an increasing trend. The optimal fitness value calculated by BWOA-SBS is 3.01% higher than that of BWOA, and it deletes 9 more redundant features than BWOA. The best fitness value calculated by BABC-SBS is 0.24% higher than that of BABC, and it selects 6 fewer features than BABC. The optimal fitness value obtained by BGWO-SBS is 1.29% higher than that of BGWO, and it selects 21 fewer features than BGWO. The optimal fitness value obtained by BPSO-SBS is 1.9% higher than that of BPSO, and the number of selected features is reduced by 12 compared with BPSO. The optimal fitness value obtained by CBPSO-SBS is 0.7% higher than that of CBPSO, and 9 more redundant features are deleted than by CBPSO. The best fitness value calculated by HI-BQPSO-SBS is 3.54% higher than that of HI-BQPSO, and it deletes 35 more redundant features than HI-BQPSO. The best fitness value of LBPSOSEE-SBS is 0.79% higher than that of LBPSOSEE, and it selects 14 fewer features than LBPSOSEE. In addition, the optimal fitness value obtained by NBPSOSEE-SBS is 0.54% higher than that of NBPSOSEE, and the number of selected features is reduced by 11 compared with NBPSOSEE. The experimental results show that NBPSOSEE-SBS obtains the highest fitness value and removes the most redundant or unrelated features.

Figure 6 shows the test results based on NBPSOSEE-SBS and the other feature selection algorithms for ECR risk in July. The best fitness values obtained by BWOA-SBS, BABC-SBS, BGA-SBS, BGWO-SBS, BPSO-SBS, CBPSO-SBS, HI-BQPSO-SBS, LBPSOSEE-SBS, and the proposed NBPSOSEE-SBS are significantly superior to those of BWOA, BABC, BGA, BGWO, BPSO, CBPSO, HI-BQPSO, LBPSOSEE, and NBPSOSEE, respectively, and more redundant features are removed. NBPSOSEE-SBS obtains the highest fitness value, which is 3.68%, 3.68%, 8.79%, 5.73%, 1.18%, 1.95%, 2.84%, and 0.64% higher than those of BWOA-SBS, BABC-SBS, BGA-SBS, BGWO-SBS, BPSO-SBS, CBPSO-SBS, HI-BQPSO-SBS, and LBPSOSEE-SBS, respectively. NBPSOSEE-SBS also selects the fewest features, 184, 88, 171, 121, 134, 158, 114, and 60 fewer than the eight comparison algorithms, respectively. This shows that the proposed NBPSOSEE-SBS balances local search and global search better than the other comparison algorithms.

Figure 7 shows the test results of the improved NBPSOSEE-SBS for ECR risk in August. The results obtained by NBPSOSEE-SBS are significantly better than those of the other algorithms: NBPSOSEE-SBS deletes the most redundant or unrelated features and obtains the highest fitness value. The optimal fitness value obtained by NBPSOSEE-SBS is 4.45%, 5.79%, 4.34%, 4.23%, 6.71%, 5.45%, 3.79%, and 2.60% higher than those of the other eight algorithms, respectively, and the number of features selected by NBPSOSEE-SBS is 256, 236, 299, 227, 259, 280, 198, and 85 fewer than those of these eight algorithms, respectively. The experimental results show that the proposed NBPSOSEE-SBS has the best performance.

The test results shown in Figures 5–7 indicate that BWOA-SBS, BABC-SBS, BGA-SBS, BGWO-SBS, BPSO-SBS, CBPSO-SBS, HI-BQPSO-SBS, LBPSOSEE-SBS, and the proposed NBPSOSEE-SBS are significantly better than BWOA, BABC, BGA, BGWO, BPSO, CBPSO, HI-BQPSO, LBPSOSEE, and NBPSOSEE, respectively. Moreover, the proposed NBPSOSEE-SBS achieves the best results, not only deleting the most redundant or unrelated features but also obtaining the optimal fitness value. The experimental results show that the proposed NBPSOSEE-SBS improves the convergence speed of the particles and also enhances their ability to avoid premature convergence.

7. Discussion

Tables 5–7 show the comparative experiments of the sixteen traditional algorithms, one new feature selection algorithm, and the proposed NBPSOSEE-SBS for the risk prediction of ECR in June, July, and August 2018. Based on the number of selected features, accuracy, precision, recall, and F1-measure, the convergence speed and search performance of the proposed algorithm can be evaluated, verifying its effectiveness and superiority. In these tables, Acc, Pre, Rec, and F1 represent accuracy, precision, recall, and F1-measure, respectively, and the remaining column gives the number of selected features.

Table 5 shows the test results of the improved NBPSOSEE-SBS and the comparison algorithms for ECR risk in June 2018. For high-risk arrears users, the accuracy, precision, recall, and F1-measure calculated by all comparison algorithms are 100%, because the number of high-risk arrears users is small and their arrears behavior is clearly reflected in the features. Among the arrears users of the medium-risk level, HI-BQPSO, NBPSOSEE, BWOA-SBS, BGA-SBS, BGWO-SBS, HI-BQPSO-SBS, and the proposed NBPSOSEE-SBS achieve the highest precision, reaching 100%. The proposed NBPSOSEE-SBS obtains the highest accuracy, precision, and F1-measure for the arrears users of the low-risk level; its accuracy, precision, and F1-measure are 0.1%, 1.24%, and 0.67% higher than those of NBPSOSEE, respectively. Moreover, the proposed NBPSOSEE-SBS removes the most redundant or unrelated features and obtains the highest average F1-measure.

Table 6 shows the test results of the improved NBPSOSEE-SBS and the comparison algorithms for ECR risk in July 2018. The results calculated using all features are the worst, whereas the results of the proposed NBPSOSEE-SBS are significantly better than those of the other algorithms. For the arrears users of the medium-risk level, the accuracy and F1-measure of NBPSOSEE-SBS are higher than those of all other algorithms except CBPSO, and NBPSOSEE-SBS achieves a precision of 100%. In addition, the accuracy and precision calculated by the proposed NBPSOSEE-SBS are close to those of LBPSOSEE-SBS for the arrears users of the low-risk level. On the whole, the proposed algorithm selects the fewest features, fewer than one-third of the original features, and achieves the highest average accuracy and F1-measure.

Table 7 shows the test results of the improved NBPSOSEE-SBS and the comparison algorithms for ECR risk in August 2018. For the arrears users of the low-risk level, the proposed NBPSOSEE-SBS achieves the highest accuracy, precision, and F1-measure, which are 6.73%, 64.74%, and 53.38% higher, respectively, than the results calculated using all features. Furthermore, NBPSOSEE-SBS selects the fewest features in the optimal feature subset obtained. The experimental results show that the proposed NBPSOSEE-SBS deletes more than 90.4% of the original features as irrelevant or redundant and obtains the highest average F1-measure.

The execution time of an algorithm is also an important indicator of its performance: a long running time implies high complexity, and a short running time implies low complexity. The running times of the proposed NBPSOSEE, NBPSOSEE-SBS, and the other algorithms are shown in Figures 8 and 9. In order to remove more redundant features and improve the classification results, the hybrid feature selection algorithms with SBS take slightly more execution time than those without SBS. The execution time of NBPSOSEE-SBS is not significantly different from that of the other algorithms, and NBPSOSEE-SBS runs faster than BWOA-SBS, BABC-SBS, BGA-SBS, BGWO-SBS, BPSO-SBS, CBPSO-SBS, HI-BQPSO-SBS, and LBPSOSEE-SBS in the tests for July and August. In summary, the proposed NBPSOSEE-SBS outperforms the other algorithms: it can effectively reduce a large number of redundant features and stably improve the prediction results while keeping the execution time low.

8. Conclusion

To accurately predict the risk of ECR for power customers, a hybrid nonlinear inertia weight binary particle swarm optimization with shrinking encircling and exploration mechanism (NBPSOSEE) is proposed for solving feature selection tasks. In addition, an improved feature selection approach based on NBPSOSEE and SBS, namely, NBPSOSEE-SBS, is proposed to select the optimal feature subset. The experimental results prove that, compared with one state-of-the-art optimization algorithm and seven well-known wrapper-based feature selection approaches, the proposed NBPSOSEE-SBS can steadily remove more redundant or irrelevant features and obtain better prediction results of ECR for power customers with low execution time.

Data Availability

The experimental data contain users' private information. Therefore, in order to protect the security of users, the data set cannot be made publicly available.

Conflicts of Interest

There are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was financially supported by the National Natural Science Foundation of China (no. 61672470), Science and Technology Project of Henan Province (no. 202102210351), Doctoral Program of Zhengzhou University of Light Industry (no. 2017BSJJ046), and Key Research Projects of Henan Higher Education Institutions (no. 20A120011).

Supplementary Materials

The experimental results file directory contains three subdirectories, 201806, 201807, and 201808, corresponding to the risk prediction of electric charge recovery in June, July, and August 2018, respectively. Each subdirectory contains 18 CSV files, one for each of the 18 comparison algorithms in the manuscript: BWOA, BABC, BGA, BGWO, BPSO, CBPSO, HI-BQPSO, LBPSOSEE, NBPSOSEE, BWOA-SBS, BABC-SBS, BGA-SBS, BGWO-SBS, BPSO-SBS, CBPSO-SBS, HI-BQPSO-SBS, LBPSOSEE-SBS, and NBPSOSEE-SBS. Each file named <ALGORITHM>_results.csv contains the results calculated by the corresponding algorithm (for example, BWOA_results.csv for BWOA and BWOA_SBS_results.csv for BWOA-SBS); the files HPSO_SSM_results.csv and HPSO_SSM_SBS_results.csv correspond to HI-BQPSO and HI-BQPSO-SBS, respectively. In the header of these CSV files, features_num means the number of selected features. Accuracy_high, Loss_high, Precission_high, Recall_high, and F1_high represent the accuracy, loss, precision, recall, and F1-measure of predicting the high-risk level of customer electric charge recovery, respectively; Accuracy_med, Loss_med, Precission_med, Recall_med, and F1_med represent the corresponding values for the medium-risk level; and Accuracy_low, Loss_low, Precission_low, Recall_low, and F1_low represent the corresponding values for the low-risk level. (Supplementary Materials)