Abstract
When data arrive in batches, are accessed dynamically, or form a stream that changes continuously over time, the traditional support vector machine (SVM) algorithm cannot dynamically adjust a previously trained classification model. Incremental support vector machine (ISVM) algorithms were proposed to overcome this shortcoming. However, many ISVM algorithms still suffer from low efficiency, memory limitations, and poor generalization. This paper puts forward a new ISVM algorithm, HDFCISVM, based on high-dimensional distance and forgetting characteristics. The original HDFCISVM algorithm first learns the distribution characteristics of the samples from the distances between the samples and the normative hyperplane, and then introduces a forgetting factor. During incremental learning, the classifier gradually accumulates knowledge of the spatial distribution of the samples, eliminates samples that make no contribution to the classifier, and selectively forgets useless samples according to the forgetting factor, which addresses the low efficiency and poor accuracy of some existing algorithms. However, the original HDFCISVM algorithm is sensitive to its parameters, and different parameter settings have a great impact on the final classification accuracy. Therefore, an improved HDFCISVM algorithm is proposed in which the initialization strategy and update rule of the forgetting factor are adjusted so that the algorithm adapts to datasets with different distributions. The rationality of the improved forgetting-factor strategy is discussed theoretically. Experiments verify that the proposed algorithm achieves better classification accuracy, classification efficiency, and generalization ability than the other algorithms compared.
1. Introduction
Support vector machine (SVM) is a classic machine learning method. It was proposed by Cortes and Vapnik in 1995 on the basis of the VC dimension theory of statistical learning and the structural risk minimization (SRM) principle [1]. SVM aims to maximize the predictive ability of the learned model, and the resulting classifier achieves high prediction accuracy on independent test samples even when it is built on a small training set. In addition, training an SVM is a convex quadratic programming problem, so the solution found is the global optimum [2–4]. SVM uses margin maximization to control the learning process. When classifying high-dimensional data or limited samples, it avoids the tendency of neural networks and other methods to fall into local optima or to overfit.
The nonlinear classification model based on SVM can be summarized as the following convex positive semidefinite programming problem:
A necessary and sufficient condition for a solution of this optimization problem to be optimal is that the corresponding Karush–Kuhn–Tucker (KKT) conditions hold [5].
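For reference, the textbook soft-margin SVM dual and its KKT conditions are written out below. This is the standard formulation, not a recovery of the paper's own formula (1); the symbols $\alpha_i$, $C$, $l$, $K$, and $f$ are assumed notation.

```latex
% Textbook soft-margin SVM dual with kernel K (assumed notation).
\[
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{l}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \\
\text{s.t.}\quad & \sum_{i=1}^{l}\alpha_i y_i = 0,\qquad 0 \le \alpha_i \le C,\quad i = 1,\dots,l .
\end{aligned}
\]
% KKT conditions, with f(x) = \sum_j \alpha_j y_j K(x_j, x) + b.
\[
\alpha_i = 0 \;\Rightarrow\; y_i f(x_i) \ge 1,\qquad
0 < \alpha_i < C \;\Rightarrow\; y_i f(x_i) = 1,\qquad
\alpha_i = C \;\Rightarrow\; y_i f(x_i) \le 1 .
\]
```

Samples with $\alpha_i > 0$ are the support vectors. KKT-based ISVM methods typically test whether newly arrived samples violate these conditions.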
SVM was originally designed to solve the binary classification of balanced data obtained in batch. With the development of machine learning research, SVM has expanded from the initial binary classification and regression problems to other machine learning topics, such as feature selection, semi-supervised learning, top-order learning, ordered regression, outlier detection, and multi-perspective learning [6]. At the same time, extended algorithms based on SVM have been applied to more complex classification and regression tasks. Examples include reducing the impact of noise in the dataset, addressing the noise sensitivity and instability of resampling, classifying imbalanced data with high precision, and efficiently classifying dynamically obtained or streaming data. Research on such extended algorithms has long been an active direction in machine learning and is of great significance for support vector machines and their variants [7–10]. In these new topics, the models that evolved from SVM inherit most of its original characteristics, such as margin theory, kernel techniques, and structural risk minimization, but they also inherit the defects of the original SVM model.
When data arrive in batches, are accessed dynamically, or form a stream that changes continuously over time, the traditional support vector machine algorithm cannot dynamically adjust a previously trained classification model. Incremental support vector machine (ISVM) algorithms were proposed to overcome this shortcoming. In recent years, ISVM algorithms, which perform incremental learning based on SVM, have attracted considerable attention from researchers [11–14]. In each incremental learning step of an ISVM, the key is how to retain historical information effectively, selectively discard and forget useless training data, and save storage space while maintaining classification accuracy.
Scholars at home and abroad have carried out a great deal of research on ISVM algorithms based on sample preselection strategies. For example, Xiao et al. proposed a new incremental learning algorithm based on SVM [15]. Wang observed that samples near the classification boundary readily become support vectors, included the non-support vectors near the boundary in the incremental update, and proposed a redundant ISVM learning algorithm [16]. Yao et al. proposed a fast ISVM learning method based on locality-sensitive hashing to improve the classification accuracy of large-scale high-dimensional data. This method first uses locality-sensitive hashing to find similar data quickly, then selects the samples in the increment that may become SVs on the basis of the SVM algorithm, and finally uses these samples together with the existing SVs as the basis for subsequent training [17]. Tang combined the strict incremental process of the classical ISVM algorithm with the idea of passive-aggressive online learning to better select the new SVs in the online process of the classical ISVM algorithm [11]. Zhang et al. introduced RCMDE as the feature extraction method and proposed an improved ISVM fault classifier based on the whale optimization algorithm (WOA) to diagnose and predict bearing faults [18]. The above incremental models are all based on sample preselection strategies. In addition, many scholars have proposed ISVM learning algorithms based on KKT conditions and Lagrange multiplier methods [19–21].
The research above shows that, as new samples are added during incremental learning, selecting the new support vector set so that useful information is not discarded while the original training results are retained has become a central issue in constructing an ISVM learning model.
2. Description of the Original HDFCISVM Algorithm
Many classical ISVM learning algorithms have been proposed, including Simple_ISVM [22], KKT_ISVM [23, 24], CSV_ISVM [14], GGKKT_ISVM [25], CD_ISVM [26], and the other ISVM algorithms mentioned above. These algorithms offer different methods, from different perspectives, for selecting the training samples used in incremental learning. However, the ability of the classifier to gradually accumulate knowledge of the spatial distribution of samples is still not fully exploited, so accuracy and efficiency can be further improved. To learn the distribution characteristics of the samples more thoroughly, this paper proposes a new ISVM algorithm, called HDFCISVM, based on high-dimensional distance and forgetting characteristics. It is designed to strengthen the classifier's ability to accumulate knowledge of the spatial distribution of the samples. The flow of the algorithm is shown in Figure 1.
2.1. The Distance from Every Sample to the Optimal Hyperplane in High-Dimensional Euclidean Space
The training of the SVM classification hyperplane is only related to the support vectors, and the support vectors are the ones that fall on the normative hyperplane . In n-dimensional Euclidean space, let the mapping function be . If the projection point of point to the optimal hyperplane , then it satisfies the following formula:
where and SVs are the set of support vectors.
As can be seen from Figure 2, the vector is parallel to the normal vector of the hyperplane , which satisfies the following formula:
where d is the distance from point to the hyperplane . In addition, the following formula holds:
Formula (5) follows from formulas (3) and (4):
Therefore, the distance from any point in n-dimensional Euclidean space to the optimal hyperplane is obtained as follows:
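A standard form of this point-to-hyperplane derivation is sketched below. The notation is assumed rather than recovered from the paper: $w$ and $b$ define the optimal hyperplane, $f(x) = w^{\top}x + b$, $x_0$ is the projection of $x$ onto the hyperplane, and $d$ is the distance.

```latex
% Assumed notation: hyperplane w^T x + b = 0, f(x) = w^T x + b,
% x_0 = orthogonal projection of x onto the hyperplane, d = distance.
\begin{align*}
  w^{\top}x_0 + b &= 0
    && \text{($x_0$ lies on the hyperplane)} \\
  \lvert w^{\top}(x - x_0)\rvert &= \lVert w\rVert\, d
    && \text{($x - x_0$ is parallel to $w$)} \\
  w^{\top}(x - x_0) &= (w^{\top}x + b) - (w^{\top}x_0 + b) = f(x) \\
  d &= \frac{\lvert f(x)\rvert}{\lVert w\rVert}
     = \frac{\lvert w^{\top}x + b\rvert}{\lVert w\rVert}
\end{align*}
```

With $w = \sum_{i \in \mathrm{SVs}} \alpha_i y_i x_i$, the last line depends only on the support vectors, which is what allows the same distance to be evaluated through a kernel function.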
2.2. Calculating the Distance between Every Sample and the Optimal Hyperplane under the Action of a Kernel Function
For nonlinearly separable problems, a mapping function must be introduced to map the samples into a high-dimensional space in which they become linearly separable or approximately linearly separable. The literature [27] proves theoretically that, under the action of a kernel function, the higher the dimension of the space into which the samples are mapped, the higher the probability that they are linearly separable, and the better the classification effect that can be obtained. For this reason, this paper first calculates the distance between a sample and the hyperplane in the high-dimensional space under the action of the kernel function.
Let the mapping function be and the kernel function be . At this time, and put it into formula (6) to obtain formulas as follows:
Substituting formulas (7) and (8) into formula (6) gives the distance between a sample and the hyperplane in the high-dimensional space:
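The kernel-space distance can be sketched in code as follows. This is a minimal illustration under assumptions, not the paper's implementation: it uses the standard identities $w^{\top}\phi(x) = \sum_i \alpha_i y_i K(x_i, x)$ and $\lVert w\rVert^2 = \sum_i\sum_j \alpha_i\alpha_j y_i y_j K(x_i, x_j)$, and the function names, the `gamma` default, and the argument layout are all hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain inner product; with this kernel the result is the Euclidean distance."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u: Vector, v: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel; gamma is an illustrative default, not a paper value."""
    squared = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared)


def decision_value(x: Vector, svs: Sequence[Vector], alphas: Sequence[float],
                   labels: Sequence[int], b: float, kernel: Kernel) -> float:
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    return sum(a * y * kernel(sv, x) for a, y, sv in zip(alphas, labels, svs)) + b


def weight_norm(svs: Sequence[Vector], alphas: Sequence[float],
                labels: Sequence[int], kernel: Kernel) -> float:
    """||w|| in feature space: sqrt(sum_i sum_j a_i a_j y_i y_j K(x_i, x_j))."""
    total = 0.0
    for a_i, y_i, x_i in zip(alphas, labels, svs):
        for a_j, y_j, x_j in zip(alphas, labels, svs):
            total += a_i * a_j * y_i * y_j * kernel(x_i, x_j)
    return math.sqrt(total)


def kernel_distance(x: Vector, svs: Sequence[Vector], alphas: Sequence[float],
                    labels: Sequence[int], b: float, kernel: Kernel) -> float:
    """Distance from phi(x) to the separating hyperplane: |f(x)| / ||w||."""
    norm = weight_norm(svs, alphas, labels, kernel)
    return abs(decision_value(x, svs, alphas, labels, b, kernel)) / norm


# Toy check with a linear kernel: the hyperplane is x1 = 0, so (3, 4) is 3 away.
svs = [(1.0, 0.0), (-1.0, 0.0)]
print(kernel_distance((3.0, 4.0), svs, [0.5, 0.5], [1, -1], 0.0, linear_kernel))
```

Only kernel evaluations against the support vectors are needed, so the mapping function itself never has to be computed explicitly.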
2.3. Mapping and Normalization
The HDFCISVM algorithm first trains an optimal classification hyperplane on the initial data and then obtains the normative hyperplane where the support vectors are located. In each increment, formula (9) is used to calculate the distances between the newly added positive and negative samples and the corresponding normative hyperplane , respectively. The distance between a positive sample and the hyperplane is denoted as , and the distance between a negative sample and the hyperplane is denoted as , as shown in Figure 3.
Thus, the algorithm uses the distances between the samples and the hyperplane in the high-dimensional space (see formula (9)) to describe the geometric distribution of the samples. To better reflect the distribution information of the samples, these distances are then mapped to corresponding probability values.
Definition 1. Let represent the distance sets of the positive and negative samples from their respective normative hyperplanes that are the set and the set . Let be the maximum and minimum values of samples in sets , respectively, then the following definitions can be obtained:
In essence, Definition 1 normalizes the distance from each sample to the corresponding normative hyperplane and maps the distance in Euclidean space to the interval (0,1).
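The normalization described in Definition 1 can be illustrated with plain min-max scaling. This is only a sketch under that assumption: the exact form of formula (10) is not reproduced here and may differ (for instance, min-max scaling sends the extreme samples to exactly 0 and 1). The function name is hypothetical.

```python
from typing import List, Sequence


def normalize_distances(distances: Sequence[float]) -> List[float]:
    """Min-max scale one class's distances to the range [0, 1].

    Assumption: plain min-max scaling stands in for formula (10).
    """
    if not distances:
        return []
    d_min, d_max = min(distances), max(distances)
    span = d_max - d_min
    if span == 0:
        # All samples are equally far from the hyperplane; map them all to 0.
        return [0.0 for _ in distances]
    return [(d - d_min) / span for d in distances]


# Positive and negative samples are normalized separately, each against
# the distances to its own normative hyperplane.
p_positive = normalize_distances([0.2, 1.0, 1.8])
p_negative = normalize_distances([0.5, 0.5, 2.5, 4.5])
print(p_positive, p_negative)
```

Scaling each class on its own keeps the values comparable across classes even when one class lies much farther from the boundary than the other.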
2.4. Forgetting Factor and Initialization
2.4.1. Forgetting Factor
Before introducing the HDFCISVM algorithm, we first give Definition 2.
Definition 2 (see [15]). The following types of samples are defined:
① Samples that have never been selected as SVs in any round of training are called in-class samples; they usually account for a large proportion of the dataset.
② Samples that appear in the SV set in every round are called boundary samples.
③ Samples that appear intermittently in the SV sets are called quasi-boundary samples.
It follows from Definition 2 that different types of samples contribute differently to the final classifier because of their different geometric distributions. The boundary samples contain most of the classification information. The quasi-boundary samples supplement and correct the information carried by the boundary samples, while the classification information carried by the in-class samples is already covered by the boundary and quasi-boundary samples. Therefore, incremental learning benefits most from focusing on the boundary and quasi-boundary samples.
Based on the above analysis, we can eliminate the in-class samples according to certain rules to reduce the storage of historical samples, and then select some quasi-boundary samples to accelerate the convergence of the SV set and improve the accuracy of incremental learning. To this end, the HDFCISVM algorithm introduces a forgetting factor during incremental learning to select the samples that are kept.
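The three sample types in Definition 2 can be told apart from a record of which samples were support vectors in each training round. The sketch below is illustrative only. It assumes every sample is present in every recorded round, and the function name and string labels are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set


def classify_samples(sample_ids: Iterable[Hashable],
                     sv_sets_per_round: Sequence[Set[Hashable]]) -> Dict[Hashable, str]:
    """Label each sample by how often it appeared in the SV set.

    never an SV -> "in-class"; an SV in every round -> "boundary";
    an SV in some rounds only -> "quasi-boundary".
    """
    rounds = len(sv_sets_per_round)
    labels: Dict[Hashable, str] = {}
    for sid in sample_ids:
        hits = sum(1 for sv_set in sv_sets_per_round if sid in sv_set)
        if hits == 0:
            labels[sid] = "in-class"
        elif hits == rounds:
            labels[sid] = "boundary"
        else:
            labels[sid] = "quasi-boundary"
    return labels


# Three training rounds over five samples identified by integer ids.
history = [{1, 2}, {1, 3}, {1, 2}]
print(classify_samples(range(5), history))
```

In this toy history, sample 1 is a boundary sample, samples 2 and 3 are quasi-boundary samples, and samples 0 and 4 are in-class samples.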
For the new round of incremental learning dataset , the distances are first mapped using formula (10) so that each distance is mapped to a p value, and all the values are sorted in ascending order. The algorithm takes the 60% quantile as , the 75% quantile as , and the 90% quantile as . The forgetting factor of every sample in the set is then set according to its value:
① If , then for the data point , its forgetting factor is assigned
② If , then for the data point , its forgetting factor is assigned
③ If , then for the data point , its forgetting factor is assigned
④ If , then for the data point , its forgetting factor is assigned
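The four-way assignment above can be sketched as quantile bucketing. The sketch makes several assumptions that should be checked against the paper: quantiles use the nearest-rank convention, a sample equal to a threshold falls in the lower bucket, and the factor for each bucket is supplied by the caller because the assigned values are not reproduced here. All function and parameter names are hypothetical.

```python
import math
from bisect import bisect_left
from typing import List, Sequence


def nearest_rank_quantile(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank quantile of an ascending sequence (one common convention)."""
    if not sorted_values:
        raise ValueError("at least one value is required")
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def initial_forgetting_factors(p_values: Sequence[float],
                               factors: Sequence[float],
                               quantiles: Sequence[float] = (0.60, 0.75, 0.90)
                               ) -> List[float]:
    """Give each sample the factor of the quantile bucket its p value falls in.

    factors[k] is the caller-chosen factor for bucket k, ordered from the
    smallest p values to the largest; there is one more bucket than quantiles.
    """
    if len(factors) != len(quantiles) + 1:
        raise ValueError("need exactly one factor per bucket")
    ordered = sorted(p_values)
    thresholds = [nearest_rank_quantile(ordered, q) for q in quantiles]
    # bisect_left: p <= t1 -> bucket 0, t1 < p <= t2 -> bucket 1, and so on.
    return [factors[bisect_left(thresholds, p)] for p in p_values]


# Placeholder factors, chosen only to make the four buckets visible.
p = [i / 10 for i in range(10)]
print(initial_forgetting_factors(p, [0.4, 0.3, 0.2, 0.1]))
```

Passing the factors in as an argument keeps the sketch neutral about which bucket should receive the largest factor.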
2.4.2. Threshold Adjustment Rule
Determine whether each sample in the dataset is in the SV set which is obtained after the new round of SVM training. If it is in this set, the corresponding forgetting factor ; if the sample point is not in the set , the corresponding forgetting factor , and the forgetting factor corresponding to every sample of the set is updated according to this rule.
2.5. The Steps of HDFCISVM Algorithm
Based on the above analysis, a new ISVM algorithm, HDFCISVM, is proposed in this paper. It tries to retain the samples that may become support vectors, discards the samples that are useless for classifier training, and improves classification efficiency while preserving accuracy. The specific process of the HDFCISVM algorithm is described in Algorithm .

2.6. Experiment and Result Analysis
In this paper, simulated experiments are carried out for the four algorithms using different datasets with different sample numbers, different dimensions, and different distribution characteristics, and the experimental results are analyzed and evaluated.
2.6.1. Experimental Datasets of the Original HDFCISVM Algorithm
This experiment selects some datasets in the UCI database. The specific information of the experimental datasets is shown in Table 1.
2.6.2. The Experimental Results of the Original HDFCISVM Algorithm
This experiment applies four incremental learning algorithms, namely the SimpleISVM, KKTISVM, and CSVISVM algorithms and the HDFCISVM algorithm proposed in this paper, to the 6 datasets in Table 1 and compares their learning effects. The initial training set contains 500 samples, and 500 samples are added in each incremental learning step until all training samples have been used. The classification performance of the algorithms is evaluated with the following indicators:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{11}$$
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \tag{12}$$
$$\mathrm{TNR} = \frac{TN}{TN + FP}, \tag{13}$$
$$F_1 = \frac{2\,TP}{2\,TP + FP + FN}, \tag{14}$$
where TP, TN, FP, and FN denote, respectively, the true positives (positive samples predicted as positive by the model), true negatives (negative samples predicted as negative), false positives (negative samples predicted as positive), and false negatives (positive samples predicted as negative). ACC is the accuracy, the proportion of all samples that the model predicts correctly. TPR is the sensitivity, the proportion of positive samples that are correctly predicted as positive, and TNR is the specificity, the proportion of negative samples that are correctly predicted as negative. The F1-score takes into account both the precision and the recall of a classification model; it is the harmonic mean of the two.
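The four indicators can be computed directly from the confusion-matrix counts. The following is a small sketch using the standard definitions; the function name and the convention of returning 0.0 for an empty denominator are choices made here, not taken from the paper.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Return ACC, TPR, TNR, and F1 from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Convention chosen here: an empty denominator yields 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as TPR
    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "TPR": recall,
        "TNR": ratio(tn, tn + fp),
        # Harmonic mean of precision and recall.
        "F1": ratio(2 * precision * recall, precision + recall),
    }


# 50 positive and 50 negative test samples.
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

For these counts, ACC is 0.85, TPR is 0.8, TNR is 0.9, and F1 is 80/95, which is about 0.842.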
Tables 2–7 show the results of the four incremental learning algorithms on the 6 datasets listed in Table 1, where “Iteration count” is the number of incremental learning rounds, “ACC” is the accuracy of the classifier, and “Time” is the training time of that round. The TPR and TNR values in Table 8 are the sensitivity and specificity of the classifiers. Finally, for the training sets and the testing sets, we compare the classification accuracy averaged over all incremental learning rounds.
From these experimental results, we obtained comparison graphs of the accuracy and cumulative time of the four algorithms on all the datasets, as shown in Figures 4 and 5. The TPR and TNR values of all the algorithms on the 6 datasets are compared in Figure 6. Figure 7 shows the average classification accuracy on the training and testing sets according to Table 9, where “Train_Acc” is the average accuracy on the training sets and “Test_Acc” is the average accuracy on the testing sets.
As can be seen from Figure 4, the classification accuracy of the SimpleISVM and KKTISVM algorithms fluctuates greatly during incremental learning and their classifiers are not robust, because the two algorithms ignore the quasi-boundary vectors that may become SVs. In contrast, the classification accuracy of the CSVISVM and HDFCISVM algorithms fluctuates less during the incremental process and shows an overall upward trend. The classification accuracy of the HDFCISVM algorithm is slightly lower than that of the other algorithms in the initial state, but the algorithm continuously learns the spatial distribution of the samples and adjusts the training set through the forgetting factors during incremental learning, so that it finally obtains slightly higher accuracy than the SimpleISVM and KKTISVM algorithms.
Figure 5 shows that the training times of the four algorithms differ little in the initial stages of training, but their total running times differ greatly after several rounds of incremental learning. The average running time of the SimpleISVM and KKTISVM algorithms is 50% longer than that of the CSVISVM and HDFCISVM algorithms. The average cumulative running time of the HDFCISVM algorithm is lower than that of the other algorithms, so this algorithm has a clear advantage in training efficiency.
Figure 6 shows that, compared with the other algorithms, the HDFCISVM algorithm differs little in its sensitivity to positive and negative samples on all the datasets. The maximum difference is within 5%, so the algorithm obtains good classification accuracy for both positive and negative samples.
Figure 7 shows the average classification accuracy of the classifiers on the training sets and the testing sets after all incremental learning rounds. The HDFCISVM algorithm has an advantage in average classification accuracy on both the training sets and the testing sets. In addition, the difference between its average accuracy on the training sets and on the testing sets is only 3.59%, which is lower than the 5.61% of the SimpleISVM algorithm and the 3.89% of the CSVISVM algorithm. Therefore, the HDFCISVM algorithm generalizes well.
2.7. Parameter Sensitivity Analysis of the Original HDFCISVM Algorithm
The influence of hyperparameters involved in HDFCISVM algorithm on experimental results is studied, and a series of hyperparameters involved in this algorithm are tested to fully explore the sensitivity of the algorithm to the introduced hyperparameters.
2.7.1. Sensitivity Analysis of Parameters
In the HDFCISVM algorithm proposed above, the incremental learning dataset is mapped to a set of values using formula (10), and three hyperparameters are then introduced as thresholds for initializing the forgetting factor. To explore the sensitivity of the algorithm to these initial thresholds, four different groups of values are set for the hyperparameters , and tests are carried out on the 6 datasets above. The experimental results are shown in Table 10, whose first column gives the hyperparameter values used in the previous experiment.
Table 10 shows that the choice of values has a considerable impact on the experimental results. Appropriately increasing the hyperparameter values improves the classification accuracy to a certain extent, but if the values are too large, the classification accuracy decreases. Table 10 also shows that the hyperparameters influence the classification accuracy differently on different datasets. Since different hyperparameter settings make the accuracy fluctuate, the HDFCISVM algorithm is sensitive to parameters .
2.7.2. Sensitivity Analysis of Parameter
In the HDFCISVM algorithm proposed above, the assignment strategy of forgetting factor has a great impact on the algorithm performance, and different assignments of lead to different tendencies in selecting candidate support vectors. Therefore, this paper again uses the 6 datasets above and selects different assignment strategies to explore the influence of on the experimental results. The specific results are shown in Table 11. All the results in Table 11 are the accuracy obtained when is set to 0.6, 0.75, and 0.9 and is set to different values. Table 11 lists 4 different combinations of thresholds of parameter corresponding to the four different situations, and the first column shows the values of used in the previous experiment.
Table 11 shows that the assignment strategy of has a great influence on the final results of the algorithm. If the value of is too small, the forgetting factor cannot play its due role, and some data samples are forgotten prematurely. When the value of the forgetting factor increases, the testing accuracy improves for some datasets but decreases for others. At the same time, increasing the assignment of causes more data to be learned incrementally, which increases the training time. Therefore, the algorithm is also sensitive to parameter .
2.7.3. The Conclusion about the Parameter Sensitivity
The experimental results show that the original HDFCISVM algorithm is sensitive to both and , and different parameters need to be adjusted for different datasets to achieve the best classification effect. When the values of are 0.6, 0.75, and 0.9, respectively, the performance of this algorithm is relatively stable for the abovementioned 6 datasets. When the parameters’ values are increased or decreased, the accuracy of the classifier will fluctuate for different datasets. Similarly, when the values of are 0.1, 0.15, and 0.3, the testing accuracy of the classifier for all the 6 datasets is high, while increasing the values of the parameters will lead to rapid decline of testing accuracy for some datasets.
2.8. The Improvement Strategy for the Original HDFCISVM Algorithm
The sensitivity experiments above show that different parameter settings have a great impact on the final classification accuracy of the original HDFCISVM algorithm. Because it has too many parameters, the algorithm cannot adapt to datasets with different distributions, which often need different groups of hyperparameters to achieve ideal classification results. Therefore, the initialization strategy and the update rule of the forgetting factor need to be adjusted. This is done below, and the algorithm with the adjusted initialization rule and update strategy is referred to as the improved HDFCISVM algorithm. The new rules are as follows.
2.8.1. Initialization Process
For the new round of incremental learning dataset , we first use formula (10) to perform probability calculation to obtain the set of values of all samples, and then the forgetting factor is initialized by assigning to the samples in dataset by the following formula:
where is calculated by formula (10), represents the minimum in the set of values for this round, represents rounding up to one decimal place, and represents the regulating parameter.
All the samples in incremental dataset are marked with the corresponding forgetting factors by the method above. The dataset of the previous round is then merged in, and the samples with forgetting factor are selected. These samples constitute dataset , on which a new round of SVM training is conducted.
2.8.2. Updating Rule for the Forgetting Factor
To make the forgetting factor update adaptively, reduce the number of parameters to set, and improve the generalization performance of the model, this paper proposes a new update strategy for the forgetting factor: before each new round of incremental training, the forgetting factor of the original data is updated as follows:
where indicates whether the set contains . If it does, it returns 1; otherwise, it returns 0. represents the member of set .
Formula (16) can be interpreted as follows. When is a support vector after the last round of training, function returns 1 and returns 0, and acts as a weight that adjusts the increment of the forgetting factor. When the forgetting factor is large, the increment decreases in each round to keep the forgetting factor sensitive to the candidate support vectors. Conversely, if is not a support vector after the last round of training, function returns 0. In this case, the forgetting factor is reduced by function , whose inner function measures the distance between and the support vectors by comparing their cosine similarity. We assume that the closer is to a support vector, the more likely it is to become one. Therefore, the distance mapping between and the nearest support vector is obtained through function . When updating , the algorithm adjusts the size of the attenuation through threshold : the closer the sample is to the current support vectors, the smaller the attenuation of , and the farther away it is, the greater the attenuation. In this way, the forgetting factor is initialized and updated with fewer parameters, the update rule adapts to the data distribution of each dataset, and the generalization performance of the algorithm is improved.
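The cosine-similarity comparison with the nearest support vector can be sketched as below. This covers only that one ingredient. How the similarity enters formula (16) is not reproduced here, the vectors are compared in the input space (the paper may work in the kernel-induced space instead), and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_sv_similarity(x: Vector, support_vectors: Sequence[Vector]) -> float:
    """Largest cosine similarity between x and any current support vector."""
    if not support_vectors:
        raise ValueError("at least one support vector is required")
    return max(cosine_similarity(x, sv) for sv in support_vectors)


# A sample that is not currently an SV, compared with two support vectors.
print(nearest_sv_similarity((1.0, 0.0), [(0.0, 1.0), (1.0, 1.0)]))
```

A value near 1 marks a sample that resembles some current support vector, which under the rule described above would correspond to a smaller attenuation of its forgetting factor.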
2.9. Analysis of Experiments and Results of the Improved HDFCISVM Algorithm
2.9.1. Experimental Datasets for the Improved Algorithm
To better test the performance of the adjusted algorithm, 6 more datasets from the UCI repository are added to the original 6, giving 12 experimental datasets. The dataset information is shown in Table 12.
2.9.2. The Experimental Results
Building on the experiments above, this round compares the training results of SimpleISVM [22], KKTISVM [23, 24], CSVISVM [14], GGKKTISVM [25], CDISVM [26], the original HDFCISVM, and the improved HDFCISVM on the 12 datasets in Table 12. For all the algorithms, the initial training set contains 500 samples, and 500 samples are added in each incremental learning step until all training samples have been used. The value of is 0.3. The ACC and F1-score indexes are both used to evaluate the performance of the classifiers (see formulas (11) and (14)). The specific experimental results are as follows.
Table 13 and Figure 8 show that the improved HDFCISVM algorithm achieves higher F1-score values than the original on the 12 datasets. The mean F1-score of the improved HDFCISVM over all datasets is 0.936, which is 1.3 percentage points higher than that of the original HDFCISVM algorithm and 2.6 percentage points higher than that of the KKTISVM algorithm. The mean F1-score of the improved algorithm is higher than that of the other algorithms, which indicates that it has an advantage in both precision and recall. Tables 14 and 15 show that the average training accuracy of the improved HDFCISVM algorithm over all datasets is 92.29% and the average testing accuracy is 90.80%. The improved algorithm thus outperforms not only the other algorithms but also the original HDFCISVM algorithm. Figure 9 shows that the testing accuracy of the improved algorithm is no lower than that of the original HDFCISVM algorithm on almost all datasets. In particular, the testing accuracy improves by 8.61% on the “mushroom” dataset and by 5.46% on the “breast_cancer” dataset. These results show that, by adjusting the initialization and update strategies of the forgetting factor, the new algorithm selects the data for each training round better and adapts the update of the forgetting factor, and thus trains a more effective classifier.
2.9.3. Sensitivity Analysis of Parameter
To further explore the influence of parameter on the experimental results, the first 6 datasets from the experiments above are used to test the accuracy of the improved HDFCISVM algorithm. The experimental results give the accuracy (%) of the algorithm on the different testing sets when takes the values 0.1, 0.2, 0.3, and 0.4. The results are shown in Table 16.
The results in Table 16 show that different values of influence the experimental results to a certain degree. When the value of increases from 0.1 to 0.4, the classification accuracy on each dataset fluctuates. In general, the algorithm performs best when the value of is 0.3. Further increasing the value of does not increase the classification accuracy; instead, because too many samples are deleted, it weakens the algorithm's perception of the overall distribution of the datasets and reduces the classification accuracy.
2.9.4. Application of HDFCISVM in Image Detection
In order to explore the actual effect of HDFCISVM algorithm in image classification, this paper adopts “catsvsdogs” dataset provided by Kaggle as a training dataset. 5000 images are selected for classification to explore the effect of the proposed algorithm in image classification.
In this experimental training set, 2500 pictures of cats and 2500 pictures of dogs are selected for training. In this experiment, 20% of the training pictures are extracted by the method of 5 fold cross validation, AlexNet convolutional neural network [28] is used to extract image features, and 4096 dimensional features are finally extracted as input data. It marks the cat as −1 and the dog as +1. The dataset is divided into 25 incremental learning units, each batch has 200 image data, and the experiment is carried out using HDFCISVM algorithm to obtain data such as running time and test accuracy. The specific experimental results are as follows.
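Dividing the data into equal incremental learning units, as described above, can be sketched as follows. The helper name is hypothetical, and integer ids stand in for the extracted feature vectors.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def incremental_batches(samples: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size samples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


# 5000 placeholder samples split into units of 200 give 25 increments.
units = list(incremental_batches(list(range(5000)), 200))
print(len(units), len(units[0]))
```

The same helper covers the earlier protocol of 500 initial samples followed by 500 samples per increment. When the dataset size is not a multiple of the batch size, the last unit is simply shorter.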
Figure 10 shows the incremental learning training precision of HDFCISVM algorithm for the dataset “catsvsdogs.” The first row in Figure 11 shows the partially correctly classified images, and the second row shows the partially incorrectly classified images. It can be seen from the experimental results, HDFCISVM algorithm has achieved a good classification effect. In this experiment, the convolutional neural network algorithm—AlexNet algorithm is compared with HDFCISVM algorithm proposed in this paper. The comparison results are shown in Table 17. It can be seen from Table 17, HDFCISVM algorithm has higher classification accuracy and better classification efficiency for image dataset “catsvsdogs.”
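The data preparation used in this experiment (5000 labeled samples split into 25 incremental learning units of 200 samples each) can be sketched as follows. The function name `make_incremental_batches`, the shuffling step, and the random seed are assumptions made for illustration, and short dummy vectors stand in for the 4096-dimensional AlexNet features.

```python
import random


def make_incremental_batches(samples, labels, n_batches, seed=0):
    """Shuffle (sample, label) pairs and split them into equal-sized batches."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if len(samples) % n_batches != 0:
        raise ValueError("dataset size must be divisible by n_batches")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    size = len(samples) // n_batches
    batches = []
    for b in range(n_batches):
        idx = order[b * size:(b + 1) * size]
        batches.append(([samples[i] for i in idx], [labels[i] for i in idx]))
    return batches


# Stand-in data: 2500 cats (-1) and 2500 dogs (+1). Real inputs would be
# 4096-dimensional feature vectors; one-element dummies are used here.
features = [[float(i)] for i in range(5000)]
labels = [-1] * 2500 + [+1] * 2500
batches = make_incremental_batches(features, labels, n_batches=25)
print(len(batches), len(batches[0][0]))  # 25 200
```

Each element of `batches` is a `(samples, labels)` pair that would be fed to the incremental learner in turn. Shuffling before splitting is a design choice of this sketch so that each unit contains both classes. The paper does not state how the units were formed.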
3. Conclusion
In this paper, an incremental learning algorithm, HDFCISVM, is proposed, which achieves a good classification effect. On this basis, to address its sensitivity to parameters, the initialization strategy and update rule of the forgetting factor are adjusted, and an improved HDFCISVM algorithm is finally proposed.
The algorithm has the following innovations:
(1) It uses the distance formula in the high-dimensional space to better express the spatial distribution law of the samples;
(2) A forgetting factor screening method is proposed, and relevant screening strategies are formulated to retain as much as possible of the part of the datasets that may become support vectors, in order to improve the classification accuracy. On this basis, the initialization strategy and update rule of the forgetting factor are further adjusted.
The experimental results show that the HDFCISVM algorithm has a good classification effect on most datasets and has the same sensitivity to positive and negative samples. The experiments verify that the HDFCISVM algorithm has higher average classification accuracy on the training sets and testing sets than the other algorithms.
Finally, the experimental results show that the HDFCISVM algorithm has better generalization performance and classification effects than other ISVM algorithms and can be correctly applied to image classification. In the image classification experiment, HDFCISVM is compared with the AlexNet convolutional neural network on the image dataset “catsvsdogs.” The results show that the HDFCISVM algorithm achieves higher classification accuracy than the AlexNet algorithm.
The HDFCISVM incremental learning algorithm proposed in this paper has good classification accuracy. However, Tables 14 and 15 show that for the datasets “waveform,” “spambase,” and “credit,” the classification accuracy of the HDFCISVM algorithm is better than that of the other algorithms but is still not very high overall. To varying degrees, these datasets have uneven positive and negative sample sizes and high-dimensional data, which may explain why most ISVM algorithms have low classification accuracy on such imbalanced datasets, especially highly imbalanced ones: in each round of incremental learning, the distribution of the training samples differs greatly from the distribution of the overall samples because of the extreme imbalance, so the accuracy of the classifier trained by the incremental learning algorithm is reduced. Therefore, incremental learning on imbalanced datasets, especially those with large differences in the number of positive and negative samples, deserves further research. For example, different weights could be assigned to the forgetting factors of the training samples in each round of incremental learning, taking into account the large difference in the number of positive and negative samples. An appropriately optimized forgetting factor updating strategy would make the training samples in each round of incremental learning share, as far as possible, the distribution characteristics of the original total dataset, and so improve incremental learning on such imbalanced datasets. In addition, future research can continue to explore the application of ISVM algorithms in image classification, outlier detection, and other fields.
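As a reminder of the kind of quantity innovation (1) relies on, the sketch below computes the standard Euclidean distance from a point to a hyperplane w·x + b = 0, namely |w·x + b| / ‖w‖. This is the textbook formula and is shown only for illustration. The distance measure defined earlier in the paper may differ in detail, and the function name is hypothetical.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be non-zero")
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# |3*3 + 4*4 + 0| / sqrt(3^2 + 4^2) = 25 / 5
print(distance_to_hyperplane([3.0, 4.0], 0.0, [3.0, 4.0]))  # 5.0
```

The same function works for vectors of any dimension, which is what makes it usable on high-dimensional feature vectors.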
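One possible reading of the class-dependent weighting suggested above is sketched below. The paper does not specify a weighting scheme, so the inverse-frequency rule used here (number of samples divided by the number of classes times the class count) is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_forgetting_factors(factors, labels):
    """Scale each sample's forgetting factor by the weight of its class."""
    if len(factors) != len(labels):
        raise ValueError("factors and labels must have the same length")
    w = class_weights(labels)
    return [f * w[y] for f, y in zip(factors, labels)]


# Imbalanced toy round: 2 positive and 8 negative samples.
labels = [+1] * 2 + [-1] * 8
factors = [1.0] * 10
print(class_weights(labels))  # {1: 2.5, -1: 0.625}
print(weighted_forgetting_factors(factors, labels)[:3])  # [2.5, 2.5, 0.625]
```

Whether a larger weighted value should make a sample harder or easier to forget depends on how the forgetting factor is defined and updated, which this sketch leaves open.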
Data Availability
The datasets used to support the findings of this study are openly available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). In addition, the “catsvsdogs” dataset was provided by Kaggle (https://www.kaggle.com/datasets).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Natural Science Foundation of Shaanxi Province (Project No. 2022JM409) and the Key R&D Program in Shaanxi Province (Project No. 2021GY084).