Special Issue: Data-Driven Fault Supervisory Control Theory and Applications
An Adaptive Fuzzy Min-Max Neural Network Classifier Based on Principal Component Analysis and Adaptive Genetic Algorithm
A novel adaptive fuzzy min-max neural network classifier called AFMN is proposed in this paper. Combined with principal component analysis and an adaptive genetic algorithm, this integrated system can serve as a supervised, real-time classification technique. Considering the loophole in the expansion-contraction process of FMNN and GFMN and the overly complex network architecture of FMCN, AFMN maintains the simple architecture of FMNN for fast learning and testing while rewriting the membership function and the expansion and contraction rules for hyperbox generation to resolve the confusion in hyperbox overlap regions. Meanwhile, principal component analysis is adopted for dataset dimensionality reduction to increase learning efficiency. After training, the confidence coefficient of each hyperbox is calculated based on the distribution of samples. During the classifying procedure, an adaptive genetic algorithm is used for parameter optimization in AFMN, which is faster than the traversal method. For conditions where training samples are insufficient, data core weight updating is indispensable to enhance the robustness of the classifier, and the modified membership function can adjust itself according to input variations. The paper demonstrates the performance of AFMN through substantial examples in terms of classification accuracy and operating speed by comparing it with FMNN, GFMN, and FMCN.
The emergence of fuzzy set theory [1–6] stimulated its development in pattern recognition and classification. The capacity of fuzzy logic to divide complex class boundaries has generated many achievements in neuro-fuzzy pattern recognition systems [7–23]. The fuzzy min-max neural network (FMNN) proposed in  lays a solid foundation for further research in this field. The FMNN utilizes hyperbox fuzzy sets to represent regions of the n-dimensional pattern space; input samples that fall within a hyperbox have full membership. An n-dimensional hyperbox is defined by stating its min and max vertices. The algorithm finds a suitable hyperbox for each input pattern through a three-step process: expansion, overlap test, and contraction. However, the contraction of hyperboxes of different classes may lead to classification errors, as demonstrated in , and its performance depends strongly on the ordering of the training data and on the expansion coefficient, which controls the size of the hyperboxes.
The proposed GFMN  is also an online classifier based on the hyperbox fuzzy set concept. Its improvement lies in a new membership function that decreases monotonically with growing distance from a cluster prototype, thus eliminating the likely confusion between equally likely and unknown inputs . But the problem in the contraction process remains. The same is true of the modified fuzzy min-max neural network with a genetic-algorithm-based rule extractor for pattern classification, even though its use of a genetic algorithm to minimize the number of features of the input dataset is creative . In , a new learning algorithm called the fuzzy min-max neural network classifier with compensatory neuron architecture (FMCN) has been reported. This method introduces compensatory neurons to handle the confusion in overlap regions, disposes of the contraction process, and distinguishes between simple overlap and containment. However, FMCN does not allow hyperboxes of different classes to overlap, which increases the number of neurons in the middle layer of the network and thus consumes much more time during training and testing. In fact, even though FMCN performs better than FMNN and GFMN in most cases, its structural complexity increases. Meanwhile, it omits a kind of overlap , which results in classification errors. Another improved network based on the data core is the data-core-based fuzzy min-max neural network (DCFMN). DCFMN  can adjust the membership function according to the sample distribution in a hyperbox to achieve higher classification accuracy, and its structure is simpler than FMCN's. However, none of these four networks performs well with relatively insufficient training samples. A weighted fuzzy min-max neural network (WFMM) is proposed in ; its membership function is designed to take the frequency of input patterns into consideration.
The proposed AFMN has advantages in several respects. First, AFMN maintains the simple architecture of FMNN and adds preprocessing of input patterns using principal component analysis (PCA) . This dimensionality reduction technique reduces the number of features of the input patterns and extracts the useful information. Without preprocessing of the training dataset, it is hard to implement the classifier in practice because of the high dimensionality, redundancy, and even noise inherent in input patterns.
Second, considering that a hyperbox may contain nodes of more than one class, it is not reasonable to allocate full membership to every input pattern that falls within the hyperbox. A confidence coefficient for each hyperbox is therefore introduced to resolve this confusion and achieve higher classification accuracy.
Third, the membership function is modified following the data core idea of DCFMN. The data core, which can update itself during testing, helps adjust the membership according to the distribution of training samples. Loopholes existing in the overlap test cases of FMNN, GFMN, and FMCN are identified and resolved by rewriting the rules. Meanwhile, an adaptive genetic algorithm (AGA) [31–33] is utilized in the classifying algorithm for parameter optimization instead of the traversal method, improving the speed and accuracy of the entire neural network classifier.
Finally, the proposed classifier is not only an original theoretical attempt but also an important first step toward its application to working-condition recognition for a running pipeline, a typical nonlinear control system [34–39].
The rest of the paper is organized as follows. Section 2 analyzes the traditional fuzzy neural network classifiers. Section 3 introduces the AFMN classifier system in detail. Section 4 provides abundant examples to demonstrate the performance of AFMN. Section 5 concludes with a summary.
2. Analysis of Precedent Fuzzy Min-Max Neural Network Classifier
The FMNN learning algorithm consists of three procedures: (1) expansion, (2) overlap test, and (3) contraction. Its goal is to find a suitable hyperbox for each input pattern. If an appropriate hyperbox exists, it is expanded to include the pattern, provided that its size does not exceed the prescribed limits; otherwise, a new hyperbox is added to the network. After expansion, all hyperboxes belonging to different classes have to be checked by an overlap test, which performs a dimension-by-dimension comparison between hyperboxes of different classes. FMNN defines four test cases; if at least one of them is satisfied, overlap exists between the two hyperboxes. If no overlap occurs, the hyperboxes are isolated and no contraction is required; otherwise, a contraction process eliminates the confusion in the overlapped region.
GFMN focuses on the disadvantages of the membership function proposed in FMNN and proposes an improved membership function whose value decreases steadily as input patterns move away from the hyperbox.
FMCN distinguishes between simple overlap and containment and introduces overlapped compensation neurons (OCNs) and containment compensation neurons (CCNs) to solve the confusion in the overlap region.
However, there exist two cases in the overlap area in which FMNN, GFMN, and FMCN cannot properly adjust the hyperboxes. Figure 1 depicts the two hyperbox overlap cases. The positions of the minimum and maximum points are described below:
When input data satisfying this condition is trained according to the overlap test rules of FMNN, GFMN, and FMCN, the overlap cannot be detected because none of the four test cases is satisfied. Yet the two hyperboxes are clearly partly overlapped in Figure 1(a), and the other two hyperboxes are fully overlapped in Figure 1(b). This shows that a loophole exists in the overlap test cases of the three algorithms. In particular, in the case depicted in Figure 1(b), the network cannot cancel one of the two identical hyperboxes, which means the same hyperbox is created twice. Moreover, the number of nodes increases when overlap occurs between two hyperboxes of the same class, raising the computational complexity. Figure 2 emphasizes this situation again: there should be four hyperboxes after training, but the overlap test counts five. As this discussion shows, the overlap test cases are incomplete and need revision.
Another disadvantage, shared by the traditional classifiers and DCFMN, is that they do not verify the efficiency of a hyperbox.
The idea of testing the efficiency of a hyperbox is inspired by the situation in which a hyperbox contains input patterns of more than one class. For convenience of explanation, we name input patterns of the class that the hyperbox belongs to as primary patterns (PPs) and those of any other class as subordinate patterns (SPs). Figure 3 shows the hyperboxes generated according to the learning algorithms of FMNN and DCFMN. Among them, hyperboxes 1–3 belong to class 1 and hyperbox 4 belongs to class 2. Note that hyperbox 1 of class 1 contains more SPs than PPs, which shows that the creation of the hyperbox is not appropriate and may have a negative impact on classification.
(a) Hyperboxes generated by FMNN
(b) Hyperboxes generated by DCFMN
Meanwhile, in other traditional fuzzy min-max neural network classifiers, input data is not preprocessed before training. The redundancy and noise in the data can undermine classification performance and consume more time during training and testing. In AFMN, this problem is solved by using principal component analysis (PCA) to reduce the dimensionality of the input data and by adopting a genetic algorithm to quickly select the optimal parameter combination during the test procedure instead of the traversal method.
3. AFMN Architecture
3.1. Basic Definitions
3.1.1. Confidence Coefficient
The hyperboxes generated during training have different sizes, and the input patterns contained in a hyperbox may belong to different classes, which means the hyperbox cannot guarantee that an input pattern falling within it fully belongs to its class. Figure 4 shows a hyperbox creation result in which there are input patterns of three classes A, B, and C. Obviously it is not rational to set the membership of all input patterns that fall in hyperbox B to 1, because PPs and SPs occur at the same time in the same hyperbox. This problem can be removed by accounting for the proportion of PPs among all patterns in the same hyperbox, from which the confidence coefficient of each hyperbox can be obtained. Let  be the confidence coefficient of the kth hyperbox.
Two possibilities have to be considered when designing .(1)As discussed before, we name input patterns of the class that the hyperbox belongs to as primary patterns (PPs) and those of any other class as subordinate patterns (SPs); we should consider the proportion of PPs among all patterns, both PPs and SPs.(2)If the amounts of training data differ between classes, then to eliminate the training error caused by this imbalance inherent in the samples, the patterns must first be normalized by introducing a parameter , that is, by assigning a weight value to each class.
The computation consists of two steps.
Step 1. Compute Weight Value for Each Class.
Given  classes with respective numbers of input patterns , the function that decides the weight value for each class is given by
Step 2. Compute Confidence Coefficient for Each Hyperbox.
For the th hyperbox and , the corresponding  is given by where  represents the number of input patterns of class  in hyperbox , ;  is the number of hyperboxes. The value of  is decided by where  ranges from 0.1 to 1.
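As an illustration of the two steps above, the sketch below computes the class weights and then the per-hyperbox confidence coefficients. The inverse-frequency form of the class weight is an assumption, since the paper's exact weighting equation is not reproduced here; function and variable names are hypothetical.

```python
import numpy as np

def confidence_coefficients(box_labels, sample_boxes, sample_labels):
    """Sketch of the two-step confidence computation. Step 1 weights
    each class inversely to its sample count to offset class imbalance
    (the exact weighting equation in the paper is assumed, not quoted).
    Step 2 takes, for each hyperbox, the weighted share of primary
    patterns (PPs) among all patterns that fell into the box."""
    classes = np.unique(sample_labels)
    weights = {c: 1.0 / np.sum(sample_labels == c) for c in classes}  # Step 1
    conf = []
    for k, box_class in enumerate(box_labels):                        # Step 2
        in_box = sample_labels[sample_boxes == k]
        total = sum(weights[c] for c in in_box)
        pp = sum(weights[c] for c in in_box if c == box_class)
        conf.append(pp / total if total > 0 else 1.0)
    return np.array(conf)
```

For example, a class-0 hyperbox holding two class-0 patterns and one class-1 pattern gets a confidence below 1, while a box containing only its own class's patterns keeps confidence 1.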
3.1.2. AFMN Architecture Overview
The architecture of AFMN is shown in Figure 5. The connections between input and middle layer are stored in matrices and . The connections between middle and the output layer are binary valued and stored in . The equation for assigning the values from to the output layer node is as follows:
3.2. Fuzzy Hyperbox Membership Function
The membership function for an input is given by
where ;  is the geometrical core, known as the data core. It is given by where  is the number of patterns belonging to the hyperbox's class and  denotes those patterns.
is given by where  indicates the number of PPs in the hyperbox , and  is a two-parameter ramp threshold function, as follows:
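For context, the classic Simpson membership function that AFMN modifies can be sketched as below. This is the standard FMNN form, not the paper's modified function; the data-core and ramp-threshold terms described above are paper-specific and are not reproduced here. `gamma` is the usual sensitivity parameter.

```python
import numpy as np

def fmnn_membership(x, v, w, gamma=4.0):
    """Classic Simpson FMNN membership of pattern x in the hyperbox
    with min point v and max point w. AFMN's modified function adds a
    data-core term that shifts sensitivity toward the region where the
    training samples concentrate (not reproduced in this sketch)."""
    # Ramp penalties for exceeding the max point (a) and min point (b):
    a = np.maximum(0, 1 - np.maximum(0, np.minimum(1, gamma * (x - w))))
    b = np.maximum(0, 1 - np.maximum(0, np.minimum(1, gamma * (v - x))))
    return float(np.mean((a + b) / 2.0))  # average over all dimensions
```

A pattern inside the box gets membership 1; membership falls off linearly with distance outside the box, at a rate set by `gamma`.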
3.3. Learning Algorithm
3.3.1. Data Preprocessing by Principle Component Analysis (PCA)
Principal component analysis is chosen as a data dimensionality reduction technique that removes redundant features from the input data. The reduced input data accelerates the training and testing procedures while improving network performance, because PCA picks up the primary features of the original dataset and avoids the effects of the redundancy and noise within it. In this paper, the number of retained features is the smallest number of dimensions that preserves 80% of the total information.
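The 80% rule above can be sketched with a standard eigendecomposition of the covariance matrix; this is a generic PCA sketch, not the paper's exact implementation, and the function name is hypothetical.

```python
import numpy as np

def pca_reduce(X, info_ratio=0.8):
    """Reduce X (samples x features) to the fewest principal components
    that retain at least `info_ratio` of the total variance; the paper
    keeps 80% of the total information."""
    Xc = X - X.mean(axis=0)                        # center each feature
    cov = np.cov(Xc, rowvar=False)                 # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / np.sum(eigvals)     # cumulative variance ratio
    k = int(np.searchsorted(cum, info_ratio)) + 1  # smallest k reaching the ratio
    return Xc @ eigvecs[:, :k], k
```

The projected data `Xc @ eigvecs[:, :k]` is then fed to the classifier in place of the raw patterns.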
3.3.2. Hyperbox Expansion
This procedure decides the number and the min-max points of the hyperboxes; its rule is as follows.
If the following criterion is satisfied, where  controls the size of a hyperbox :
If the expansion criterion has been met, the minimum and maximum points of the hyperbox are adjusted using the following equation. Otherwise, a new hyperbox is created, and its min and max points are set as below. Repeat the procedure until all input patterns have been trained.
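The expansion step above can be sketched as follows, using the classic FMNN criterion in which the expansion coefficient bounds the summed edge lengths; AFMN's exact criterion may differ slightly, and the function name is hypothetical.

```python
import numpy as np

def try_expand(v, w, x, theta):
    """Attempt to expand the hyperbox with min point v and max point w
    to include pattern x (all n-dimensional arrays). Returns the new
    (v, w) and whether the expansion was accepted; a rejection means a
    new hyperbox must be created at x instead."""
    n = len(x)
    # Classic FMNN criterion: expanded edge lengths must not exceed n * theta.
    if np.sum(np.maximum(w, x) - np.minimum(v, x)) <= n * theta:
        return np.minimum(v, x), np.maximum(w, x), True   # absorb the pattern
    return v, w, False                                    # reject: create new box
```

When the criterion fails, the learning loop creates a fresh hyperbox with both min and max points set to the input pattern.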
3.4. Hyperbox Overlap Test
As previously stated, new cases have to resolve the problem existing in FMNN; so first, to test whether two hyperboxes are fully overlapped, we design the case as below: If the case is satisfied, the two hyperboxes of the same class fully overlap, and one of them is removed from the network.
Here, assuming  initially, for hyperbox  and hyperbox , the four overlap cases and the corresponding overlap value for the th dimension are given as follows.
Case 1 (). One has
Case 2 (). One has
Case 3 (). One has
Case 4 (). One has If , then . If any dimension cannot satisfy any of the four cases, then . Otherwise if , then there is overlap between hyperbox and hyperbox .
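The four overlap cases can be sketched as a dimension-by-dimension test. The bounds below follow FMNN's classic cases but use non-strict inequalities in the containment cases, so identical min-max points (the loophole discussed earlier) are also detected; this is an illustrative reading of the revised rules, not a verbatim transcription.

```python
def overlap_test(vj, wj, vk, wk):
    """Overlap test between hyperbox j (vj, wj) and hyperbox k (vk, wk),
    given as coordinate lists. Returns the index of the dimension with
    the smallest overlap and that overlap value, or (-1, 0.0) when the
    hyperboxes do not overlap."""
    d_old, delta = 1.0, -1
    for i in range(len(vj)):
        if vj[i] < vk[i] <= wj[i] < wk[i]:          # Case 1
            d_new = wj[i] - vk[i]
        elif vk[i] < vj[i] <= wk[i] < wj[i]:        # Case 2
            d_new = wk[i] - vj[i]
        elif vj[i] <= vk[i] <= wk[i] <= wj[i]:      # Case 3: box k inside box j
            d_new = min(wj[i] - vk[i], wk[i] - vj[i])
        elif vk[i] <= vj[i] <= wj[i] <= wk[i]:      # Case 4: box j inside box k
            d_new = min(wk[i] - vj[i], wj[i] - vk[i])
        else:
            return -1, 0.0                          # no overlap in dimension i
        if d_new < d_old:                           # track the tightest dimension
            d_old, delta = d_new, i
    return delta, d_old
```

Overlap exists only when every dimension satisfies one of the four cases; the returned dimension is then the one contracted.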
3.5. Hyperbox Contraction
If overlap exists between hyperboxes of different classes, the network will allocate a membership of 1 to the overlap region, thus generating classification confusion. Only one dimension needs to be adjusted, to keep the hyperboxes as large as possible: for , the th dimension is the one selected. The adjustment is made as follows.
Case 1 (). One has
Case 2 (). One has
Case 3 (). One has
Case 4 (). One has Through all the preceding procedures, parameters  and  are determined. The entire learning procedure is summarized in Figure 6.
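The four contraction cases above can be sketched in the classic FMNN style: the two hyperboxes are pulled apart only along the single dimension with the smallest overlap, cutting on the side that loses the least volume. This is an illustrative sketch of that pattern, not a verbatim transcription of AFMN's rewritten rules.

```python
def contract(vj, wj, vk, wk, i):
    """Contract hyperboxes j and k (coordinate lists) along dimension i,
    the dimension with the smallest overlap, keeping both hyperboxes as
    large as possible."""
    if vj[i] < vk[i] <= wj[i] < wk[i]:              # Case 1: j leads k
        vk[i] = wj[i] = (vk[i] + wj[i]) / 2.0       # split the overlap evenly
    elif vk[i] < vj[i] <= wk[i] < wj[i]:            # Case 2: k leads j
        vj[i] = wk[i] = (vj[i] + wk[i]) / 2.0
    elif vj[i] <= vk[i] <= wk[i] <= wj[i]:          # Case 3: k inside j
        if wk[i] - vj[i] < wj[i] - vk[i]:
            vj[i] = wk[i]                           # cut the smaller piece of j
        else:
            wj[i] = vk[i]
    elif vk[i] <= vj[i] <= wj[i] <= wk[i]:          # Case 4: j inside k
        if wj[i] - vk[i] < wk[i] - vj[i]:
            vk[i] = wj[i]                           # cut the smaller piece of k
        else:
            wk[i] = vj[i]
    return vj, wj, vk, wk
```

After contraction, the two hyperboxes no longer share interior volume along the selected dimension, which removes the full-membership confusion in the former overlap region.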
3.6. Classifying Algorithm
3.6.1. Genetic Algorithm in Network Classifying Procedure
GA is given the task of finding the best parameter combination instead of the traversal method. Compared with the traditional traversal method of searching for appropriate parameters for the network to achieve its best performance, the genetic algorithm has two advantages.(a)For the traversal method, choosing an appropriate step size is an obstacle: a small step size achieves better classification performance at the cost of more time, whereas a large one makes testing fast at the cost of relatively low accuracy.(b)High classification accuracy requires a small step size, and the genetic algorithm completes this search faster than traversal.
The GA fitness function used is defined as The genetic operation implemented consists of the following six steps.
Step 1 (initialization). Set the range for each parameter and initialize the population string in each generation. Here ranges from 0 to 1, ranges from 0.1 to 1, and ranges from 1 to 10.
Step 2 (selection). Select the certain numbers of pairs of strings from the current population according to the rule known as roulette wheel selection.
Step 3 (crossover). For each selected pair, choose the bit position for crossover. The rule is specified as below: where  indicates the larger fitness value in the pair,  is the maximum fitness value, and  is the average fitness value of the current population.
Step 4 (mutation). For each bit value of the strings, apply the following mutation operation according to the probability defined below: where  is the fitness value of the individual to be mutated.
Step 5 (elitist strategy). Select a string with maximum fitness and pass it to the next generation directly.
Step 6 (termination test). Here we use the number of generations as a condition for genetic algorithm termination.
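The fitness-dependent crossover and mutation probabilities of Steps 3 and 4 appear to follow the well-known Srinivas-Patnaik adaptive scheme; the sketch below assumes that form, and the constants `k1`..`k4` and the zero-denominator guard are assumptions, since the paper's equations are not reproduced here.

```python
def adaptive_rates(f_prime, f, f_max, f_avg, k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Srinivas-Patnaik style adaptive crossover (pc) and mutation (pm)
    probabilities. f_prime is the larger fitness of the pair to be
    crossed, f the fitness of the individual to be mutated, f_max and
    f_avg the maximum and average fitness of the current population.
    High-fitness individuals get low pc/pm (they are preserved); below-
    average ones get the full rates k3/k4 (they are disrupted)."""
    denom = max(f_max - f_avg, 1e-12)  # guard against a converged population
    pc = k1 * (f_max - f_prime) / denom if f_prime >= f_avg else k3
    pm = k2 * (f_max - f) / denom if f >= f_avg else k4
    return pc, pm
```

The elitist strategy of Step 5 complements this: the best string bypasses crossover and mutation entirely, so the adaptive rates never destroy the incumbent solution.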
3.6.2. The Entire Classifier System
The learning and classification algorithm can be summarized in the flowchart in Figure 7.
4.1. Examples to Demonstrate the Effectiveness of Overlap Test and Contraction Cases
As discussed previously for the cases represented in Figures 1(a) and 1(b), when overlap occurs in such cases, the overlap and contraction algorithms of FMNN, GFMN, and FMCN produce misclassification errors. This problem is solved by the revised overlap test cases. The hyperbox generation result is shown in Figures 8(a) and 8(b).
4.2. The Working of PCA and Genetic Algorithm
4.2.1. Principle Component Analysis
To understand the effect of PCA in improving classification efficiency through data dimensionality reduction, we use AFMN to classify five groups of the complete GLASS dataset , one of which is preprocessed by PCA to obtain simplified input patterns while guaranteeing that the retained information is not less than 80%. The training dataset size ranges from 20% to 100% and is selected randomly each time, and the entire glass dataset is used for testing. The experiment is conducted 100 times. The numbers of classification errors are shown in Table 1.
The results in the table demonstrate that principal component analysis can complete the task of dimensionality reduction, and it is important to notice that adding PCA is not bound to increase classification accuracy, as seen with the 40% and 100% training sets. But thanks to its ability to reduce the dimensionality of the raw dataset, the consumed time is shortened considerably.
4.2.2. Genetic Algorithm for Parameter Optimization
The task of the genetic algorithm is to find the appropriate combination of the three parameters for the best classification performance, faster; the result with the genetic algorithm should be no worse than without it. Here the Iris dataset is chosen for demonstration: 10% of the dataset is used for training and the rest for testing. The experiment is repeated 100 times to obtain the minimum misclassification numbers and the average consumed time. The result is shown in Table 2.
Table 2 demonstrates that GA finds a better combination of parameters, and faster. This ability is important for real-world applications.
4.3. Performance on the General Dataset
4.3.1. Various Dataset for Training and Testing with Complete Dataset
Here, for the given iris dataset, 35%, 45%, 50%, 60%, and 70% of the dataset were selected randomly for training, and the complete dataset was used for testing. The performance of learning and testing is shown in Table 3. AFMN clearly performs better, with fewer misclassifications.
4.3.2. Different Dataset for Training and Testing
In this section, datasets such as wine, thyroid, and ionosphere are selected for comparing the abilities of several network classifiers (Table 4). 50% of each dataset is randomly selected for training and the entire dataset is used for testing; the experiment is conducted 100 times for each dataset. The results show that, in terms of classification accuracy, FMCN and AFMN have very similar performance, but in time consumption and stability AFMN is clearly better than FMCN, which demonstrates its advantage (Table 5).
4.4. Test the Robustness of AFMN with Noise-Contaminated Dataset
The robustness of a network classifier is important, especially in applications. 50% of the Iris data was randomly chosen for training, and the entire Iris dataset was used for testing. To check the robustness of AFMN, FMCN, FMNN, and GFMN, we added random noise with amplitudes of 1%, 5%, and 10% to the Iris dataset. The expansion coefficient varies from 0.01 to 0.4 in steps of 0.02. One hundred experiments were performed to obtain accurate results. The results are shown in Table 6: when the noise amplitude is 1%, the maximum and minimum misclassification counts of the four methods are the same as in the experiments with noise-free data, which shows that all the methods are robust at this level. As the noise amplitude increases, the performance of all four methods degrades; however, Table 6 shows that the average misclassification of AFMN increases more slowly than the others, so AFMN has better robustness.
4.5. Fixed Training Dataset Size (60% of Iris Dataset)
In this simulation, the effect of the expansion coefficient on the performance of AFMN, FMCN, GFMN, and FMNN is studied. 60% of the iris data is chosen for training and the entire iris dataset for testing. The expansion coefficient varies from 0.1 to 1 in steps of 0.1. The results of training and testing are shown in Figures 9 and 10, respectively.
From the results we can conclude that FMCN is sensitive to fluctuations of the expansion coefficient, while GFMN and FMNN have relatively higher classification errors. Compared with them, AFMN performs better.
4.6. Test on Synthetic Image
The dataset consists of 950 samples belonging to two nested classes, which makes classification more difficult. Figure 11 shows the synthetic image.
Figure 12 shows the performance of AFMN, FMCN, GFMN, and FMNN on this dataset. 60% of the dataset is randomly selected for training. The expansion coefficient varies from 0 to 0.2 in steps of 0.02. AFMN clearly works better than the other algorithms in both training and testing.
4.7. Comparison with Other Traditional Classifiers
In this section we compare the performance of AFMN, FMCN, GFMN, and FMNN on the iris dataset, as Table 7 shows. The results show that AFMN has no misclassifications.
4.8. Comparison of Node Numbers (Hyperbox Numbers)
The complexity of the network created after training affects the speed and efficiency of classification. The full iris dataset is selected for training to see how many nodes each classifier creates in the middle layer. The results are shown in Figure 13.
As the expansion coefficient increases, the number of nodes decreases. AFMN, GFMN, and FMNN can generate a relatively simple structure network. In contrast, the architecture of FMCN is much more complex.
This paper proposes a complete classification system based on a new neural algorithm called AFMN, principal component analysis, and a genetic algorithm. The classifier derives from the modification and completion of the fuzzy min-max neural network proposed by Simpson. Unlike subsequent neural algorithms for clustering and classification such as GFMN and FMCN, our classifier system is more complete and practical. The advantages of AFMN can be summarized as follows.(1)AFMN adds preprocessing of input patterns using principal component analysis (PCA). This dimensionality reduction technique reduces the number of features of the input patterns and extracts the useful information, saving training and testing time and making the algorithm more suitable for application to real data for pattern classification.(2)The confidence coefficient was overlooked by preceding neural algorithms for clustering and classification. Considering that a hyperbox may contain nodes of more than one class, the confidence of different hyperboxes must differ, so allocating a membership of 1 to any input pattern that falls in a hyperbox is not reasonable. In AFMN we therefore calculate the confidence coefficient of each hyperbox for more precise classification.(3)An adaptive genetic algorithm (AGA) is utilized in testing for parameter optimization, disposing of the step-setting obstacle of the traversal method; GA finds the proper parameter combination more precisely and faster.(4)Modification of the membership function ensures self-adjustment according to the sample distribution and maintains the data core concept proposed in DCFMN. The data core can update itself online during the classifying procedure, an indispensable ability for improving classifier performance when training samples are insufficient.(5)AFMN solves the problem existing in the overlap tests of FMNN, GFMN, and FMCN; thus it can generate hyperboxes properly and remove redundant ones. By rewriting the contraction rules, AFMN maintains the simple architecture of FMNN, and abundant simulations demonstrate its high recognition rate.
In conclusion, integrated with principal component analysis for dimensionality reduction and a genetic algorithm for parameter optimization, AFMN is a fast fuzzy min-max neural network classifier with a high recognition rate and robustness. The use of the AFMN network beyond the laboratory will be explored in future work.
This work was supported by the National Natural Science Foundation of China (Grants nos. 61104021, 61034005, and 61203086), the National High Technology Research and Development Program of China (2012AA040104), and the Fundamental Research Funds for the Central Universities of China (N100304007).
J. C. Bezdek and S. K. Pal, Fuzzy Models for Pattern Recognition, IEEE Press, Piscataway, NJ, USA, 1992.
G. Carpenter, S. Grossberg, and D. B. Rosen, “Fuzzy ART: an adaptive resonance algorithm for rapid, stable classification of analog patterns,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '91), vol. 2, pp. 411–416, Seattle, Wash, USA, 1991.
T. Hasegawa, S. Horikawa, and T. Furuhashi, “A study on fuzzy modeling of BOF using a fuzzy neural network,” in Proceedings of the 2nd International Conference on Fuzzy Systems, Neural Networks and Genetic Algorithms (IIZUKA), pp. 1061–1064, 1992.
P. G. Campos, E. M. J. Oliveira, T. B. Ludermir, and A. F. R. Araújo, “MLP networks for classification and prediction with rule extraction mechanism,” in Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 2, pp. 1387–1392, July 2004.
A. Rizzi, M. Panella, F. M. F. Mascioli, and G. Martinelli, “A recursive algorithm for fuzzy Min-Max networks,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '00), vol. 6, pp. 541–546, July 2000.
R. Tagliaferri, A. Eleuteri, M. Menegatti, and F. Barone, “Fuzzy min-max neural networks: from classification to regression,” Soft Computing, vol. 5, no. 16, pp. 69–76, 2001.
H. J. Kim and H. S. Yang, “A weighted fuzzy min-max neural network and its application to feature analysis,” in Proceedings of the 1st International Conference on Natural Computation (ICNC '05), Lecture Notes in Computer Science, pp. 1178–1181, August 2005.
M. Kallas, C. Francis, L. Kanaan, D. Merheb, P. Honeine, and H. Amoud, “Multi-class SVM classification combined with kernel PCA feature extraction of ECG signals,” in Proceedings of the 19th International Conference on Telecommunications (ICT '12), pp. 1–5, April 2012.
J. D. Schaffer and A. Morishma, “An adaptive crossover mechanism for genetic algorithms,” in Proceedings of the 2nd International Conference on Genetic Algorithms, p. 3640, 1987.
N. P. Jawarkar, R. S. Holambe, and T. K. Basu, “Use of fuzzy min-max neural network for speaker identification,” in Proceedings of the International Conference on Recent Trends in Information Technology (ICRTIT '11), pp. 178–182, June 2011.