Research Article  Open Access
Massimo Panella, "A Hierarchical Procedure for the Synthesis of ANFIS Networks", Advances in Fuzzy Systems, vol. 2012, Article ID 491237, 12 pages, 2012. https://doi.org/10.1155/2012/491237
A Hierarchical Procedure for the Synthesis of ANFIS Networks
Abstract
Adaptive neurofuzzy inference systems (ANFIS) represent an efficient technique for the solution of function approximation problems. When numerical samples are available in this regard, the synthesis of ANFIS networks can be carried out exploiting clustering algorithms. Starting from a hyperplane clustering synthesis in the joint inputoutput space, a computationally efficient optimization of ANFIS networks is proposed in this paper. It is based on a hierarchical constructive procedure, by which the number of rules is progressively increased and the optimal one is automatically determined on the basis of learning theory in order to maximize the generalization capability of the resulting ANFIS network. Extensive computer simulations prove the validity of the proposed algorithm and show a favorable comparison with other wellestablished techniques.
1. Introduction
In this paper, a new procedure for the structural optimization of adaptive neurofuzzy inference systems (ANFIS) is proposed, whose synthesis is based on the hyperplane clustering in the joint inputoutput space of the data set. ANFIS networks are being widely used to many applicative tasks, such as rulebased process controls, pattern recognition, and function approximation. They solve a general regression problem by means of a set of rules of Sugeno firstorder type [1]. The th rule, , has the following form: where is a column vector (or pattern) in the dimensional input space and is the scalar output associated with the rule. The latter is characterized by the MFs of the fuzzy input variables , , and by the coefficients , , of the crisp output.
Several alternatives are possible for the fuzzification of crisp inputs, the composition of input MFs, and the way rule outputs are combined [2]. Usually, the structure of the fuzzy inference system is the following one: where is the overall fuzzy input variable, is the corresponding MF, and the output estimated for an input .
The numerical parameters of rules are obtained through a learning process, by using a training set of inputoutput pairs , . The crucial problem during the ANFIS learning is to obtain a good generalization capability. This issue has been deeply investigated in the literature; usually, the generalization capability of ANFIS networks is optimized by checking data set for overfitting model validation [3].
The generalization capability is maximized only if the ANFIS network consists of a suitable number of rules. However, the determination of the optimal number is a very critical problem to be solved, since the neural network might be easily overfitted in the case of noisy or illconditioned data. In this paper, a new hierarchical constructive procedure for the automatic determination of the ANFIS rules is proposed. It aims to a regularization of the network architecture based on learning theory and hyperplane clusteringbased techniques [4–7]. The underlying idea of such techniques received in the literature many acknowledgments as, for example, in the case of theoretical models [8–14] as well as applications to specific fields [15–26].
The paper is organized as follows. The synthesis technique for determining the parameters of ANFIS rules is described in Section 2. Successively, the structural optimization for the automatic determination of the rules number, which is based on the proposed hierarchical constructive procedure, is illustrated in detail in Section 3. Finally, the numerical results obtained by extensive computer simulations on synthetic benchmarks and a realworld application are summarized in Section 4.
2. Hyperplane Clustering for the Synthesis of ANFIS Rules
When dealing with numerical data, the rules of ANFIS networks are commonly synthesized by using clustering techniques, which reduce the redundancy of data so that significant rules are determined directly from the clusters modeling the training set. Clustering algorithms can use joint inputoutput data, input data only, or output data only; this choice heavily affects the way ANFIS rules are built [27–34].
When modeling ANFIS networks, the main drawback due to conventional clustering approaches is that induced clusters do not always reflect the real data structure. An innovative approach, dubbed hyperplane clustering synthesis (HCS), was firstly proposed in [4]. By HCS, hyperplaneshaped clusters are determined in correspondence to the consequent part of each Sugeno rule. Namely, the ANFIS architecture can be considered as a smoothed piecewise linear model, where (2) can be rewritten as
The regression model is identified by the set of hyperplanes, each associated with a cluster. Therefore, the prototype of the th cluster in the joint inputoutput space will be represented by the set of coefficients , , evaluated by means of standard leastsquares techniques.
It is worth summarizing in the following the core steps for clustering in the “hyperplane space,” which is fundamentally an alternating optimization technique aiming to identify the cluster prototypes. Let be a set of clusters (i.e., hyperplanes) and let every pattern of the training set be assigned to one of these clusters; this is obtained according to a suitable criterion, as well established in the following of the paper. Then, the hyperplane clustering with prototypes is based on the following iterative steps.
Step 1. The coefficients of each hyperplane are evaluated by using the following procedure. For the th hyperplane , , a set of linear equations has to be solved; the generic equation is where index “” spans only the pairs of the training set assigned to the th cluster. Suited leastsquares techniques can be used to solve the set of linear equations in (4).
Step 2. The assignment of patterns to clusters is updated. Each pair , , of the training set is now assigned to the cluster , with such that
The previous choice states that a pattern is assigned the hyperplane having the minimum orthogonal distance from it. This assures the best results since other choices, as for example, the distance along the output axis of only, produces ambiguous results especially in the case of illconditioned data yielding quite vertical hyperplanes.
Step 3. For every cluster , the local approximation error is evaluated: where index “” spans only the pairs of the training set assigned lately (i.e., in Step 2) to the th cluster.
Step 4. The convergence is based on the quantity: where is the global approximation error over the whole data set in the current iteration, which is defined by and is the global approximation error calculated in the previous iteration. If is less than a predetermined threshold , then the clustering algorithm is stopped. Otherwise, the iteration goes back to Step 1 by using the current updated association of patterns to clusters. The default value will be used in the following.
The clustering algorithm described so far yields the linear consequents of Sugeno rules only. In order to achieve the complete structure of the ANFIS network, it is mandatory to determine the firing strengths corresponding to the antecedents of rules. To this end, once the iterations of hyperplane clustering have stopped, each pattern of the training set can be labeled with an integer , , representing the hyperplane it has been assigned during Step 2 of the last iteration. By using the labeled training set, a classification problem can be solved in the input space only; at the end of the training process, the input space will be tiled by a classification model able to assign a fuzzy label to any pattern of the input space where the th element of represents the fuzzy membership of the pattern to the th class and hence, it can be assumed as the firing strength of the th rule associated with the hyperplane corresponding to that class.
As a reference classification model, we will adopt in the following wellknown Simpson's MinMax model [35], since it has demonstrated the best performance in this regard. The combination of the hyperplane clustering followed by the fuzzy classification in the input space defines the HCS procedure allowing the determination of an ANFIS network for a given number of rules. HCS will be the basic algorithm adopted for the hierarchical optimization successively proposed.
It is worth mentioning that the hyperplane clustering inside HCS is a particular instance of the wellknown fuzzy regression models (FCRM) algorithm [27]; in this case, a set of linear models (hyperplanes) is fit on the training data and each pattern is assigned during clustering in a crisp manner (i.e, using hard labels) to a unique hyperplane, the one at the minimum distance (5). Although FCRM can use nonlinear models as well, they are basically adopted for regression in order to fit the training data only, even if they are not representing a function (i.e, when several output values correspond to a same input ). Conversely, the hyperplane clustering of HCS is successively completed by the classification in the input space, under the assumption that the resulting ANFIS model can be used to approximate/estimate an unknown function represented by its samples in the training set.
In this regard, when the output must be estimated for any input during the normal ANFIS operation (i.e., testing), the classifier is used to determine the fuzzy label (9) using only the input value ; then, the output is calculated using (2) by means of the firing strengths contained in the fuzzy label and the linear consequents in (1), early determined by the hyperplane clustering. In this phase, it is not necessary, in general, to assign the test sample a specific hyperplane cluster. However, a “hard approximation” using a single ANFIS rule can be obtained by a defuzzification of the fuzzy label , that is,
3. Hierarchical Optimization of the ANFIS Structure
When training an ANFIS network, the main problems are the local convergence of estimation algorithms and the correct determination of the number of rules. As evidenced successively, the former problem mainly depends on a good (usually random) initialization of numerical parameters associated with each rule. The latter one is a wellknown problem, which is directly related to the generalization capability of the neurofuzzy network and it can also be referred to as “structural optimization” problem. In fact, the ANFIS performance could be inadequate if the training set is either underfitted or overfitted by a lacking or an excessive number of rules, respectively.
3.1. The Plain Optimization Approach
A plain solution to these problems could be based on the use of the HCS algorithm with different values of and with different initializations for every value of . The HCS algorithm should be initialized before Step 1 of its first iteration by setting an association of each pattern with one of the hyperplane clusters. The simplest way is to associate patterns to clusters randomly. In this paper, a different approach will be followed, by clustering data in the input space only, in order to obtain a preliminary partition of the training set in clusters. Consequently, the random initialization is necessary for the chosen clustering algorithm in the input space as, for instance, the fuzzy cmeans [36, 37]. Once the set of different ANFIS networks is generated, the best network can be chosen by relying on the supervised nature of the learning procedure, that is, by using a cost function measuring the overall generalization capability of the network in terms of its complexity and its approximation error. Basic concepts of learning theory can be adopted in this regard [2, 3]. Namely, the ANFIS network achieving the best generalization capability is the one that, under the same performance on the training set, is characterized by the lowest number of rules.
As a measure of the network performance on the training set, the mean squared error (MSE) is adopted: where is the output generated by the ANFIS network in correspondence to the th input pattern of the training set. The optimal network is selected by using the following cost function depending upon the number of ANFIS rules: where and are the extreme values of the performance that are encountered during the performance investigation of the different ANFIS networks; is a weight . This weight is not critical, since the results are slightly affected by its variation in a large interval centered in 0.5. Evidently, for a given value , the ANFIS network showing the best value in (12) will be the one whose initialization and the successive HCS iterations yield the best performance on the training set.
The plain minimization of (12) can be obtained by the constructive technique denoted as optimized HCS (OHCS). In this procedure, the number of rules is progressively increased from 1 to , where is a fraction of the training set cardinality and it represents the maximum complexity that is allowed to the network. For each value of , different initializations are considered and several ANFIS networks are generated through HCS. In fact, as mentioned, the latter is based on a random initialization of the hyperplanes and hence, different initializations might produce different networks for the same value of . This is mainly due to the hyperplane clustering, which can easily get trapped at local minima of (8), as for any other alternating optimization procedure.
If different initializations are carried out for each value of , the OHCS procedure will generate networks and the optimal one will be selected according to (12). This optimization approach still suffers from serious drawbacks, which basically depend on the number of different initializations performed for each value of . The lower is the number of initializations, the lower is the probability to obtain a satisfactory ANFIS network after the HCS algorithm. Conversely, the higher is the number of initializations, the higher is the computational cost of the optimization procedure.
3.2. The Hierarchical HCS Procedure
In order to overcome the previous problems, in this paper, the use of a splitting hierarchical optimization is proposed, which will be referred to in the following as hierarchical HCS (HHCS). It is based on the constructive procedure shown in Figure 1, where is increased progressively and only one run of the HCS algorithm is needed for every value of . In fact, HHCS starts with only one hyperplane (i.e., ), which is initialized without any ambiguity since every pattern of the training set is assigned to the sole cluster. Then an iteration starts, involving the succession of the following procedures. (i)Considering the current association of patterns to clusters, the HCS algorithm is executed on the ANFIS network with hyperplanes.(ii)An optional fine tuning of the final ANFIS parameters obtained by the HCS algorithm can be performed by using suited techniques, as outlined in Section 3.5.(iii)If the maximum number of rules is reached, then the iteration is stopped and the best ANFIS network is chosen among those obtained during the iterations, once again according to (12). Conversely, if , a splitting procedure is carried out.(iv)By means of the splitting procedure, which is illustrated successively in Section 3.3, the cluster associated with the hyperplane showing “in some sense” the worst local approximation of data is selected to be split into two new hyperplanes, with the aim to better approximate the patterns belonging to the split cluster (as determined by the last execution of Step 2 within the HCS algorithm.) After the split is performed, the old cluster is removed from the ANFIS network and the new ones are inserted, increasing evidently by 1 the value of . So, the iteration starts back with the HCS algorithm applied to an ANFIS network composed by rules, where each pattern of the training set is currently assigned to exactly one cluster.
It is important to outline that patterns belonging to the split cluster are assigned exclusively to one of the new generated clusters, while the association of patterns to the other clusters is kept unaltered. This evidences the hierarchical nature of the proposed HHCS procedure. Furthermore, an unambiguous initialization of HCS is assured for any value of and hence, the necessity of random initializations and to optimize different HCS solutions for a given value of is eliminated. Consequently, the computational cost of HHCS is heavily reduced with respect to the plain OHCS approach.
The algorithm will stop when reaches the maximum complexity . This is not a critical value to be set; in fact, the same solutions obtained when spans up to a given value of are still found when a larger is adopted. So, the quality of the optimal solution can be traded off with the computational cost of the whole optimization routine given that the larger , the more solutions are explored spending more time. By the way, we have determined as a value generally acceptable for the 40% of the training set cardinality (i.e., ).
3.3. Splitting Procedure of Clusters
The key role in the HHCS algorithm is played by the splitting procedure, which is intended to determine optimally the cluster (or hyperplane) to be split and, successively, to initialize the hyperplanes associated with its offspring. The underlying idea is to prevent the initialization of new hyperplanes in underpopulated zones of the training set, since this is the typical situation where the HCS algorithm will converge to a poor local minimum of (8). Several heuristics are possible in this regard; the one illustrated in the following steps is based on the supervised nature of function approximation problems and on principal component analysis (PCA). In fact, the cluster is split orthogonally with respect to the direction of maximum variance of local data.(1)Taking into account the local approximation errors (6), which are determined in the last execution of Step 3 within the HCS algorithm, the cluster to be split, denoted as , , is the one exhibiting the largest local approximation error (2)Let be the set of indexes of the patterns associated to during the last execution of Step 2 within the HCS algorithm. The mean vector of such patterns is assumed in the following to be and the covariance matrix to be Using these notations, the principal components (or directions) , , of are computed. More precisely, the column vector is the th eigenvector of and the eigenvectors are sorted in descending order, according to the value of the corresponding eigenvalue.(3) Let be the principal component corresponding to the highest eigenvalue. Then, the projections of patterns of along the principal direction is calculated Finally, the cluster is removed from the ANFIS network together with the corresponding rule; it will be replaced by the new generated clusters and . Taking into account the superscript notation of these clusters, each pattern of will be assigned to either or , according to the sign of the projection in (16).
3.4. A Toy Example to Clarify the HHCS Procedure
The details of the proposed hierarchical optimization can be further clarified with the following example concerning the approximation of a simple piecewise linear function consisting of 4 different linear models:
The training set derived from this function is shown in Figure 2: 50 points per hyperplane, for a total of 200 points, are randomly sampled using (17) and a small perturbation by white Gaussian noise is also added in order to have a realistic data set.
The HHCS procedure starts with one rule and the hyperplane clustering is uniquely initialized in this case, since all patterns are initially assigned to the unique hyperplane. Its final determination, after the convergence of clustering, is represented in Figure 3. Assuming no network tuning is performed in this example, the splitting procedure illustrated in Section 3.3 is applied to the sole hyperplane that will be removed from the ANFIS network. More precisely, the patterns belonging to the hyperplane (i.e., all the training set in the case ) are separated according to the sign of projection (16) along their principal direction. The resulting initialization of hyperplane clustering for is shown in Figure 4, where the patterns associated with the new generated clusters are represented by crosses and circles, respectively. They are used for the first estimation of the hyperplanes, which are also evidenced in Figure 4.
The hyperplane clustering gets trapped in a local convergence in this case. In fact, the final determination of hyperplanes illustrated in Figure 5 is obtained after a few iterations and they are really similar to the initial estimate in Figure 4. In spite of this, the HHCS procedure continues and the hyperplane with negative slope will be split because it marks the largest local approximation error (13). After the splitting procedure, the initialization of hyperplane clustering for is obtained, as illustrated in Figure 6. The patterns belonging to the split cluster are separated into two subsets similarly to the previous case. Obviously, the patterns associated with the other cluster are not involved in the splitting procedure.
Also in this case, there is a fast but local convergence to the configuration evidenced in Figure 7, which is similar to the former initialization. A hyperplane is then selected for splitting in order to obtain the initialization of HCS illustrated in Figure 8 for the next HHCS iteration with . The convergence of hyperplane clustering is slower and accurate in this case and it yields the final configuration shown in Figure 9 where the linear components of (17) are well modeled now.
Finally, the global approximation performed by the ANFIS network obtained by the HHCS procedure with 4 rules is illustrated in Figure 10. The plotted function is obtained using a test set of 35000 patterns uniformly sampled in the input space, , , , and determining the corresponding outputs using (2).
3.5. Computational Remarks
When dealing with the hyperplanebased synthesis of ANFIS networks, a number of considerations can be taken mostly pertaining to the inference of rules’ antecedents from the consequents associated with the hyperplanes and to the fine tuning of ANFIS parameters after the HCS procedure. For example, when the nonlinear functions to be approximated are quite complicated, the number of HBs generated by the MinMax classifier is often larger than the number of hyperplanes determined by the hyperplane clustering. Thus, a same hyperplane induces several HBs in the input space and can approximate patterns belonging to wellseparated regions of the input space, each associated with a different HB.
Furthermore, a tuning procedure of ANFIS parameters can be applied after every HCS procedure or, alternatively, to the optimal ANFIS network only. A simple tuning step is considered in this paper, it is applied after every execution of the HCS both in OHCS and HHCS procedures. The coefficients , , , calculated during the hyperplane clustering are updated by a fuzzy leastsquares solution of a set of linear equations, which are obtained by using the training set and by setting to zero the derivatives of the MSE (11) with respect to every unknown : where denotes the average over subscript and
What is more important to remark is that HHCS allows the structural optimization of ANFIS networks and this is obtained by controlling the computational cost. Moreover, the increasing complexity of networks during training makes HHCS structurally constrained, similarly to other deterministic annealing or entropy constrained approaches [38–40]. In other words, overfitting due to the presence of outliers in the training set can be prevented when is low, ensuring robustness with respect to noise in the function to be approximated.
Although the convergence of HHCS is guaranteed by the inner convergence of HCS algorithm and by the maximum complexity allowed to the network, it should be noticed that HHCS does not ensure the determination of the global minimum of (8) for any value of . As said, achieving a global optimum is inherently prevented by the HCS algorithm, being this an alternating minimization scheme. HHCS algorithm aims to obtain a suboptimal minimum ensuring an acceptable accuracy of the resulting ANFIS network, although the peculiarity of HHCS is to keep limited the computational cost of the whole structural optimization procedure. In fact, the methods commonly adopted in the literature for searching the global optimum are often characterized by an expensive requirement of computational resources. By the way, in the case of HCS algorithm, one can use some particular modifications that have been proposed in this regard [40–42] or more general approaches for searching the global minimum based, for example, on simulated annealing, genetic algorithms, or tabù search.
A characteristic behavior that usually occurs, revealing all the HHCS potentiality, is illustrated in the example pertaining to the approximation problem described in Section 4. The final MSE of the data set versus the number of rules is plotted in Figure 11, while the number of HCS iterations needed to reach convergence is plotted in Figure 12. The continuous line is related to the single HCS execution in HHCS that generates just one ANFIS network for each value of ; the dashed line corresponds to the HCS execution yielding the ANFIS network that showed the best performance, among those differently initialized by applying the plain OHCS procedure (10 different initializations have been used for each value of ). It is evident that the performance of ANFIS networks obtained by HHCS is comparable, and even better, with respect to the ones obtained by OHCS. This also means that the faster HCS convergence in HHCS does not implies a poor local convergence. On the contrary, the lower number of HCS iterations in HHCS is due to the change of only one rule in the fuzzy inference system when is increased.
4. Numerical Results
The validity of the HHCS procedure has been proved by extensive computer simulations. Some illustrative examples are summarized in the following, considering the numerical results obtained by wellknown neural and neurofuzzy models applied to specific function approximation problems.
4.1. Synthetic Benchmarks
First of all, the HHCS procedure is compared with the plain OHCS in terms of accuracy, complexity and the computational cost necessary to determine the optimal network. Such quantities can be represented, respectively, by MSE, number of rules, overall number of HCS iterations (i.e., the cumulative sum of iterations in all the HCS executions required by the whole optimization procedure), and normalized time. We consider for the sake of comparison the wellknown function approximation problems proposed in [43]: (i); (ii); (iii); with , , is a small perturbation of the output surface of the first example. A uniform spiral distribution was used to produce a training set of 400 examples for each function. As mentioned, the default values also used in the subsequent tests are ; (i.e., 40% of the training set cardinality); (i.e., 10 different initializations carried out for every value of in OHCS). The results shown in Table 1 evidence that HHCS has a computational cost considerably smaller than OHCS, while maintaining a good approximation accuracy and controlling the overall complexity in order to avoid overfitting. The normalized time spent in OCHS with respect to HHCS is more than expected by considering the number of iterations only. As said, this is due to the slower convergence of the HCS procedure within OHCS since random initializations are used whenever.
 
MSE values are scaled by . 
By using the same data sets, the HHCS procedure is also compared with respect to other neural and neurofuzzy paradigms, in particular the selforganized fuzzy rule generation (SOFRG) method proposed in [43], where also the results of the wellknown radial basis function (RBF) [44–46], standard multilayer perceptron (SMLP) [47], and momentumimproved multilayer perceptron (IMLP) [48] are reported. The MSE obtained by these systems on the said data sets are illustrated in Table 2 and once more, they prove that HHCS compare favorably with respect to the other systems and the related learning algorithm. We outline that, in order to obtain comparable results, we have considered the parameters setup of the benchmarks already reported in the literature, where the complexity of RBF, SMLP, and IMLP neural networks was optimized in advance to 25 neurons by using suited crossvalidation experiments. Conversely, HHCS, OHCS, and SOFRG determine automatically the complexity of the neurofuzzy network.
 
MSE values are scaled by . 
The efficacy of the proposed HHCS approach is also evaluated in terms of generalization capability, which is the crucial characteristic of any neural system. Other wellknown methods for ANFIS synthesis are considered [49]: the combination of the resilient propagation (RPROP) and the recursive leastsquares error (RLSE) technique [50]; the gradient descent (GD) optimization joined to the RLSE method. The following threeinput nonlinear function should be modeled: (iv), where 216 points are used in the training set and other 125 points in the test set. The performance results are reported in Table 3 in terms of MSE. The optimal number of rules reported in the literature in the case of RPROP + RLSE and GD + RLSE algorithms is , while the constructive optimization of OHCS and HHCS yields ANFIS networks having fewer rules and showing a lower error on the test set and hence, a better generalization capability especially using HHCS.

Finally, the comparison with some gradient descent optimization techniques for the learning of fuzzy rule bases, in particular the Sugeno rules (TSA) and centeredSugeno rules (CTSA), is considered. Both training and test sets consist of 250 patterns, obtained by random sampling of the following twoinput nonlinear function [51]: (v).
The numerical results are shown in Table 4; the optimal number of rules obtained in [51] by the TSA and CTSA approaches is ; also in this case HHCS attains a good generalization capability on the test set even if a reduced number of rules is used with respect to OHCS, TSA, and CTSA methods.
 
MSE values are scaled by . 
4.2. Time Series Prediction
The synthesis procedure proposed in this paper is also validated by considering the use of the resulting ANFIS network in a typical realworld application. More precisely, we focus on the prediction of biological time series, which are relevant to health monitoring and risk prevention in many daily activities as in clinical applications, telemedicine, road safety, and so on [52].
The general approach to solve a prediction problem is based on the solution of a suitable function approximation problem. In fact, a prediction problem can be solved by synthesizing the function that links the current sample to be predicted to a suitable set of past ones [53]. Let be the sample of the time series to be predicted at time and let the past samples be known up to time , the estimated value can be obtained as
Hence, the implementation of a predictor coincides with the estimation of an approximation model as in (3), by using any data driven function approximation technique. Neurofuzzy networks are useful to solve such problems, because of the intrinsic nonlinearity and complexity of the underlying functions to be estimated [54]. The ANFIS network obtained by the HHCS procedure is compared with several data driven modeling techniques useful for approximation: a linear model determined by the wellknown leastsquares approximation (LSE); the RBF neural network; the Mixture of Gaussian (MoG) neural network, which is particularly suited to the solution of multivalued and nonconvex function approximation problems [40, 55].
The glucose time series is sampled every 20 minutes; the time series of heart rate is sampled every 5 seconds; the skin conductivity is sampled every one second. On the basis of a preliminary analysis of such time series, performed by experts in the context of clinical trials, three past samples are sufficient to predict the future value, that is, is used in (20). Each predictor is trained on the first 500 samples of every time series (training set); the related performance is evaluated as the signaltonoise ratio (SNR) in predicting the successive 150 samples (test set). The SNR is defined as the ratio, measured in decibels (dB), between the mean squared error of prediction and the variance of the time series to be predicted.
The performances obtained on both training and test set of the considered time series are illustrated in Table 5. Except for the linear model, RBF and MoG neural networks have been optimized by crossvalidation and constructive procedures similarly to HHCS, in order to find the optimal number of parameters that maximizes the generalization capability. Looking at the performance on the test set, the ANFIS networks generated by the HHCS procedure obtain a higher SNR with respect to the other models for all the considered time series.

5. Conclusion
This paper focuses on the training of ANFIS networks with the aim to find effective solutions for improving the accuracy in actual regression problems. The novel HHCS procedure is proposed in the paper within the framework of hyperplane clustering synthesis of ANFIS networks; it involves a constructive approach that progressively increases the number of rules. This procedure ensures a structural robustness of the resulting network with respect to the overfitting phenomenon, while improving the convergence and reducing the computational cost in correspondence to any given number of rules. Thus, it is characterized by a high robustness and it is well suited to play the core role in more complex modeling systems.
The numerical results illustrated in the paper encourage to a further development of the HHCS approach, since the resulting ANFIS networks perform better than several neurofuzzy models and the related learning algorithms applied to wellknown benchmarks and realworld problems reported in the literature.
References
 J. S. R. Jang, “ANFIS: adaptivenetworkbased fuzzy inference system,” IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993. View at: Publisher Site  Google Scholar
 J. S. Jang, C. Sun, and E. Mizutani, NeuroFuzzy And Soft Computing: A Computational Approach to Learning and Machine Intelligence, Prentice Hall, Upper Saddle River, NJ, USA, 1997.
 S. Haykin, Neural Networks, A Comprehensive Foundation, PrenticeHall, Englewood Cliffs, NJ, USA, 2nd edition, 1999.
 M. Panella, A. Rizzi, F. M. Frattale Mascioli, and G. Martinelli, “ANFIS synthesis by hyperplane clustering,” in Proceedings of the Joint 9th IFSA World Congress and 20th NAFIPS International Conference, vol. 1, pp. 340–345, Vancouver, Canada, July 2001. View at: Google Scholar
 M. Panella and A. S. Gallo, “An inputoutput clustering approach to the synthesis of ANFIS networks,” IEEE Transactions on Fuzzy Systems, vol. 13, no. 1, pp. 69–81, 2005. View at: Publisher Site  Google Scholar
 Z. He and A. Cichocki, “An efficient Khyperplane clustering algorithm and its application to sparse component analysis,” in Proceedings of the 4th International Symposium on Neural Networks: Part II, Advances in Neural Networks, D. Liu, S. Fei, Z.G. Hou, H. Zhang, and C. Sun, Eds., vol. 4492 of Lecture Notes in Computer Science, pp. 1032–1041, Springer, Nanjing, China, 2007. View at: Google Scholar
 A. Sharma, R. Podolsky, J. Zhao, and R. A. Mcindoe, “A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets,” Bioinformatics, vol. 25, no. 9, pp. 1152–1157, 2009. View at: Publisher Site  Google Scholar
 J. Echanobe, I. del Campo, and G. Bosque, “An adaptive neurofuzzy system for efficient implementations,” Information Sciences, vol. 178, no. 9, pp. 2150–2162, 2008. View at: Publisher Site  Google Scholar
 S. D. Nguyen and K. N. Ngo, “An adaptive input data space parting solution to the synthesis of neurofuzzy models,” International Journal of Control, Automation and Systems, vol. 6, no. 6, pp. 928–938, 2008. View at: Google Scholar
 S. H. Hosseini and A. H. Etemadi, “Adaptive neurofuzzy inference system based automatic generation control,” Electric Power Systems Research, vol. 78, no. 7, pp. 1230–1239, 2008. View at: Publisher Site  Google Scholar
 I. D. Silva and R. Flauzino, “Efficient parametric adjustment of fuzzy inference system using error backpropagation method,” in Proceedings of the 19th International Conference on Artificial Neural Networks: Part I (ICANN '09), vol. 5768 of Lecture Notes in Computer Science, Springer, 2009. View at: Google Scholar
 M. Panella and G. Martinelli, “Neurofuzzy networks with nonlinear quantum learning,” IEEE Transactions on Fuzzy Systems, vol. 17, no. 3, pp. 698–710, 2009. View at: Publisher Site  Google Scholar
 M. Alizadeh, M. Lewis, M. H. F. Zarandi, and F. Jolai, “Determining significant parameters in the design of ANFIS,” in Proceedings of the Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS '11), March 2011. View at: Publisher Site  Google Scholar
 M. Panella and G. Martinelli, “Neural networks with quantum architecture and quantum learning,” International Journal of Circuit Theory and Applications, vol. 39, no. 1, pp. 61–77, 2011. View at: Publisher Site  Google Scholar
 H. Hui and J. H. Li, “Fingerprint matching using ANFIS,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pp. 217–222, October 2003. View at: Google Scholar
 F. Sun, H. Zhang, and H. Wu, “Neurofuzzy hybrid position/force control for a space robot with flexible dualarms,” in Proceedings of the International Symposium on Neural Networks, Advances in Neural Networks (ISNN '04), vol. 3174 of Lecture Notes in Computer Science, Springer, 2004. View at: Google Scholar
 X. K. Zhang, Y. C. Jin, and G. Guo, “ANFIS applied to a ship autopilot design,” in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 2006, pp. 2233–2236, August 2006. View at: Publisher Site  Google Scholar
 C. Mazzetti, F. M. Frattale Mascioli, F. Baldini, M. Panella, R. Risica, and R. Bartnikas, “Partial discharge pattern recognition by neurofuzzy networks in heatshrinkable joints and terminations of XLPE insulated distribution cables,” IEEE Transactions on Power Delivery, vol. 21, no. 3, pp. 1035–1044, 2006. View at: Publisher Site  Google Scholar
 P. Civicioglu, “Using uncorrupted neighborhoods of the pixels for impulsive noise suppression with ANFIS,” IEEE Transactions on Image Processing, vol. 16, no. 3, pp. 759–773, 2007. View at: Publisher Site  Google Scholar
 S. Y. Na, D. Shin, J. Y. Kim, S. J. Baek, and S. H. Min, “Obstacle recognition and collision avoidance of a fish robot based on fuzzy neural networks,” Advances in Soft Computing, vol. 40, pp. 337–344, 2007. View at: Publisher Site  Google Scholar
 D. Shin, S. Y. Na, J. Y. Kim, and S. J. Baek, “Fuzzy neural networks for obstacle pattern recognition and collision avoidance of fish robots,” Soft Computing, vol. 12, no. 7, pp. 715–720, 2008. View at: Publisher Site  Google Scholar
 E. Joelianto, S. Widiyantoro, and M. Ichsan, “Time series estimation on earthquake events using ANFIS with mapping function,” International Journal of Artificial Intelligence, vol. 3, no. 9, pp. 37–63, 2009. View at: Google Scholar
 M. Kang and H. Kim, “A design of RFTOG model for distributed realtime applications,” Journal of Intelligent Manufacturing, vol. 20, no. 3, pp. 311–319, 2009. View at: Publisher Site  Google Scholar
 J. Zhang, J. Cheng, and L. Li, “Forecasting coal and rock dynamic disaster based on adaptive neurofuzzy inference system,” in Proceedings of the 2nd International Conference on Computational Collective Intelligence: Technologies and Applications (ICCCI '10), vol. 6422 of Lecture Notes in Computer Science, Springer, 2010. View at: Google Scholar
 M. Hayati, A. Rezaei, M. Seifi, and A. Naderi, “Modeling and simulation of combinational CMOS logic circuits by ANFIS,” Microelectronics Journal, vol. 41, no. 7, pp. 381–387, 2010. View at: Publisher Site  Google Scholar
 R. Khatibi, M. A. Ghorbani, M. H. Kashani, and O. Kisi, “Comparison of three artificial intelligence techniques for discharge routing,” Journal of Hydrology, vol. 403, no. 34, pp. 201–212, 2011. View at: Publisher Site  Google Scholar
 R. J. Hathaway and J. C. Bezdek, “Switching regression models and fuzzy clustering,” IEEE Transactions on Fuzzy Systems, vol. 1, no. 3, pp. 195–204, 1993. View at: Publisher Site  Google Scholar
 M. Sugeno and T. Yasukawa, “Fuzzylogicbased approach to qualitative modeling,” IEEE Transactions on Fuzzy Systems, vol. 1, no. 1, pp. 7–31, 1993. View at: Google Scholar
 S. Chiu, “Fuzzy model identification based on cluster estimation,” Journal of Intelligent & Fuzzy Systems, vol. 2, pp. 267–278, 1994. View at: Google Scholar
 R. Babuška and H. B. Verbruggen, “An overview of fuzzy modeling for control,” Control Engineering Practice, vol. 4, no. 11, pp. 1593–1606, 1996. View at: Publisher Site  Google Scholar
 E. Kim, M. Park, S. Ji, and M. Park, “A new approach to fuzzy modeling,” IEEE Transactions on Fuzzy Systems, vol. 5, no. 3, pp. 328–337, 1997. View at: Google Scholar
 E. Kim, M. Park, S. Kim, and M. Park, “A transformed inputdomain approach to fuzzy modeling,” IEEE Transactions on Fuzzy Systems, vol. 6, no. 4, pp. 596–604, 1998. View at: Google Scholar
 M. R. Emami, I. B. Türksen, and A. A. Goldenberg, “Development of a systematic methodology of fuzzy logic modeling,” IEEE Transactions on Fuzzy Systems, vol. 6, no. 3, pp. 346–361, 1998. View at: Google Scholar
 S. Guillaume, “Designing fuzzy inference systems from data: an interpretabilityoriented review,” IEEE Transactions on Fuzzy Systems, vol. 9, no. 3, pp. 426–443, 2001. View at: Publisher Site  Google Scholar
 P. K. Simpson, “Fuzzy minmax neural networks—I: classification,” IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 776–786, 1992. View at: Publisher Site  Google Scholar
 J. Bezdek, Pattern Recognition With Fuzzy Objective Functions Algorithms, Plenum, New York, NY, USA, 1981.
 R. Nock and F. Nielsen, “On weighting clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1223–1235, 2006. View at: Publisher Site  Google Scholar
 K. Rose, “Deterministic annealing for clustering, compression, classification, regression, and related optimization problems,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2210–2239, 1998. View at: Google Scholar
 C. C. Chuang, S. F. Su, and S. S. Chen, “Robust TSK fuzzy modeling for function approximation with outliers,” IEEE Transactions on Fuzzy Systems, vol. 9, no. 6, pp. 810–821, 2001. View at: Publisher Site  Google Scholar
 M. Panella, A. Rizzi, and G. Martinelli, “Refining accuracy of environmental data prediction by MoG neural networks,” Neurocomputing, vol. 55, no. 34, pp. 521–549, 2003. View at: Publisher Site  Google Scholar
 N. Ueda and R. Nakano, “Deterministic annealing EM algorithm,” Neural Networks, vol. 11, no. 2, pp. 271–282, 1998. View at: Publisher Site  Google Scholar
 N. Ueda, R. Nakano, Z. Ghahramani, and G. E. Hinton, “SMEM algorithm for mixture models,” Neural Computation, vol. 12, no. 9, pp. 2109–2128, 2000. View at: Google Scholar
 I. Rojas, H. Pomares, J. Ortega, and A. Prieto, “Selforganized fuzzy system generation from training examples,” IEEE Transactions on Fuzzy Systems, vol. 8, no. 1, pp. 23–36, 2000. View at: Publisher Site  Google Scholar
 S. Chen, C. F. N. Cowan, and P. M. Grant, “Orthogonal least squares learning algorithm for radial basis function networks,” IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 302–309, 1991. View at: Publisher Site  Google Scholar
 P. Wasserman, Advanced Methods in Neural Computing, Van Nostrand Reinhold, New York, NY, USA, 1993.
 A. G. Borş and I. Pitas, “Median radial basis function neural network,” IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1351–1364, 1996. View at: Google Scholar
 D. Rumelhart, G. Hinton, and R. Williams, “Learning internal representations by error propagation,” in Parallel Data Processing, D. Rumelhart and J. McClelland, Eds., vol. 1, MIT Press, Cambridge, Mass, USA, 1986. View at: Google Scholar
 H. C. Hsin, C. C. Li, M. Sun, and R. J. Sclabassi, “Adaptive training algorithm for backpropagation neural networks,” IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 3, pp. 512–514, 1995. View at: Publisher Site  Google Scholar
 M. Chen and R. Liou, “An efficient learning method of fuzzy inference systems,” in Procedings of the IEEE International Fuzzy Systems Conference Proceedings (FUZZIEEE '99), vol. 2, pp. 634–638, Seoul, Korea, 1999. View at: Google Scholar
 G. Goodwin and K. Sin, Adaptive Filtering Prediction and Control, PrenticeHall, New York, NY, USA, 1984.
 F. Guely and P. Siarry, “Gradient descent method for optimizing various fuzzy rule bases,” in Procedings of the 2nd IEEE International Conference on Fuzzy Systems, pp. 1241–1246, April 1993. View at: Google Scholar
 M. Panella, “Advances in biological time series prediction by neural networks,” Biomedical Signal Processing and Control, vol. 6, no. 2, pp. 112–120, 2011. View at: Publisher Site  Google Scholar
 H. Abarbanel, Analysis of Observed Chaotic Data, Springer, New York, NY, USA, 1996.
 Z. Ghahramani, “Solving inverse problems using an EM approach to density estimation,” in Proceedings of the Connectionist Models Summer School, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, 1994. View at: Google Scholar
 M. Panella, A. Rizzi, F. M. Frattale Mascioli, and G. Martinelli, “A constructive EM approach to density estimation for learning,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN '01), vol. 4, pp. 2608–2613, Washington, DC, USA, July 2001. View at: Google Scholar
Copyright
Copyright © 2012 Massimo Panella. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.