Fuzzy Weighted Least Squares Support Vector Regression with Data Reduction for Nonlinear System Modeling
This paper proposes a fuzzy weighted least squares support vector regression (FW-LSSVR) method with data reduction for nonlinear system modeling based only on measured data. The proposed method combines the advantages of data reduction with a fuzzy weighting mechanism. It not only captures the local characteristics of the modeled plant but also handles the boundary effects that arise in the local LSSVR method when the modeled data lie at the boundary of a data subset. Furthermore, in comparison with SVR, the proposed method uses fewer hyperparameters to construct the model, and the overlap factor can be chosen relatively small to further reduce computational time. First, the original input space is divided into several regions with a fuzzy partition by applying the Gustafson-Kessel clustering algorithm (GKCA), which is the foundation for data reduction, and the overlap factor is introduced to reduce the size of the subsets. Then, the subset regression models (SRMs), which can be solved simultaneously by LSSVR, are integrated into an overall output of the estimated nonlinear system by fuzzy weighting. Finally, the proposed method is demonstrated by experimental analysis and compared with the local LSSVR, weighted SVR, and global LSSVR methods in terms of computational time and root-mean-square error (RMSE).
1. Introduction
It is well known that, in a large number of applications such as advanced control, process simulation, and fault detection, a significant problem is to construct a mathematical model of the estimated system based only on its measured data. Major theories and methods for identifying nonlinear systems have been developed independently in various research fields, including fuzzy systems, neural networks, and other approaches. The LSSVR method, like SVR, adopts structural risk minimization and offers a good balance between sparsity and modeling accuracy. Furthermore, by substituting a set of equality constraints for the inequality constraints of SVR, LSSVR replaces a complex quadratic programming problem with the solution of a set of linear equations, which greatly relieves the computational load. It has also been reported that the generalization ability of LSSVR is no worse than that of SVR.
Therefore, LSSVR has attracted extensive attention and has been successfully applied to time series prediction [7, 8], subspace identification [9, 10], signal processing [11, 12], and other areas [13, 14] during the past few years. The LSSVR approach, referred to here as the global LSSVR (G-LSSVR) approach, has become an effective tool in various applications. Its modeling accuracy depends on obtaining an appropriate mathematical model and a proper hyperparameter set, which usually consists of two variables: the kernel width and the penalty factor. In general, however, it is difficult for G-LSSVR to capture local behavior well, and the literature [15, 16] has shown that G-LSSVR has some deficiencies in this respect.
Recently, local modeling approaches have become a desirable alternative because of their ability to identify different regions of the estimated nonlinear system. One local fusion modeling method based on LSSVR and neuro-fuzzy techniques employs LSSVR together with a learning method that layers two kinds of problems to construct each local model. Another approach uses only the training points of interest, rather than all points, to identify each local model and applies a vector-norm distance to search for the nearest points. To find a set of optimal data points, a Euclidean distance measure [20, 21] has been proposed, and the local model is set up according to these neighboring points. From the viewpoint of capturing local behavior, the localized support vector regression (LSVR) method has been proposed to establish a connection between several models. Considering the high computational time of G-LSSVR, a local grey SVR has been developed to speed up the calculation. Further, by introducing regularization, a general local and global learning framework formulates multiple classifiers in the neighborhood of each data point.
Although local modeling approaches such as local-LSSVR are better at identifying local characteristics than global approaches such as global SVR or LSSVR, they remain unsatisfactory in global modeling capability. First, because of the different criteria used to select the nearest-neighbor training data in the subsets, local-LSSVR does not perform well when the training data are in the boundary area. Second, because the number of local models to be constructed equals the size of the testing set, the local LSSVR approach generally leads to a heavy computational load. Third, it generates boundary effects when the modeled data are at the boundary of a data subset.
Based on the above considerations, this paper presents an FW-LSSVR method for nonlinear system modeling based only on measured data. The method integrates the strengths of GKCA, a weighted average mechanism, and LSSVR. First, the original input space is divided into several regions with a fuzzy partition by applying GKCA, which is the foundation for data reduction. Then, the subset regression models (SRMs), which can be solved simultaneously by LSSVR, are integrated into an overall output of the estimated nonlinear system by fuzzy weighting. The proposed method not only captures the local characteristics of the estimated model but also handles the boundary effects that arise in the local LSSVR method. Furthermore, in comparison with support vector regression (SVR), the proposed method uses fewer hyperparameters to construct the model, and the overlap factor can be chosen relatively small to further reduce computational time. Finally, experimental analysis demonstrates that our approach not only overcomes the disadvantages of the local LSSVR, weighted SVR, and global LSSVR methods in modeling nonlinear systems but also achieves a better root-mean-square error (RMSE) and requires less computational time than local modeling approaches.
The paper is organized as follows. Section 2 gives brief descriptions of LSSVR and GKCA. Section 3 introduces the proposed method in detail. Section 4 presents several examples demonstrating our approach, and Section 5 summarizes the paper.
2.1. Least Squares Support Vector Regression
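It has been shown through a meticulous empirical study that the generalization performance of LSSVR is comparable to that of SVR. Next, we concisely introduce LSSVR with the following training points:
$$\{(x_i, y_i)\}_{i=1}^{N}, \quad x_i \in \mathbb{R}^{n},\ y_i \in \mathbb{R}, \tag{1}$$
where $x_i$ is the input pattern and $y_i$ is the corresponding target. The LSSVR can be represented for a test input $x$ as
$$f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b. \tag{2}$$
Because a Gaussian kernel with width $\sigma$ is adopted, the kernel function in (2) can further be rewritten as
$$f(x) = \sum_{i=1}^{N} \alpha_i \exp\left(-\frac{\|x - x_i\|^{2}}{2\sigma^{2}}\right) + b, \tag{3}$$
where the support value vector $\alpha$ and the bias $b$ are solved by formulating the following optimization problem:
$$\min_{w, b, e}\ J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2} \quad \text{s.t.}\quad y_i = w^{T}\varphi(x_i) + b + e_i,\ i = 1, \ldots, N, \tag{4}$$
where $\varphi(\cdot)$ represents a feature mapping by which the nonlinear input space is transformed into a high-dimensional linear space, and the parameter $\gamma$ is the regularization constant that governs the relative importance of data fitting and the smoothness of the solution. Using the Lagrange multiplier method for (4) gives rise to an unconstrained optimization problem:
$$L(w, b, e; \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \left( w^{T}\varphi(x_i) + b + e_i - y_i \right). \tag{5}$$
In terms of the KKT conditions, one derives
$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \alpha_i \varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i = 0, \quad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow w^{T}\varphi(x_i) + b + e_i - y_i = 0. \tag{6}$$
Consequently, the learning process of LSSVR corresponding to (5) is implemented by solving
$$\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \tag{7}$$
where $y = [y_1, \ldots, y_N]^{T}$, $\mathbf{1} = [1, \ldots, 1]^{T}$, $\alpha = [\alpha_1, \ldots, \alpha_N]^{T}$, and $\Omega_{ij} = \varphi(x_i)^{T}\varphi(x_j) = K(x_i, x_j)$, with $K$ a positive-definite kernel function satisfying Mercer's theorem.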
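As a rough illustration of how the LSSVR learning step reduces to a single linear solve for the support values and the bias, the following Python sketch builds the system from the KKT conditions and solves it by Gaussian elimination. The function names (`gaussian_kernel`, `solve_linear`, `lssvr_fit`, `lssvr_predict`), the kernel form exp(-||x - z||^2 / (2 sigma^2)), the toy sine data, and the hyperparameter values are assumptions made for illustration only; they are not the toolbox implementation used in the experiments.

```python
import math

def gaussian_kernel(x, z, sigma):
    """Assumed Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve_linear(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - s) / M[r][r]
    return u

def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [gaussian_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization term on the diagonal
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    return sol[1:], sol[0]  # (alpha, b)

def lssvr_predict(x, X, alpha, b, sigma):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, X)) + b

# Toy data (hypothetical): 21 samples of sin(pi * x) on [0, 2].
X = [[i / 10.0] for i in range(21)]
y = [math.sin(math.pi * x[0]) for x in X]
alpha, b = lssvr_fit(X, y, gamma=100.0, sigma=0.3)
```

Because the solution satisfies the KKT conditions, the support values sum to zero and each training residual equals the corresponding support value divided by the regularization constant, which gives a simple sanity check on any implementation.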
2.2. Gustafson-Kessel Clustering Algorithm
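Clustering analysis plays an important role in classification and regression problems. To study the important characteristics of a complex system, it is crucial to decompose the original data set into several subsets that reflect the system's behavior well. In particular, GKCA can extract clusters of different shapes and orientations from a large data set and is superior to conventional FCM. GKCA is achieved by minimizing the following objective function:
$$J(X; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^{m} D_{ik}^{2}, \tag{8}$$
where $\mu_{ik}$ is a component of the fuzzy partition matrix $U$ defined by (12), $D_{ik}$ is the distance between the data point $x_k$ and the cluster center $v_i$, $m > 1$ is the fuzzy weighting exponent, and $c$ is the number of cluster centers $v_i$, which needs to be predefined. In a nutshell, GKCA can be boiled down to the following steps:
(1) calculating the cluster centers
$$v_i^{(l)} = \frac{\sum_{k=1}^{N} \left(\mu_{ik}^{(l-1)}\right)^{m} x_k}{\sum_{k=1}^{N} \left(\mu_{ik}^{(l-1)}\right)^{m}}, \tag{9}$$
where $l$ denotes the iteration number and $N$ is the number of all data points;
(2) computing the cluster covariance matrices $F_i$ according to the definition of covariance
$$F_i = \frac{\sum_{k=1}^{N} \left(\mu_{ik}^{(l-1)}\right)^{m} \left(x_k - v_i^{(l)}\right)\left(x_k - v_i^{(l)}\right)^{T}}{\sum_{k=1}^{N} \left(\mu_{ik}^{(l-1)}\right)^{m}}; \tag{10}$$
(3) computing the distance
$$D_{ik}^{2} = \left(x_k - v_i^{(l)}\right)^{T} \left[\rho_i \det(F_i)\right]^{1/n} F_i^{-1} \left(x_k - v_i^{(l)}\right), \tag{11}$$
where $n$ is the dimension of the data and $\rho_i$ is the cluster volume, usually set to 1;
(4) revising the components of the fuzzy matrix
$$\mu_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left(D_{ik} / D_{jk}\right)^{2/(m-1)}}. \tag{12}$$
The iteration stops when the difference between the fuzzy partition matrices $U^{(l)}$ and $U^{(l-1)}$ of successive iterations is lower than a tolerance $\varepsilon$, that is, $\|U^{(l)} - U^{(l-1)}\| < \varepsilon$.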
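The four GKCA steps can be sketched in Python as a single update that is repeated until the membership matrix stops changing. This is a minimal sketch under stated assumptions: it is restricted to two-dimensional data so that the covariance determinant and inverse can be written in closed form, the cluster volume is taken as 1, and the function name `gk_iteration`, the two toy blobs, and the initial memberships are hypothetical.

```python
def gk_iteration(X, U, m=2.0, eps=1e-12):
    """One Gustafson-Kessel update for 2-D points X and a c x N membership matrix U."""
    c, N = len(U), len(X)
    centers, dist2 = [], []
    for i in range(c):
        w = [U[i][k] ** m for k in range(N)]
        sw = sum(w)
        # Step 1: cluster center as the membership-weighted mean.
        vx = sum(w[k] * X[k][0] for k in range(N)) / sw
        vy = sum(w[k] * X[k][1] for k in range(N)) / sw
        # Step 2: fuzzy covariance matrix [[fxx, fxy], [fxy, fyy]].
        fxx = sum(w[k] * (X[k][0] - vx) ** 2 for k in range(N)) / sw
        fyy = sum(w[k] * (X[k][1] - vy) ** 2 for k in range(N)) / sw
        fxy = sum(w[k] * (X[k][0] - vx) * (X[k][1] - vy) for k in range(N)) / sw
        det = fxx * fyy - fxy * fxy
        # Step 3: distance with norm matrix det(F)^(1/2) * inverse(F) (n = 2).
        s = det ** 0.5 / det
        axx, axy, ayy = s * fyy, -s * fxy, s * fxx
        row = []
        for x, y in X:
            dx, dy = x - vx, y - vy
            d2 = axx * dx * dx + 2.0 * axy * dx * dy + ayy * dy * dy
            row.append(max(d2, eps))  # guard against a zero distance
        centers.append((vx, vy))
        dist2.append(row)
    # Step 4: membership update u_ik = 1 / sum_j (D_ik^2 / D_jk^2)^(1/(m-1)).
    p = 1.0 / (m - 1.0)
    new_U = [[1.0 / sum((dist2[i][k] / dist2[j][k]) ** p for j in range(c))
              for k in range(N)] for i in range(c)]
    return centers, new_U

# Two well-separated toy blobs (hypothetical data).
blob_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
blob_b = [(x + 10.0, y + 10.0) for x, y in blob_a]
X = blob_a + blob_b
U = [[0.6] * 5 + [0.4] * 5, [0.4] * 5 + [0.6] * 5]
for _ in range(50):
    centers, U_new = gk_iteration(X, U)
    change = max(abs(U_new[i][k] - U[i][k]) for i in range(2) for k in range(10))
    U = U_new
    if change < 1e-6:  # termination criterion on successive partition matrices
        break
```

The loop stops on the largest elementwise change between successive membership matrices, which is one possible choice of norm for the termination criterion.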
3. Fuzzy Weighted Least Squares Support Vector Regression
This paper develops a new method that combines the respective advantages of global and local learning methods in an overall framework. The procedure of the proposed FW-LSSVR approach is depicted in Figure 1.
3.1. Constructing Fuzzy Weighted with Triangle Membership Functions
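Applications of fuzzy concepts were developed early by Zadeh. A triangular fuzzy number $\tilde{A}$ can be parametrized by a triplet $(a^{L}, a^{M}, a^{U})$, where $a^{L}$ and $a^{U}$ denote the left and right bounds, respectively, and $a^{M}$ represents the mode of $\tilde{A}$. The membership function of the triangular fuzzy number is defined by
$$\mu_{\tilde{A}}(x) = \begin{cases} \dfrac{x - a^{L}}{a^{M} - a^{L}}, & a^{L} \le x \le a^{M}, \\ \dfrac{a^{U} - x}{a^{U} - a^{M}}, & a^{M} \le x \le a^{U}, \\ 0, & \text{otherwise}. \end{cases} \tag{14}$$
The $\alpha$-cut of the fuzzy set $\tilde{A}$ in the universe of discourse $X$ is defined by
$$\tilde{A}_{\alpha} = \{x \in X \mid \mu_{\tilde{A}}(x) \ge \alpha\}, \tag{15}$$
where $\alpha \in [0, 1]$.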
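For concreteness, the triangular membership function and its alpha-cut can be evaluated as in the short Python sketch below. The function and argument names (`triangular_membership`, `alpha_cut`, `left`, `mode`, `right`) are illustrative assumptions, and the closed-form interval assumes a nondegenerate triangle with left < mode < right and 0 < alpha <= 1.

```python
def triangular_membership(x, left, mode, right):
    """Membership degree of x in the triangular fuzzy number (left, mode, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= mode:
        return (x - left) / (mode - left)
    return (right - x) / (right - mode)

def alpha_cut(left, mode, right, alpha):
    """Closed interval of all x whose membership degree is at least alpha."""
    lower = left + alpha * (mode - left)
    upper = right - alpha * (right - mode)
    return (lower, upper)
```

For example, for the triplet (1, 2, 4) the membership degree at x = 3 is 0.5, and the 0.5-cut is the interval from 1.5 to 3.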
In general, fuzzy partitioning is implemented by clustering methods, and GKCA is commonly used to decompose the original data set. GKCA has been found to be superior to FCM (fuzzy c-means) and subtractive clustering. It extends the standard FCM algorithm by adopting a flexible distance measure that is calculated using covariance matrices, as shown in (11). As a result, clusters of different shapes and orientations in the original data set can be detected by GKCA.
In this paper, GKCA is used, which is based on the minimization of (8). As stated in Section 2.2, the iteration stops when the termination criterion is satisfied, and an appropriate fuzzy membership matrix is finally obtained. The cluster centers and spread widths are then calculated by (16) and (17), respectively, in terms of the number of training data, the number of clusters, the degree of membership of each training datum in each cluster, the feature dimensionality, and a norm that measures the distance between two vectors.
From (16) and (17), the weighted values can be calculated by applying triangle membership functions. To derive the weighted values, a triangle membership function is constructed according to (14). Instead of the $\alpha$-cut (15) of the fuzzy set, an overlap factor is introduced into the triangle membership functions to control the size of the data subsets more readily, and the degree of fulfilment is calculated accordingly. By normalizing the firing level of each fuzzy set, the weighted values are finally calculated by (21).
In some applications, a Gaussian membership function is adopted instead, whose width parameter is usually chosen from a given interval.
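One plausible way to turn cluster centers, spread widths, and an overlap factor into normalized weights is sketched below in Python. The specific form is an assumption rather than the paper's exact expressions: each fuzzy set is taken as a symmetric triangle centered at the cluster center with half-width equal to the overlap factor times the spread width, the degree of fulfilment is the product over input dimensions, and the names `fuzzy_weights` and `eta` are hypothetical.

```python
def fuzzy_weights(x, centers, widths, eta):
    """Normalized firing levels of assumed symmetric triangular fuzzy sets.

    x: input vector; centers[i][d], widths[i][d]: center and spread of
    cluster i in dimension d; eta: overlap factor scaling every half-width.
    """
    levels = []
    for v, s in zip(centers, widths):
        level = 1.0
        for xd, vd, sd in zip(x, v, s):
            # Triangle with peak 1 at the center and support eta * sd on each side.
            level *= max(0.0, 1.0 - abs(xd - vd) / (eta * sd))
        levels.append(level)
    total = sum(levels)
    if total == 0.0:
        # The input is covered by no fuzzy set, so no weight can be assigned.
        return [0.0] * len(levels)
    return [lv / total for lv in levels]
```

With centers at 0 and 2, unit spreads, and an overlap factor of 1.5, an input midway between the centers receives weights of 0.5 each, while an input at 0.5 falls outside the support of the second set and is weighted entirely toward the first.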
3.2. FW-LSSVR with Data Reduction
Takagi-Sugeno fuzzy models have recently become a powerful practical instrument for identifying complex systems. Based on the fuzzy partition, the nonlinear description of the estimated system can be expanded into several simple linear descriptions by applying if-then rules of the form
$$R_i:\ \text{if } x_1 \text{ is } A_{i1} \text{ and } \cdots \text{ and } x_n \text{ is } A_{in}, \text{ then } y_i = a_i^{T} x + b_i, \quad i = 1, \ldots, c.$$
Here $A_{i1}, \ldots, A_{in}$ are the fuzzy sets assigned to the corresponding input variables $x_1, \ldots, x_n$, $y_i$ is the output of the $i$th rule, and $a_i$ and $b_i$ are the parameters of the consequent function.
Next, instead of linearizing around a point, the proposed method makes use of subset regression models (SRMs), which are solved simultaneously by LSSVR in each fuzzy partition area. First, the original input data set is divided into several subsets with a fuzzy partition. In each region, an SRM is trained independently by LSSVR. Based on the centers and spread widths obtained from (16) and (17), the data set is then decomposed once again by introducing the overlap factor to reduce the size of the original subsets. The partition is performed by the pseudocode in (25), where the overlap factor is introduced to reduce the size of the subsets and to obtain a new training set with data reduction.
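The data reduction step can be sketched in Python as follows. The membership rule is an assumption made for illustration: a training point is kept in a subset when it lies strictly within the overlap factor times the spread width of the cluster center in every input dimension. The function name `reduce_subsets` and the toy one-dimensional data are hypothetical.

```python
def reduce_subsets(X, centers, widths, eta):
    """Return, for each cluster, the indices of the training points it keeps."""
    subsets = []
    for v, s in zip(centers, widths):
        keep = [k for k, x in enumerate(X)
                if all(abs(xd - vd) < eta * sd for xd, vd, sd in zip(x, v, s))]
        subsets.append(keep)
    return subsets

# Toy example (hypothetical): 11 points on a line and two clusters.
X = [[float(k)] for k in range(11)]
centers = [[2.5], [7.5]]
widths = [[2.0], [2.0]]
wide = reduce_subsets(X, centers, widths, eta=1.5)    # subsets share point 5
narrow = reduce_subsets(X, centers, widths, eta=1.0)  # smaller, disjoint subsets
```

In this toy case the smaller overlap factor yields smaller subsets but leaves points 0, 5, and 10 uncovered, which illustrates why the overlap factor must remain large enough to cover the training data.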
Then, the new training subsets are used to construct each SRM by (3) as follows:
$$\mathrm{SRM}_i(x) = \sum_{k=1}^{N_i} \alpha_{ik} K(x, x_{ik}) + b_i, \tag{26}$$
where $\mathrm{SRM}_i$ is the $i$th subset regression model, $K$ is the Gaussian kernel, the parameters $\alpha_{ik}$ and $b_i$ are derived by the LSSVR approach, $x_{ik}$ is the $k$th input of the $i$th new subset, and $N_i$ is the size of that subset. The weighted values $w_i(x)$ computed by (21) are then combined with the SRMs to form the global predicted output:
$$\hat{y}(x) = \sum_{i=1}^{c} w_i(x)\, \mathrm{SRM}_i(x). \tag{27}$$
It is clear from (27) that each SRM is solved by LSSVR independently, so all SRMs can be trained simultaneously, which largely improves the computational efficiency of the proposed method. In brief, the proposed approach can be summarized as follows.
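The independence of the subset models and their fuzzy weighted combination can be illustrated with the Python sketch below. To keep the example short, a constant mean model stands in for the LSSVR fit of each subset, and a thread pool stands in for simultaneous training; `fit_mean_model`, `train_srms`, and `fw_output` are hypothetical names, and the subsets and weights are toy values.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_mean_model(subset):
    """Stand-in for an LSSVR fit: a constant model equal to the mean target."""
    targets = [y for _, y in subset]
    mean = sum(targets) / len(targets)
    return lambda x: mean

def train_srms(subsets):
    """Fit one model per subset; the fits are independent and may run concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fit_mean_model, subsets))

def fw_output(x, srms, weights):
    """Global output as the weighted sum of the subset model outputs."""
    return sum(w * f(x) for w, f in zip(weights, srms))

# Toy subsets of (input, target) pairs and fixed normalized weights.
subsets = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 10.0), (3.0, 10.0)]]
srms = train_srms(subsets)
y_hat = fw_output(1.5, srms, [0.25, 0.75])
```

Here the two stand-in models return 2 and 10, so the combined output is 0.25 × 2 + 0.75 × 10 = 8. In an actual implementation each stand-in would be replaced by an LSSVR trained on its own reduced subset, and the weights would depend on the input.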
Step 1. Define the overlap factor and select the size of clustering subsets where is generally selected to 2.
Step 2. Obtain the appropriate fuzzy partition matrix by applying GKCA until the termination criterion is satisfied.
Step 4. Determine new training subsets by (25) based on the overlap factor , the cluster centers, and the spread width.
Step 5. Set the two hyperparameters of the LSSVR, namely, the kernel width and the penalty factor.
Step 6. Construct each subset regression model by the LSSVR approach, so that (26) is obtained.
4. Experimental Studies
To illustrate our approach, both RMSE (root-mean-square error) and computational time are evaluated in four simulated data experiments. All numerical experiments are carried out on a personal computer with a 2.50 GHz Intel(R) Celeron(R) CPU and 2 GB of memory, running Windows XP with MATLAB R2012a and the VC++ 6.0 compiler installed. The LSSVR implementation from the LS-SVMlab MATLAB toolbox was used (available at http://www.esat.kuleuven.be/sista/lssvmlab/).
We evaluate the performance of the proposed approach on four benchmark data sets. The index adopted for measuring modeling accuracy is
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left(y_k - \hat{y}_k\right)^{2}}, \tag{28}$$
where $y_k$ and $\hat{y}_k$ are the actual and predicted outputs and $N$ is the number of data points. Another index is the total computational time for constructing the proposed method, together with the local running time for constructing the SRMs. The two indexes of the proposed method are compared with those of G-LSSVR, local-LSSVR, and weighted SVR. In addition, the effect of different overlap factors is compared. To obtain a fair comparison, the hyperparameters of the methods are set to the same values. The local-LSSVR method is briefly introduced as follows. Let the training data be obtained by experiment or from a real system, and let a test input be taken from the testing data set. In the closest region of the test input, M training inputs are selected by applying the norm-distance approach. As a result, one local-LSSVR model per test output is derived from the training inputs in those regions.
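The accuracy index and the neighbor selection used by local-LSSVR can be sketched in Python as follows. The function names `rmse` and `nearest_training_points` are illustrative, and the Euclidean norm is assumed for the norm-distance approach.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between actual and predicted outputs."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def nearest_training_points(x_query, X_train, M):
    """Indices of the M training inputs closest to x_query (Euclidean norm assumed)."""
    def distance(i):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_query, X_train[i])))
    return sorted(range(len(X_train)), key=distance)[:M]
```

Since local-LSSVR repeats this selection and fits a separate model for every test input, its cost grows with the size of the testing set, which is the computational burden discussed above.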
Example 1. The approximated function is given in (29).
In this example, 501 training points and 1001 testing points are obtained from (29). With the proposed approach, only two hyperparameters (the kernel width and the penalty factor) need to be chosen, whereas the SVR approach requires three. For comparison, Figure 2 shows the results of the FW-LSSVR and G-LSSVR methods. The two indexes, RMSE and computational time, for local-LSSVR, G-LSSVR, weighted SVR, and our approach are summarized in Tables 1 and 2, respectively. Additionally, the effect of different overlap factors is compared in Table 3.
These tables show that our approach obtains a better nonlinear function approximation, in terms of the RMSE in Table 1, than G-LSSVR, local-LSSVR, and weighted SVR. In addition, the running time of the proposed method in Table 2 is at least approximately 10 times shorter than that of the local-LSSVR method. The main reason local-LSSVR needs more computational time is that the number of required local models is large, being equal to the size of the testing set. Table 3 also shows that an overlap factor of 1.5 performs better than an overlap factor of 2.5 in both RMSE and computational time. That is to say, provided the training data are covered, a larger overlap factor does not necessarily lead to better performance. These results confirm the superiority of the proposed method over the other methods.
Example 2. The approximated function of two variables is given in (30). The two input variables, sampled uniformly on a common interval, are used as training inputs. The number of training data is 1681 (i.e., 41 × 41), and the number of test data is 6561 (i.e., 81 × 81). For comparison, the same indexes as in Example 1 are used, and the results for our approach, local-LSSVR, G-LSSVR, and weighted SVR are summarized in Tables 4 and 5. Additionally, the effect of different overlap factors is compared in Table 6. These results show that the proposed method (FW-LSSVR) outperforms G-LSSVR, local-LSSVR, and weighted SVR.
Furthermore, the predicted outputs of the local modeling approach based on LSSVR exhibit boundary effects for the different numbers of neighbors M = 49, 81, 121, and 169, as shown in Figure 3. Figure 4 gives the estimated values of the proposed FW-LSSVR approach with 4, 6, 8, and 10 SRMs. Tables 4 and 5 also show that our approach obtains only a slightly worse training RMSE than the local-LSSVR method with M = 169 training data points, whereas the running time of local-LSSVR is significantly longer, at no less than 107.3438 seconds in the experiment. In Table 6, an overlap factor of 1.5 performs better than an overlap factor of 3.0 in both RMSE and computational time. That is to say, provided the training data are covered, a larger overlap factor does not necessarily lead to better performance; on the contrary, it requires more training data points and more computational time to construct all SRMs. Therefore, compared with the other methods, our method achieves a better testing RMSE with less computational time and also has relatively good generalization ability.
Example 3. In this example, the nonlinear dynamic system in (31) is used. From it, 501 data points are obtained, with additive Gaussian noise of variance 0.25, as shown in Figure 5. The number of test data is 1001. To compare the performance of the proposed method with the other approaches, the results and the curves are given in Table 7 and Figure 5, respectively. These results show that the proposed method outperforms G-LSSVR, local-LSSVR, and weighted SVR, and the RMSE in Table 7 indicates that the proposed method has the best generalization performance. In addition, the running time of the proposed method in Table 8 is at least approximately 10 times shorter than that of the local-LSSVR method. Additionally, the effect of different overlap factors is compared in Table 9.
Example 4. In this example, 296 data points generated from the real Box-Jenkins system are applied to the proposed method. These data points consist of the gas flow rate signal and the CO2 concentration, which is taken as the output. Figure 6 shows the training data, which include the input signal and the output signal. In this example, 5-fold cross-validation is employed to evaluate the performance. According to the cross-validation method, the training and testing RMSE and the computational time of the global LSSVR (G-LSSVR) approach, the local-LSSVR (L-LSSVR) approach, and the proposed approach are summarized in Tables 10 and 11. These tables show that the training RMSE of our technique is slightly larger than that of the other approaches, whereas its testing RMSE is smaller; the run time of our technique is smaller than that of the other local modeling approaches but slightly larger than that of the global modeling techniques based on LSSVR. Figure 7 compares the actual output with the predicted output of our technique. The effect of different overlap factors is compared in Table 12. As shown in Table 12, although the training RMSE corresponding to the overlap factor of 3.5 is less than that of 2.8, the testing RMSE corresponding to the overlap factor of 2.8 is less than that of 3.4. That is to say, a relatively small overlap factor gives better generalization performance than a large one. Additionally, in comparison with SVR, the proposed method uses fewer hyperparameters to construct the model, and the overlap factor can be chosen relatively small to further reduce computational time.
5. Conclusions
In this paper, a fuzzy weighted least squares support vector regression (FW-LSSVR) method for nonlinear system modeling has been proposed and illustrated, building on a fuzzy weighting mechanism and LSSVR. Because the training subsets are mutually independent, all SRMs can be constructed simultaneously, which largely reduces computational time. The experimental results show that the run time of our technique is smaller than that of the other local modeling approaches but slightly larger than that of the global modeling techniques based on LSSVR, while its modeling accuracy improves considerably over the other techniques. Furthermore, in comparison with SVR, the proposed method uses fewer hyperparameters to construct the model, and the overlap factor can be chosen relatively small to further reduce computational time.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work and no commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Acknowledgments
The research was partially funded by the training program of high level innovative talents of Guizhou ([2017]3, [2017]19), the Guizhou Province Natural Science Foundation in China (KY[2016]018, KY[2016]254), the Science and Technology Research Foundation of Hunan Province (13C333), the Science and Technology Project of Guizhou ([2017]1207, [2018]1179), and the Doctoral Fund of Zunyi Normal University (BS[2015]13, BS[2015]04).
References
J. Suykens, L. Lukas, P. Van Dooren, B. De Moor, and J. Vandewalle, “Least squares support vector machine classifiers: a large scale algorithm,” in European Conference on Circuit Theory and Design, ECCTD, vol. 99, pp. 839–842, Citeseer, 1999.
B.-Y. Sun, D.-S. Huang, H.-T. Fang, and X.-M. Yang, “A novel robust regression approach of lidar signal based on modified least squares support vector machine,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 5, pp. 715–729, 2005.
H. B. Zheng, R. J. Liao, S. Grzybowski, and L. J. Yang, “Fault diagnosis of power transformers using multi-class least square support vector machines classifiers with particle swarm optimization,” IET Electric Power Applications, vol. 5, no. 9, pp. 691–696, 2011.
V. Kecman and T. Yang, “Adaptive Local Hyperplane for regression tasks,” in Proceedings of the 2009 International Joint Conference on Neural Networks, IJCNN 2009, pp. 1566–1570, USA, June 2009.
T. Takagi and M. Sugeno, “Fuzzy identification of systems and its applications to modeling and control,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 15, no. 1, pp. 116–132, 1985.
G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, Calif, USA, 1976.