Abstract

High-order fuzzy time series models can be constructed in three ways: with advanced algorithms, with computational methods, or by grouping fuzzy logical relationships. Models of the last type are easy to understand for decision makers who are unfamiliar with fuzzy set theory or advanced algorithms. To deal with forecasting problems, this paper presents novel high-order fuzzy time series models, denoted GTS, based on generalized fuzzy logical relationships and automatic clustering. The paper introduces the concept of a generalized fuzzy logical relationship and an operation for combining such relationships. The procedure of the proposed model is then applied to forecasting enrollment data at the University of Alabama. To demonstrate its performance further, the proposed approach is also applied to forecasting the Shanghai Stock Exchange Composite Index. Finally, the effects of the model parameters, the order of the model, and the number of principal fuzzy logical relationships considered on the forecasting results are discussed.

1. Introduction

Over nearly two decades, the fuzzy time series approach introduced by Song and Chissom has been widely used for its strengths in handling imprecise knowledge, such as linguistic variables, in decision making. Many studies have proposed new methods or improved the forecasting accuracy of fuzzy time series models. To simplify the computational process, Chen [1] improved Song’s methods and presented a simplified forecasting model in 1996. Since the lengths of intervals greatly affect forecasting accuracy in fuzzy time series, Yu and many others [2–5] adjusted the interval lengths using distribution-based or optimization techniques. To obtain more accurate forecasts, some researchers [6–8] also improved weighted forecasting models that take the recurrence and chronological order of relationships into account. In addition, many original models based on conventional fuzzy time series were presented and combined with novel algorithms or technologies. For example, Singh [9, 10] proposed fuzzy forecasting methods for crop production based on a computational method with difference parameters. Lee et al. [11–14] presented several fuzzy forecasting models based on fuzzy time series, genetic algorithms, simulated annealing, and type-2 fuzzy sets to forecast temperature and the TAIFEX. Kuo et al. [15] first introduced particle swarm optimization (PSO) into fuzzy time series models for forecasting the TAIFEX. Song’s [16] and Aladag’s [17, 18] models obtained more accurate forecasts by employing artificial neural networks to determine fuzzy relationships.

Because first-order fuzzy time series models have a simple structure, they have difficulty explaining more complex relationships. First-order models also cannot meet the demands of prediction problems that involve multiple factors or long-term time series. Compared with alternative forecasting models, such as ARIMA, hidden Markov, and ARCH models, there is still much room for improving the forecasting accuracy of fuzzy time series models. For these reasons, Chen et al. [19–23] proposed several new high-order fuzzy time series forecasting models to deal with the enrollment forecasting problem. Aladag et al. [17] introduced a high-order model based on a feed-forward neural network. Lee et al. [12, 24] also presented high-order models based on two-factor and genetic simulated annealing techniques. Many other time series researchers [14, 25–28] have likewise shown interest in high-order fuzzy time series forecasting models.

In forecasting with fuzzy time series models, the fuzzy logical relationship (FLR) is one of the most critical factors influencing forecasting accuracy. In terms of the techniques used to partition the universe of discourse and construct the fuzzy logical relationships, the above high-order models fall into three types. The first type mines the FLRs by applying advanced algorithms such as genetic algorithms, rough sets, neural networks, type-2 fuzzy sets, and simulated annealing [12, 14, 17, 18, 20, 21, 25, 27]. The second type, represented by Singh [9, 10], is based on computational methods. The third type groups the FLRs and is represented by [19, 22–24, 26, 28]. The hybrid models of the first type can achieve higher forecasting accuracy than the other two types. However, the forecasting process of these algorithms is like a black box and is not easy to understand. Unlike models built on fuzzy set theory alone, their procedures and forecasts are neither understandable nor accountable for most decision makers. Although models of the second type have been applied to real-life problems of crop and rice production in addition to enrollment forecasting, they make little use of FLRs in the forecasting procedure; they obtain high forecasting accuracy simply by dividing the intervals so as to localize the forecast values accurately. In models of the third type, the procedures for mining FLRs and the forecasting principles are based solely on the sets of FLRs, that is, on conventional fuzzy time series. The forecasting procedure and principles are expressed clearly for fuzzy time series researchers and are easy to understand for decision makers who are unfamiliar with fuzzy set theory or the prerequisite advanced algorithms.

For these reasons, this study proposes a high-order fuzzy time series model based on automatic clustering [28–30] and generalized fuzzy logical relationships [31]. The relationship matrices of the time series are abstracted, and the patterns of its fluctuations are found, on the basis of understandable fuzzy rules. Of the three types of models above, the proposed method belongs to the third. The models of [19, 26] similarly find the most appropriate forecasting principle through state-transition analysis and a backtracking scheme, the models of [24, 26] aim at multifactor forecasting problems, and the model of [28] is improved by finding an optimal interval length. The models of [19, 22, 23] are therefore chosen as the counterparts for comparing single-factor forecasting results under a determined interval length. Two data sets are used for the empirical analysis: the enrollments of the University of Alabama and the closing prices of the Shanghai Stock Exchange Composite Index. According to three evaluation criteria, namely, the root mean squared error, the mean absolute error, and the mean absolute percentage error, the proposed method achieves higher forecasting accuracy than the counterparts.

2. Preliminaries

2.1. Some Definitions

To make our exposition self-contained, this section provides some definitions and the framework of fuzzy time series models. It first gives some related definitions of fuzzy time series and then summarizes the framework of fuzzy time series models [16, 32–34]. At the end of this subsection, the definition of a generalized fuzzy logical relationship and an operation on it are presented to prepare for the proposed model in the next section.

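Definition 1. A fuzzy set $A$ of the universe of discourse $U$, $U = \{u_1, u_2, \ldots, u_n\}$, is defined as follows: $$A = f_A(u_1)/u_1 + f_A(u_2)/u_2 + \cdots + f_A(u_n)/u_n,$$ where $f_A$ is the membership function of the fuzzy set $A$, $f_A : U \to [0, 1]$, and $f_A(u_i)$ denotes the membership degree of $u_i$ in the fuzzy set $A$.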

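Definition 2. Let $Y(t)$ $(t = \ldots, 0, 1, 2, \ldots)$, a subset of $\mathbb{R}$, be the universe of discourse in which fuzzy sets $f_i(t)$ $(i = 1, 2, \ldots)$ are defined. Let $F(t)$ be a collection of $f_i(t)$. Then, $F(t)$ is called a fuzzy time series on $Y(t)$.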

Definition 3. Let $F(t-1) = A_i$ and $F(t) = A_j$. The relationship between two consecutive observations, $F(t-1)$ and $F(t)$, referred to as a fuzzy logical relationship (FLR), is denoted by $A_i \to A_j$, where $A_i$ is called the left-hand side (LHS) and $A_j$ the right-hand side (RHS) of the FLR.
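
As a minimal sketch of Definition 3, suppose the series has already been fuzzified into a sequence of fuzzy-set indices. Each pair of consecutive observations then yields one FLR, with the earlier index as the LHS and the later index as the RHS. The function name `extract_flrs` and the sample sequence are hypothetical and serve only as illustration.

```python
from typing import List, Tuple


def extract_flrs(labels: List[int]) -> List[Tuple[int, int]]:
    """Return the first-order FLRs as (LHS index, RHS index) pairs.

    labels[t] is the index of the fuzzy set assigned to the observation
    at time t (an assumed, illustrative encoding).
    """
    # Pair every observation with its immediate successor.
    return list(zip(labels[:-1], labels[1:]))


# Illustrative fuzzified series (not data from the paper).
fuzzified = [1, 1, 2, 3, 3, 2]
flrs = extract_flrs(fuzzified)
for lhs, rhs in flrs:
    print(f"A{lhs} -> A{rhs}")
```

A series of length n gives n - 1 first-order relationships; higher-order relationships would pair a window of several past labels with the next one.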

Definition 4. Let and ; and are the max values of and , respectively; then is called the first principal fuzzy relationship; if is the th max value of , then is called the th principal fuzzy logical relationship.

Definition 5. Let represent the LHSs of FLRG in the th principal fuzzy logical relationship at time ; is the number of FLRG “” in the th principal fuzzy logical relationship, ; . To compute the logical relationships between FLRGs, the intersection operator “” is defined as follows. We call “” the intersection of the th principal fuzzy logical relationship.

This operation is not the only one for combining the knowledge of all principal fuzzy logical relationships; other operations can be defined for this purpose. Based on these definitions, this paper presents a high-order fuzzy time series model in the following section.

2.2. Automatic Clustering Algorithm

In this section, we review the automatic clustering algorithm for clustering numerical data. The algorithm is essentially a modification and improvement of the one presented in [29, 30]. It is summarized as follows.

Step 1. Sort the numerical data in ascending order. Assume that there are different numerical data in the sorted data sequence and the ascending numerical data sequence is as follows: where and denote the numerical data with the same value, and . Then, the value of “,” which denotes the average of the differences between any two adjacent data, is defined by the following formula:
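
The quantity described in Step 1, the average of the differences between adjacent data in the sorted sequence of distinct values, can be sketched as follows. The function name `average_adjacent_difference` is hypothetical, and removing duplicates before averaging is an assumption based on the phrase "different numerical data".

```python
from typing import Iterable


def average_adjacent_difference(data: Iterable[float]) -> float:
    """Average gap between neighbouring values of the sorted distinct data."""
    values = sorted(set(data))  # assumption: duplicates are counted once
    if len(values) < 2:
        raise ValueError("at least two distinct values are required")
    gaps = [b - a for a, b in zip(values, values[1:])]
    return sum(gaps) / len(gaps)


# Illustrative data (not from the paper): distinct values 1, 3, 7.
avg = average_adjacent_difference([3, 1, 7, 3])
print(avg)
```

Because the gaps telescope, this average equals the range of the data divided by the number of distinct values minus one.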

Step 2. Based on the value of , determine whether two adjacent numerical data and in the data sequence can be put into a cluster by the following three rules.

(1) If is the only element in the first cluster, is a datum following , shown as , and ; then put into the cluster in which belongs to.

(2) Given that is the last element in some cluster and is a datum following , shown as , let denote the average difference of the distances between every pair of adjacent data in the cluster and calculated as where denotes the element in a cluster and . If and ; then put into the cluster in which belongs to.

(3) If is the last element in some cluster, is the first element in the next cluster, and is a datum following , shown as , where and ; then put into the cluster in which belongs to.

Step 3. Assume that the clusters obtained in Step 2 are shown as follows: ,, ,     ,  , where ; then simplify the above representation into the following form: , and transform it into the following form ,,. If a cluster has only one datum and is larger than the lower bound of its left side cluster , then we transform the cluster into . If the cluster is the first or the last cluster, we transform the cluster into , , respectively.

Step 4. Assume that the following clusters are obtained in Step 3: . Transform these clusters into contiguous intervals based on the following substeps.

(a) Transform the first cluster into interval .

(b) Set as the current interval and let be the current cluster. If , then transform the current cluster into interval , set as the current interval, and set the next cluster as the current cluster; if , then transform into interval , create a new interval between and , set as the current interval, and set the next cluster as the current cluster. If the current interval is and the current cluster is , then transform the current interval into , set as the current interval, and set the next cluster as the current cluster.

(c) Repeatedly check the current interval and the current cluster until all the clusters have been checked.

3. Proposed Model

In this section, we present a novel multivariable forecasting model based on the automatic clustering algorithm and generalized fuzzy logical relationships. Since the proposed method is a fuzzy time series model that depends on the number of factors and on the number of principal fuzzy logical relationships, we name it GTS, with these two quantities as its parameters. In other words, a GTS model is a multivariable fuzzy time series model based on a given number of factors and a given number of principal fuzzy logical relationships. As with conventional fuzzy time series forecasting models, the proposed algorithm is introduced step by step as follows.

Step 1 (define the universe of discourse and intervals with the automatic clustering algorithm). For every factor, the universe of discourse can be defined as , . Using the automatic clustering algorithm summarized in Section 2.2, the universe of discourse is partitioned into disjoint subintervals. For example, , is the midpoint of whose corresponding fuzzy set is , .

Step 2 (define fuzzy sets based on the universe of discourse and fuzzify the historical data for each factor). For a given factor, the fuzzy set is expressed as , or , where , , is the number of intervals for the given factor. The value of indicates the membership degree of in . The historical and observed data are fuzzified according to the definition of the fuzzy sets. For example, a datum is fuzzified to when the maximal degree of membership of that datum is in ; in other words, if , , then the datum at time should be classified into the th class. In this paper, the fuzzy sets are defined with a triangular membership function given by the following formula: The membership degree of the value at time in is defined by the following formula: where is the value at time and is the length of interval .
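
The fuzzification rule of Step 2, in which a datum is assigned to the fuzzy set where its membership degree is maximal, can be sketched as follows. The exact triangular formula of the paper is not reproduced here; the sketch assumes a membership degree that equals 1 at the interval midpoint and decreases linearly to 0 at a distance of one interval length. The names `triangular_membership` and `fuzzify` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def triangular_membership(x: float, interval: Interval) -> float:
    """Assumed triangular membership: 1 at the midpoint, 0 one length away."""
    low, high = interval
    midpoint = (low + high) / 2
    length = high - low
    return max(0.0, 1.0 - abs(x - midpoint) / length)


def fuzzify(x: float, intervals: List[Interval]) -> int:
    """Return the 0-based index of the fuzzy set with maximal membership."""
    degrees = [triangular_membership(x, iv) for iv in intervals]
    return max(range(len(degrees)), key=degrees.__getitem__)


# Illustrative intervals (not from the paper).
intervals = [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)]
label = fuzzify(12.0, intervals)
print(label, triangular_membership(12.0, intervals[label]))
```

Keeping the full vector of membership degrees, rather than only the index of the maximum, is what makes it possible to consider second and later principal relationships in the following steps.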

Step 3 (establish the fuzzy logical relationships based on the number of factors and principal relationships). Given the sample data set and the definition of the fuzzy sets, all fuzzy logical relationships between two consecutive data are created. To forecast the time series, the fuzzy logical relationship matrix must be created in this step based on the fuzzy logical relationships. There are many different methods for this task; this paper applies the method proposed by Lee et al. in [8]. For example, the fuzzy logical relationships of a model can be grouped into relationship matrices denoted as ().
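
One simple way to organize fuzzy logical relationships into a matrix is to count how often each LHS is followed by each RHS. The sketch below illustrates that idea only; it is not claimed to be the method of Lee et al. [8], and the function name `relationship_matrix` and the sample labels are hypothetical.

```python
from typing import List


def relationship_matrix(labels: List[int], n_sets: int) -> List[List[int]]:
    """Count FLR occurrences: entry [i][j] is the number of times
    fuzzy set i (LHS) is immediately followed by fuzzy set j (RHS)."""
    matrix = [[0] * n_sets for _ in range(n_sets)]
    for lhs, rhs in zip(labels, labels[1:]):
        matrix[lhs][rhs] += 1
    return matrix


# Illustrative fuzzified series with 0-based labels (not data from the paper).
counts = relationship_matrix([0, 0, 1, 2, 2, 1], n_sets=3)
for row in counts:
    print(row)
```

Row i of such a matrix summarizes every relationship whose LHS is fuzzy set i, so a forecast for a datum fuzzified to set i only needs that row.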

Step 4 (forecasting model). For the given factor, let and let be the maximum membership degree, respectively. By Definition 5, we have the intersection fuzzy logical relationship , and then the forecasting is conducted by the following formula: where “” is a composition operation for forecasting with the following principles.

(1) If the sum of is equal to 0, then the forecasted value is .

(2) Otherwise, the forecasted value is equal to the weighted aggregate of .

Then, for a given , there are forecasts for time . The conclusive forecasting value for time can be obtained by the following formula: where () is the adjustment parameter for the forecast; it can be obtained by minimizing the RMSE of the training data set. With these adjustments, the conclusive forecasting value can be adapted to simulate the fluctuation pattern of the training data.
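
The two forecasting principles of Step 4 can be sketched as follows, under stated assumptions: the weights are taken to be one row of a relationship matrix, the aggregate is taken to be a weighted average of interval midpoints, and the value returned when all weights are zero is passed in as a hypothetical `fallback` argument (for instance, the midpoint of the current interval). None of these choices is confirmed by the text.

```python
from typing import Sequence


def forecast_from_row(weights: Sequence[float],
                      midpoints: Sequence[float],
                      fallback: float) -> float:
    """Weighted aggregate of interval midpoints, with a zero-weight fallback."""
    total = sum(weights)
    if total == 0:
        # Principle (1): no relationship is available for this LHS.
        return fallback
    # Principle (2): weighted average of the midpoints (an assumed aggregate).
    return sum(w * m for w, m in zip(weights, midpoints)) / total


# Illustrative values (not from the paper).
midpoints = [5.0, 15.0, 25.0]
print(forecast_from_row([1, 1, 0], midpoints, fallback=5.0))
print(forecast_from_row([0, 0, 0], midpoints, fallback=15.0))
```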

For the detailed process of the model, see [34].

4. Empirical Analysis

4.1. Data Description

To demonstrate the effectiveness of the proposed models, sufficiently large and varied data sets are needed. Since fuzzy time series forecasting models were first proposed, the enrollment data have been widely used to test improved methods. Furthermore, financial data are a popular research target, and stock prices, as typical financial data, have also been studied extensively. Thus, the enrollments and the Shanghai Stock Exchange Composite Index closing prices are used as the illustrative data sets for the empirical analysis. The financial data used in this paper are the daily Shanghai Stock Exchange Composite Index (SSECI) closing prices covering the period from 1997 to 2006.

4.2. Criteria of Evaluation

In statistics, the root mean squared error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE) are three ways to quantify the difference between the values implied by an estimator and the true values of the quantity being estimated. The mean squared error (MSE) is a risk function that measures the average of the squared differences. For an unbiased estimator, the MSE is the variance, and the RMSE is the square root of the variance, known as the standard error. In addition, RMSE has the advantage over MSE that it is on the same scale as the forecast data. Thus, we take RMSE as the first representative of the size of an “average” error. MAE is also measured in the same units as the original data and is usually similar in magnitude to, but slightly smaller than, the RMSE. Because percentage errors are scale-independent, and because MAPE, being the average of the absolute percentage errors, is simple to calculate and easy to understand, which accounts for its popularity, we also take it as a criterion for comparing forecasting results in this paper.
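
The three criteria follow their standard definitions and can be computed as in the short sketch below. The function names and the sample values are illustrative only, and MAPE is expressed in percent and assumes that no actual value is zero.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values non-zero)."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Illustrative values (not results from the paper).
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

For any pair of series, MAE never exceeds RMSE, and the two coincide only when all absolute errors are equal.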

4.3. Performance Evaluation

Table 1 lists the results of the proposed GTS models on enrollment prediction. Compared with Hwang’s method [23] and Chen’s methods [19, 22] in the enrollment experiment, the proposed model gives more accurate predictions. Moreover, we also apply the proposed method to forecasting the closing price of the Shanghai stock index. The comparison of the evaluation criteria for the 1997 SSECI is listed in Table 2. From these two tables, we can see that the proposed model yields more accurate predictions as its parameter value increases. On all evaluation criteria, the trends are the same as for the 1997 SSECI data. Figure 1 depicts the predicted results for the 1997 SSECI by month. Figure 2 depicts the last 41 predicted results for the 1997 SSECI. From these figures, we can easily draw the same conclusion: the higher the parameter value, the more accurate the prediction.

In fact, the same conclusion can be drawn from the results for other years. The mean prediction results for the ten years of SSECI data from 1997 to 2006 are listed in Table 3. The table leads to the same conclusion: the higher the values of the model parameters, the more accurate the prediction.

From Tables 2 and 3, it is clear that higher parameter values yield a smaller RMSE and that a higher-order model performs better than a lower-order one. This conclusion is also supported by Figure 2, which conveys another important message: shorter intervals can result in more robust forecast errors. All of these conclusions are important for the proposed model, which can be applied to other data sets or areas.

5. Conclusion

After discussing high-order fuzzy time series models and presenting the definition of, and an operation for, the generalized fuzzy logical relationship, we have proposed novel high-order fuzzy time series models based on this new relationship and automatic clustering. The work is driven by three main aims. The first is to generalize the fuzzy logical relationship. The second is to abstract the relationship matrices of the time series and to find the patterns of its fluctuations on the basis of understandable fuzzy rules. The third is to make the fuzzy time series model able to explain more complex relationships.

Several directions are suggested for future research. Firstly, the relation between the principal fuzzy relationships and the conventional fuzzy relationships needs to be discussed in more depth. For example, what effects do the definition of the membership function and the operations on principal fuzzy logical relationships have on the forecasting results? Secondly, since the proposed model is based on fuzzy logical relationships, it is worth improving the model by hybridizing it with advanced algorithms, both to broaden its application and to obtain higher forecasting accuracy.

Conflict of Interests

The authors declare that they have no conflict of interests regarding this work.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (NSFC 61261027), the Natural Science Foundation of Jiangxi Province, China (20142BAB207013), and the Foundation of Jiangxi Province Educational Committee (GJJ14642, GJJ14651).