Abstract

Forecasting activities play an important role in our daily life. In recent years, fuzzy time series (FTS) methods were developed to deal with forecasting problems. FTS attracted researchers because of its ability to predict the future values in some critical situations where most standard forecasting models are doubtfully applicable or produce bad fittings. However, some critical issues in FTS are still open; these issues are often subjective and affect the accuracy of forecasting. In this paper, we focus on improving the accuracy of FTS forecasting methods. The new method integrates the fuzzy clustering and genetic algorithm with FTS to reduce subjectivity and improve its accuracy. In the new method, the genetic algorithm is responsible for selecting the proper model. Also, the fuzzy clustering algorithm is responsible for fuzzifying the historical data, based on its membership degrees to each cluster, and using these memberships to defuzzify the results. This method provides better forecasting accuracy when compared with other extant researches.

1. Introduction

Time series are widely observed in many aspects of our lives; therefore, the prediction of future values based on the past and present information is very useful. In practice, there are several emergent domains that require dealing with short multivariate time series. As a consequence, the prediction of such time series arises in many situations. There are many existing techniques that are well proven in forecasting with multivariate time series data, but they put constraints on the minimum number of observations and require distribution assumptions to be made regarding the observed time series.

Fuzzy time series (FTS) models have become increasingly popular in recent years because of their ability to deal with time series data without the need for validating any theoretical assumptions. However, how to select the proper model, how to partition the universe of discourse and determine effective lengths of intervals objectively to fuzzify the numerical data, and how to defuzzify the results are still open critical issues. These issues are very important and affect the model accuracy. The paper probes into these three questions in the modeling of FTS. The new method incorporates the fuzzy clustering and genetic algorithms (GA) with FTS to reduce its subjectivity and improve its accuracy. More specifically, the new method uses the integer genetic algorithm to search for the optimal model that fits the available data. (In this paper, to solve integer optimization problems, we used the MI-LXPM algorithm which is a suitably modified and extended version of the real coded genetic algorithm, LXPM. In MI-LXPM, a tournament selection procedure, Laplace crossover, and power mutation are modified and extended for integer decision variables. Moreover, a special truncation procedure for satisfaction of integer restriction on decision variables and a “parameter free” penalty approach for constraint handling are used in MI-LXPM algorithm. More details of these operators are defined in [1].) In addition, fuzzy clustering is used to partition the universe of discourse objectively. Furthermore, the method employs clustering centers and the observations’ fuzzy memberships to defuzzify the results, instead of the centers of each interval, which are used in numerous existing models. The empirical results show that the new model is able to forecast with high accuracy measures than the counterpart of existing models.

GA was used before in the literature of fuzzy time series. For example, Chen and Chung [2] use GA to tune up the length of each interval in the universe of discourse for one factor (variable), Lee et al. [3] use genetic algorithms to adjust the length of each interval in the universe of discourse for two factors, Kang [4] uses GA to obtain the optimal fuzzy membership function, while Egrioglu [5] uses genetic algorithm for finding the elements of fuzzy relation matrix.

Many authors applied fuzzy clustering in the fuzzification process. For example, Cheng et al. [6], Li et al. [7], Egrioglu [5], and Yolcu [8] used fuzzy c-means for the fuzzification of time series. While Lie et al. and Egrioglu applied their methods in univariate time series, Cheng et al. applied their method in both univariate and multivariate time series.

The remainder of this paper is organized as follows. Section 2 describes in brief the concept of FTS and its models. Section 3 presents the new FTS method; it begins with a method overview. Then the details of each step of the new FTS forecasting model are described. In Section 4, the method is evaluated by comparing its forecasts with those derived from other related FTS methods based on an example. Finally, Section 5 summarizes the main conclusions.

2. The Basic Concepts of Fuzzy Time Series and Its Models

Fuzzy set theory provides a powerful framework to cope with vague or ambiguous problems and can express linguistic values and human subjective judgments of natural language. In 1993, Song and Chissom [9, 10] successfully employed the concept of fuzzy sets presented by Zadeh [11] and the application of fuzzy logic to approximate reasoning presented by Mamdani [12] to develop the foundation of FTS forecasting. Initially, FTS was proposed to deal with forecasting problems where the historical data are linguistic values. But recently FTS models have become increasingly popular because of their ability to deal with quantitative time series data with limited or even no theoretical assumptions in contrast with conventional time series models. The main difference between a traditional time series and FTS is the values of observations. In a traditional time series, the observations are represented by crisp numerical values. However, in a FTS, the values of observations are represented by fuzzy sets. The basic definitions of FTS could be presented as follows [10].

Let: the set of all real values: the universe of discourse, ,: fuzzy time series, ,: fuzzy set, ,: the fuzzy relation between and .

FTS. Let , a subset of , be the universe of discourse on which fuzzy sets , are defined and is the collection of , . Then is called a FTS on , .

can be understood as a linguistic variable and as the possible linguistic values of . Because at different times, the values of can be different, is a function of time . Also, since the universes of discourse can be different at different times, is used for the universe of discourse at time .

The th Order Model. Suppose is caused by , and simultaneously. This relation can be expressed as the following fuzzy relation equation:  where is the Cartesian product and is defined as the fuzzy relation between and , , and . Then (1) is called the th order model of . When , the model is called the first-order model.

Despite the large number of FTS methods developed since the introduction of the subject by Song and Chissom [9, 10, 16], the main steps in most of them are similar to the first method [9], which could be summarized as follows:(1)defining and partitioning the universe of discourse, defining fuzzy sets and fuzzifying time series,(2)establishing fuzzy logical rules,(3)grouping the fuzzy logical rules,(4)calculating the forecasted fuzzy sets,(5)defuzzifying the forecasted results.

Most of the work done in FTS employs one variable to build a fuzzy time-series model, that is, univariate model. Recently, FTS models have employed multiple variables in forecasting processes to deal with more complex forecasting problems, that is, multivariate model [17, 18]. It is important to note that the term “multivariate” in most FTS literature is limited to the prediction of one variable of interest, taking into consideration the multiple variables that could have effects on it, into the prediction process. To forecast some or all of the other variables, the same process has to be individually repeated for each variable. In this paper, we adopt the same understanding to deal with multivariate fuzzy time series data.

3. The New Method

In this section, we present the new forecasting method for fuzzy time series. Section 3.1 provides an overview of the new method. Section 3.2 presents the notations and definitions. Finally, the detailed description of the new method is given in Section 3.3.

3.1. Method Overview

Figure 1 summarizes the structure of the new method in the following steps.Set the upper bound and lower bound for the model parameters, the probability of crossover (), and probability of mutation (). Also set the stopping criteria (the algorithm stops if the weighted average relative change in the best fitness function value over generations is less than or equal to tolerance (TolFun) or a prespecified maximum number of generations is reached [19]), maximum number of generations, and fitness function tolerance (TolFun).With the language of GA, the method begins with creating a random initial population of chromosomes. Each chromosome suggests one possible model and consists of number of genes that is equal to the model’s parameters and the values which specify that model.Make a mating pool by applying the tournament selection procedure on initial (old) population [1].Applying FTS procedure which is conducted for each chromosome in the mating pool, this procedure can be briefed as follows.(i)Fuzzification: applying fuzzy clustering to partition each time series according to the specific number of clusters (as specified in the chromosome), ranking cluster centers of each variable, and then fuzzifying the historical data.(ii)Building fuzzy rules: building fuzzy relations according to specific order and number of variables (as specified in the chromosome) in addition to assigning membership degree of each data point to each cluster.(iii)Defuzzification: defuzzifying the forecasted outputs using the membership degrees and the clusters centers.(iv)Calculate the penalty function which includes a term for infeasibility. If the member (chromosome) is feasible, the penalty function will be the fitness function; otherwise, it will be the maximum fitness function among feasible members of the population, plus a sum of the constraint violations of the (infeasible) point.Apply Laplace crossover and power mutation to all individuals in mating pool, with and to make new population.Replace the current population with the offsprings to form the next generation.Check the stopping criteria. If satisfied stop; else go to [19].

After this overview of the new method, we will go through the method in more detail.

3.2. Notations and Definitions

Assume that we have time series (or variables) ; and without loss of generality, assume that the first time series is the main time series that we want to forecast and the rest of time series, that is, , , are the auxiliary time series.

The following notations and definitions are used throughout the formulation of the new method: denotes the variable’s (time series) number; , where is the total number of variables. denotes the time point number; , where is the total number of time points. denotes the actual value of the variable at time ; ; . ; denotes the forecasted value of the variable at time ; ; .; . denotes the cluster’s number; , where is the total number of clusters.

Generally speaking, the number of clusters could take any value from 1 to . However, in the new method a number of clusters lower than 3 is considered too small and a number of clusters higher than is considered too large. If the number of clusters is too small, there will be no fluctuations in FTS. On the other hand, if the number of clusters is too large, the meaning of FTS will be diminished and it creates computational complexities [20], although the higher number of intervals may contribute to reaching better forecasting accuracy.

In the new method we assume that takes values ranging from 3 to ; that is, denotes the model order, which indicates how many previous time points we use to predict the present time. It takes values ranging from 1 to (). Generally, lower and upper limits of the model order are optional. However, it is common in the literature that the model order takes values between 1 and 5. In the new method, we consider that the upper limit of the model order is (). That is, ; : the indicator variables, each one takes the value 0 or 1 to indicate the absence or presence of the auxiliary variable in the model; that is,

3.3. Description of the New Method

In this section, each step of the new method is further illustrated and detailed as follows.

Step 1 (initialization). The first step is to create an initial population of possible models or chromosomes. For each model, three main factors are of concern:(1)the number of clusters or fuzzy sets which will be used to fuzzify the historical data;(2)the order of the model, whether first order or high order;(3)the selected auxiliary variables to be entered in the forecasting model.

These factors are considered as the parameters of the model, so each chromosome the GA created represents one possible model. Accordingly, the number of genes in each chromosome is set equal to the number of auxiliary variables () plus 2. That is, each chromosome consists of () genes. Figure 2 illustrates the model parameter as a chromosome.

The population of chromosomes is generally chosen at random. There are no hard rules for determining the size of the population. Generally, the size is defined to range between 20 and 100. Large populations guarantee greater diversity and may produce more robust solutions. Following this, the initial population is used to make a mating pool by applying the tournament selection procedure [19]; for more details, see [1].

Step 2. Applying the fuzzy time series procedure for each chromosome in the mating pool as follows.

(1) Each time series , in the possible model at hand is partitioned using fuzzy c-means into clusters (for more details about fuzzy c-means, see [21]). When fuzzy c-means algorithm is applied, the clusters’ centers for each time series, denoted by , , and , are calculated iteratively. After the clusters’ centers are fixed, the memberships of each time point with respect to these centers, denoted by , , , and , are computed. Then, the clusters of each variable are ranked ascendingly according to their centers’ values, and these ranks are used as linguistic values (fuzzy sets) , , and such that and .

(2) Fuzzify each time series by replacing each value with the equivalent linguistic value to obtain the FTS, that is, each crisp value is mapped into a fuzzy set where its membership degree has maximum value.

(3) Given that the model order and the indicator variables ’s are as specified in the randomly selected chromosome at hand, the fuzzy rule is established as follows: where is the value of the FTS at time , , and is the row vector of membership degrees of the main variable at time to the clusters; .

For example, assume that we have two fuzzy time series (the main variable) and (the auxiliary variable) both clustered and fuzzified into four fuzzy sets (i.e., ) and assume also that the model is of order 2 (i.e., ). Let the current states of the main variable be and , let its next state be , and let the next state vector of membership degrees to the four fuzzy sets (i.e., ) be equal to .

Similarly, let the current states of the auxiliary variable be and . Then, under these assumptions, the rule can be induced as follows:

494239.fig.005

Now moving one time point ahead, let the current states of become and , let the next state be , and let its vector of membership degrees be equal to . And for the auxiliary variable let , . Then, the corresponding rule can be stated as

494239.fig.006

It is important to notice that, for each of the above two rules, the vector of membership degrees in the right-hand side is the main variable’s next state row vector of memberships; that is, it has no relation with the vectors of memberships of all of the current states appearing in the left side of the rule whether they belong to the main variable or to the auxiliary variable.

 Group all generated fuzzy logical rules that have the same left-hand side. If this grouping process resulted in any group with more than one relation, then the average vector of all related membership vectors in this group is calculated. The resulting groups will be denoted by ; , , where is the total number of groups with at least one relation. The average membership vector : and where is the total number of groups with at least two relations.

For example, assume that we have the following 3 rules attributed to 3 different points of time:

494239.fig.007

These rules have similar left-hand sides (same fuzzified classes of the current states) but different vectors of membership degrees of the 3 next states (different right-hand sides). Hence, by calculating the average vector of membership degrees, we get the following final rule representing this group:

494239.fig.008

To calculate the forecasted output for each time point , that is, to forecast , , we have to match the current state of time point with the left-hand sides of the grouped constructed fuzzy rules. Three possibilities could be faced.If the current state exists in one of the groups, say , then the forecasted output of year , that is, , is calculated as where is the row vector of clusters’ centers of the main variable.If there are no fuzzy relations with the same left-hand side in the grouped constructed fuzzy rules whose current state is which is the case if one or more of the time series points is missing or for the first future time point in the time series, then we try to reduce the order of the model gradually to reach the same left-hand side; then we apply case .If there are no fuzzy relations in the grouped constructed fuzzy logical rules with any order whose current state is , then the forecasted output of time point will be taken as the center of the cluster .

Calculate the fitness value that reflects the prediction accuracy of the main variable under the specified model. Theoretically speaking, the search is for the proper model thatminimizes a certain measure of solution error;maximizes the separation between the clusters centers of the main variable.

Hence, a fitness function composed of two measures is proposed in this method to compare the possible models performance or prediction accuracy: the symmetric mean absolute percentage error (sMAPE) for measuring the solution error and the minimum absolute distance for measuring the cluster separation.

sMAPE can be defined as follows [22]: where is the actual value at time point , is the forecasted value at time point , and is the number of time points used in the calculation. The formula above provides a result between 0% and 200%, which is considered an advantage to this measure. The closer the value of sMAPE to 0 is, the better the forecasting accuracy is.

sMAPE is well known in FTS literature as a measurement of solution error as it has an independent scale and it is less influenced by extreme values. It also corrects the computation asymmetry of the percentage error [22]. Hence, the fitness function in the new method depends mainly on the sMAPE as a measure of solution error; to improve the accuracy of the results, sMAPE has to be minimized.

The minimum absolute distance between the centers of clusters is chosen to measure the overlapping between clusters. The minimum absolute distance can be defined as follows: where and are any two clusters for the main variable. Conventionally, can take any positive value.

Now, we have a two-objective optimization problem. Since the parameters of the model take only integer values, it is not easy to find software that can solve multiobjective integer genetic algorithm. To overcome this deficiency, the weighted sum of objectives method [23] is used. This method is the simplest approach and is probably the most widely used one.

The weighted sum method [23] scalarizes a set of objectives into a single objective by premultiplying each objective by a user-supplied weight. There is one critical question that arises here: what values of the weights must one use? The answer depends on the importance of each objective and the scaling of them.

The scale of the first objective (sMAPE) ranges from 0 to 2, while the second objective can take any nonnegative value. To guarantee that both objectives have the same scale, we redefine the second objective as follows: Equation (9) is considered as a proportional absolute distance between the centers of two clusters and relative to their average.

Accordingly, the overall fitness function can be defined as a linear combination of the two objectives (7) and (9) as follows: where is the weight (a simulation study is conducted to determine suitable weights for the two objectives. The simulation result shows that the best value for the weight is 0.9 because it gives lower sMAPE) and it takes any arbitrary value between 0 and 1.   is the shortest distance between cluster centers.

The formula in (10) is considered as a tradeoff between the prediction accuracy and the reasonability of the clustering results.

Step 3 (selection and mutation for all chromosomes in the mating pool). To produce the chromosomes for the next generation, genetic algorithm creates three types of children for the next generation:(i)selection of elite children which are the individuals in the current generation with the best fitness values. These individuals automatically survive to the next generation.(ii)Crossover children, created by combining the vectors of a pair of parents.(iii)Mutation children, created by introducing random changes, or mutations, to a single parent.

Step 4. Replace the current population with the offsprings to form the next generation.

Step 5. If the end conditions are not satisfied, go to Step 2. Otherwise, stop and retain the best solution in current population.

4. Experimental Results

In this section, we consider two examples, one for multivariate time series and the other for univariate case. Predictions that resulted from the new method are compared with the well-known results reported in the literature by other researchers. The forecast accuracy is compared by the mean absolute percentage error (MAPE) and mean square error (MSE). Suppose the actual value in time point is , the forecasted value in time point is , is the total number of time points used in calculation; then the MAPE and MSE are computed by the following equation, respectively:

4.1. The Multivariate Case

Table 1 shows the total number of annual car accidents casualties in Belgium from 1974 to 2004 [13]. Here, the main time series is the number of killed persons and the auxiliary time series are mortally wounded and died within 30 days, severely wounded, and light casualties.

To implement the new method, The mixed integer genetic algorithm module in MATLAB package is used after making slight modifications to cope with the nature of the FTS problem. The solution steps of the new method are carried out using a specially developed MATLAB code.

The best model selected by the new method, in the language of Figure 2, is represented by the chromosome (630001). This means that the best model found is of order three (second gene) with one auxiliary variable (last gene); namely, the light casualties and a number of fuzzy sets equal 6 (first gene). The other three auxiliary variables are ruled out from the fitted model as their corresponding indicator variables take the value 0 in the chromosome. The trends of the actual values and the forecasted ones obtained by the new method are shown in Figure 3.

As seen in Table 2, the new method has the smallest values for MSE and MAPE: 445 and 1.28, respectively; that is, the new method outperforms the Jilani and Burney model [14] and Egrioglu et al. model [13] which are superior to other existing methods in the literature in terms of the mean square error and the average forecasting error rate.

4.2. The Univariate Case

New method is applied on the famous enrollment data in FTS literature (Table 3). The best model according to the results of the new method is (55). This means that the best model found is of order five and a number of fuzzy sets equal 5. It should be noted that in the univariate time series the chromosome genes reduce to two as we have no auxiliary variables in this case. The trends of the actual values and the forecasted ones obtained by the new method are shown in Figure 4.

Comparison of the forecasting results is made between the new method and two other recently published ones, the Kuo et al. method [15] and the Egrioglu method [5]. Both mentioned methods considered the same enrollment data set and proved being superior to other existing methods in the literature in terms of the mean square error and the average forecasting error rate. From Table 3, it can be seen that the forecasting accuracy of the new method outperforms both mentioned methods in terms of MSE and MAPE.

5. Conclusion

When dealing with short multivariate time series, it is obvious that the assumptions of classical time series are not satisfactory. This paper introduces a new fuzzy time series method to deal with such cases. The new method drew on the ability of FTS approaches to deal with time series data without the need for checking any various theoretical assumptions. It also integrates the fuzzy clustering with genetic algorithm to improve the prediction accuracy for solving the multivariate high order FTS.

In the new method, fuzzy clustering algorithm is responsible for fuzzifying the historical data based on its membership degrees to each cluster, then using these memberships to defuzzify the results. Also, genetic algorithm is responsible for selecting the proper model. The method is applied to time series of the yearly car accident causalities in Belgium from 1974 to 2004 and to the famous university enrollment data set. It provided forecasts with MSE and MAPE values that are smaller than those obtained from the other existing fuzzy time series methods.