Abstract

Time series forecasting models built on a linear relationship show strong performance. However, because they abandon the fuzzification process, these models cannot handle incomplete, imprecise, and ambiguous data the way interval-based fuzzy time series models can. This article proposes a novel fuzzy time series forecasting model based on multiple linear regression and time series clustering for forecasting market prices. The proposed model employs a preprocessing step, together with the synthetic minority oversampling technique, to transform a set of high-order fuzzy time series into a set of high-order time series. A high-order time series clustering algorithm based on the multiple linear regression model is then proposed to cluster the fuzzy time series dataset and to build a linear regression model for each cluster. Forecasts are made by calculating the weighted sum of the linear regression models' results. A learning algorithm is also proposed to train the whole model; it applies artificial neural networks to learn the weights of the linear models. The interval-based fuzzification ensures the capability to deal with uncertainties, while the linear models and artificial neural networks enable the proposed model to learn both linear and nonlinear characteristics. The experimental results show that the proposed model improves the average forecasting accuracy rate and is more suitable for dealing with these uncertainties.

1. Introduction

Predicting future data by analyzing temporal data is an important way to explore the value of data, since a precise prediction supports policy analysis and decision making in many fields, such as government [1], economics, and management [2]. However, considering the instability of data sources and the unreliability of the data collecting process, most collected data contain incomplete, imprecise, and ambiguous records, which makes preprocessing an indispensable procedure for machine learning. Thus, forecasting methods based on fuzzy time series have been proposed to cope with uncertainties caused by vagueness, ambiguity, and other nonprobabilistic reasons and are widely applied in the finance domain, such as forecasting the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) [3-11], the NTD/USD exchange rates [6, 8], and the market price of shares of the State Bank of India (SBI) at the Bombay Stock Exchange (BSE) [12-19].

In 1965, Zadeh [20] developed fuzzy set theory. By replacing the values of a time series with fuzzy sets, a fuzzy time series (FTS) model was proposed by Song and Chissom [21-23] in 1993. In the past decades, many forecasting methods based on this framework have been proposed. Most of these studies used an interval-based FTS model to handle the fuzzification of the time series and applied fuzzy logic relationships, which can be extracted from the FTS dataset, to make forecasts. For example, Chen [24] developed a method for forecasting enrollments based on high-order fuzzy time series. Chen and Chang [3] published a method for multivariable fuzzy forecasting based on fuzzy clustering and fuzzy rule interpolation techniques. Huarng [15] determined interval lengths by using distribution-based and average-based lengths to improve the forecasting accuracy. Chen and Chen [25] constructed a hybrid fuzzy time series model based on granular computing. Cai et al. [9, 10] selected the partition points of the universe of discourse by adopting a metaheuristic optimization algorithm. Qu et al. [26] applied multipartitioning to improve the fuzzification process and adopted a linear model to deal with samples that are not suitable for fuzzy logic relationships. Recently, the intuitionistic fuzzy set (IFS) [27] and the hesitant fuzzy set (HFS) [28] have gained the attention of researchers [16, 29], who proposed fuzzy time series forecasting models based on intuitionistic fuzzy logical relations. Bisht et al. [17, 30] presented fuzzy time series forecasting models built on hesitant fuzzy logical relations to address the issue of nonstochastic hesitation. Gupta and Kumar [19] proposed an aggregation operator to aggregate hesitant probabilistic fuzzy elements into fuzzy elements. These models focus on modeling with fuzzy logic relationships. However, after high-order fuzzy time series were proposed, many works began to employ linear models to make forecasts. Cai et al. [9] adopted the Levenberg–Marquardt algorithm to build a forecasting model of high-order time series. Askari et al. [31] presented a forecasting model based on fuzzy clustering and linear combinations of the input variables (CFTS) instead of fuzzy logic. These models achieve strong forecasting accuracy.

1.1. Motivation

In 2015, Askari et al. [31] presented CFTS, which applies fuzzy C-means clustering to replace interval-based fuzzification. CFTS [31] greatly improved prediction accuracy, which suggests that a cluster-based linear model is much more suitable for this problem than the approaches used by other existing models. Clustering is widely adopted in fuzzy time series models, mainly to determine the fuzzy sets and thereby obtain a more suitable partition of the universe of discourse [32]. In fact, for time series, clustering is usually executed on a set of subseries [3, 33], and the forecasting results given by the models constructed on each cluster are then integrated. The time series within a cluster are highly similar, so they can be modeled with simple models such as the linear model. Chen and Chang [3] applied the fuzzy C-means clustering algorithm to construct fuzzy rules with which to make forecasts. Cheng et al. [11] defined a similarity based on fuzzy logic relationships and employed K-means to improve the forecasting accuracy. Aladag et al. [34] performed fuzzification with Gustafson–Kessel fuzzy clustering to eliminate the influence of the number of intervals. All of them still adopt fuzzy logic to construct the relationships and achieve only modest improvements.

Recently, most FTS research has focused on applying linear models to construct the relationship between observed data and the data to predict. Talarposhti et al. [35] employed an exponential model to construct a forecasting model based on fuzzy time series. These models show a modest improvement in accuracy compared with models based on fuzzy logic relationships. Cai et al. [9, 10] applied the Levenberg–Marquardt algorithm to build a model of the relationship based on interval-based fuzzification. CFTS likewise employed a linear model to build the relationship instead of fuzzy logic. These models do improve the forecasting accuracy of FTS.

However, most economic research suggests that the linear model is not always suitable for market prices [36], since markets are commonly imperfect, for example, because of a higher interest rate for borrowing or different risk premia for the seller and the buyer. Artificial neural networks are capable of modeling these nonlinear data characteristics and have attracted a lot of interest as a popular tool of artificial intelligence in recent years. More and more studies try to develop forecasting models based on ANNs. Huarng and Yu [37] adopted an artificial neural network (ANN) to build a nonlinear model of fuzzy time series. Yolca and Alpaslan [38] integrated all steps into one ANN to reduce the training error propagation between steps. Selvin et al. [39] employed three deep learning approaches, including the convolutional neural network (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM), to forecast the National Stock Exchange. ANNs show a greater ability to model the time series of market prices than the linear model.

Askari's CFTS model [31] suggests that a cluster-based linear model can improve the forecasting accuracy, reducing the forecasting error relative to other existing FTS models. However, unlike interval-based fuzzification, CFTS replaces the fuzzification process with fuzzy C-means clustering [40], and it works with the time series instead of the fuzzy time series. Besides, fuzzy C-means clustering gives the grade of membership only by exact distance, so it cannot handle incomplete, imprecise, and ambiguous data. Essentially, CFTS is not a real fuzzy time series forecasting model.

For the reasons above, we propose a new fuzzy time series forecasting model based on multiple linear regression and time series clustering. This work makes three main contributions:
(1) First, we apply a preprocessing step to transform the fuzzy time series set into a weighted time series set. Then, we apply the synthetic minority oversampling technique (SMOTE) to deal with the unbalanced samples and finally transform the weighted time series set into a time series set on which the multiple linear regression model (MLRM) and the ANN can operate.
(2) Second, we develop a novel high-order time series clustering algorithm based on the multiple linear regression model, which clusters data by similarity of linear relationship instead of shape, to extract suitable linear models from the dataset.
(3) Finally, we design a new forecasting model for FTS based on the multiple linear regression model and the ANN, which employs ANNs to give the weights of each multiple linear regression model. We devise a specific learning algorithm for our forecasting model to train these ANNs together. Unlike ANN-based fuzzy models, we adopt ANNs to give the weights of the linear models instead of the forecasting results, which allows the model to capture the nonlinear characteristics of market prices. Besides, for each cluster, the forecasting model builds a linear model instead of the fuzzy logic relationships used by other cluster-based FTS models.

The experimental results show that the proposed model has a higher average forecasting accuracy rate than other existing FTS-based models.

The rest of this paper is organized as follows. Section 2 briefly reviews some basic concepts and algorithms. Section 3 explains a new forecasting model based on fuzzy time series and describes the key steps in detail, and it shows how the proposed model works with a case of TAIEX. Section 4 shows the comparison of forecasting results of the proposed model and other existing models. The conclusion is provided in Section 5.

2. Preliminaries

This section introduces terminology and briefly reviews basic concepts and algorithms.

2.1. Fuzzy Time Series

To deal with fuzzy, incomplete sequences containing noise, Song and Chissom [21-23] introduced the concepts of fuzzy mathematics into time series and proposed the concept of fuzzy time series.

In a fuzzy time series, values are represented by fuzzy sets defined on the universe of discourse $U$, where $U = \{u_1, u_2, \ldots, u_n\}$. A fuzzy set $A$ of $U$ can be represented by
$$A = f_A(u_1)/u_1 + f_A(u_2)/u_2 + \cdots + f_A(u_n)/u_n,$$
where $f_A$ denotes the membership function of the fuzzy set $A$, $f_A(u_i)$ denotes the grade of membership of $u_i$ belonging to the fuzzy set $A$, and $f_A(u_i) \in [0, 1]$, $1 \le i \le n$.

Definition 1. Let $Y(t)$ $(t = \ldots, 0, 1, 2, \ldots)$, a subset of real numbers, be the universe of discourse on which fuzzy sets $f_i(t)$ $(i = 1, 2, \ldots)$ are defined. Let $F(t)$ be a collection of $f_i(t)$. Then, $F(t)$ is called a fuzzy time series defined on $Y(t)$.

Definition 2. Let $F(t)$ be a fuzzy time series. Assume that there is a relationship $R(t-1, t)$ between $F(t-1)$ and $F(t)$ that satisfies $F(t) = F(t-1) \circ R(t-1, t)$, where $F(t-1)$ and $F(t)$ are fuzzy sets and $\circ$ is the max-min composition operator; then, $R(t-1, t)$ is a fuzzy logic relationship denoted by $F(t-1) \rightarrow F(t)$.

Definition 3. Suppose $F(t)$ is caused by $F(t-1)$, i.e., $F(t-1) = A_i$ and $F(t) = A_j$; this can be presented by the fuzzy logical relationship $A_i \rightarrow A_j$.
The fuzzy time series model uses a four-step framework to make forecast: (1) define the universe of discourse and partition it into intervals; (2) determine the fuzzy sets on the universe of discourse and fuzzify the time series; (3) build the model of the existing fuzzy logic relationships in the fuzzified time series; and (4) make forecast and defuzzify the forecast values.
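To make the first two steps of this framework concrete, the following minimal Python sketch (not the authors' code; the interval count, toy prices, and function names are hypothetical) partitions the universe of discourse into equal intervals and fuzzifies a numeric series using the common 1/0.5/0 membership pattern of interval-based models. Steps 3 and 4 are model specific and therefore omitted here.

import numpy as np

def partition_universe(series, p, margin=0.0):
    """Step 1: split the universe of discourse into p equal-length intervals."""
    lo, hi = series.min() - margin, series.max() + margin
    edges = np.linspace(lo, hi, p + 1)
    midpoints = (edges[:-1] + edges[1:]) / 2.0
    return edges, midpoints

def fuzzify(series, edges, p):
    """Step 2: map each value to a membership vector over the p fuzzy sets.
    Membership is 1 for the containing interval, 0.5 for its neighbours, 0 elsewhere."""
    memberships = np.zeros((len(series), p))
    idx = np.clip(np.searchsorted(edges, series, side="right") - 1, 0, p - 1)
    for t, k in enumerate(idx):
        memberships[t, k] = 1.0
        if k > 0:
            memberships[t, k - 1] = 0.5
        if k < p - 1:
            memberships[t, k + 1] = 0.5
    return memberships, idx

# toy usage
prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1, 102.7])
edges, mids = partition_universe(prices, p=5)
mu, labels = fuzzify(prices, edges, p=5)
print(labels)        # index of the dominant fuzzy set A_k for each time point
print(mids[labels])  # a naive defuzzification back to interval midpoints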

3. A New Forecasting Model Based on Fuzzy Time Series and High-Order MLRM-Based Time Series Clustering

In this section, we propose a new forecasting method based on fuzzy time series and the clustering algorithm.

The proposed model differs from traditional FTS-based models that adopt fuzzy logic relationships. Like CFTS [31], it makes forecasts by combining the results of several linear models. The model consists of a set of linear models, and each model represents an individual rule by which the time series changes. For each sample, we calculate the similarities to every linear model and make a forecast by a weighted summation of the outputs of the linear models according to these similarities. On the one hand, it is important to construct a suitable set of linear models; on the other hand, a correspondingly precise estimation of the similarity between each sample and each linear model is required. Thus, we propose a high-order MLRM-based time series clustering algorithm to give a suitable set of linear models. After that, for each linear model, we adopt an ANN to obtain its weight for each time series sample. With the outputs of the linear models and their weights, we can make a forecast by weighted summation.

The model has four main steps during the training process, as shown in Figure 1:
Step 1: generate the high-order fuzzy time series from the original data set. The high-order time series is generated and fuzzified with the method defined in [24], which generates fuzzy sets by partitioning the universe of discourse equally.
Step 2: transform the set of fuzzy time series into a set of time series by the preprocessing defined in Section 3.1.
Step 3: build a set of multiple linear regression models with the clustering algorithm described in Section 3.2.
Step 4: train ANNs to calculate the grades of membership between the high-order time series and the constructed multiple linear regression models, as explained in Section 3.3.
After training, we obtain a model consisting of a set of linear models and a set of ANNs. As displayed in Figure 1, for an observed time series, the details of making a forecast are as follows:
Step 5: generate a high-order fuzzy time series corresponding to the observed time series data, as in Step 1.
Step 6: transform the high-order fuzzy time series into a weighted high-order time series set that contains several samples, using the preprocessing defined in Section 3.1.
Step 7: for each weighted sample, ignoring its weight, calculate the outputs of the linear models and of the ANNs obtained during the training process. Then, calculate the weighted summation of the outputs of the linear models according to the weights given by the ANNs, as described in Section 3.3. The result of this weighted summation is the forecasting result of the sample.
Step 8: after all weighted samples have been forecast, give the final output by a weighted summation of these forecasting results according to the samples' weights. The final output is the forecasting result of the observed time series.

Besides, we take the TAIEX data of 1999 as a case to show how our proposed model works in Section 3.4.

3.1. Preprocessing of the High-Order Fuzzy Time Series

For a high-order fuzzy time series, it is easy to build a linear model from the corresponding actual high-order time series if its values are not incomplete or ambiguous, i.e., if every value can be represented by a single number. However, it is hard to employ a linear model to build relationships directly with fuzzy sets, or with an actual time series that contains incomplete or ambiguous values. Suppose $s$ is a high-order fuzzy time series that contains incomplete or ambiguous data. The detailed preprocessing of $s$ is as follows (a sketch follows this list):
Step 1: replace each dimension with the corresponding numeric value wherever possible. We then obtain a hybrid vector $h = (h_1, h_2, \ldots, h_{L+1})$, where each $h_i$ represents either a numeric value or a fuzzy set. We construct two empty sets of tuples, $T_{\mathrm{fuzzy}}$ and $T_{\mathrm{num}}$, each tuple consisting of a hybrid vector and a weight. If $h$ contains a fuzzy set, add $(h, 1)$ into $T_{\mathrm{fuzzy}}$; otherwise, add it into $T_{\mathrm{num}}$.
Step 2: if $T_{\mathrm{fuzzy}}$ is empty, go to Step 5; otherwise, choose a tuple $(h, w)$ from $T_{\mathrm{fuzzy}}$ and remove it from $T_{\mathrm{fuzzy}}$.
Step 3: suppose the $i$th dimension $h_i$ is described by a fuzzy set instead of a numeric value, so that it can be represented as a vector of memberships to all fuzzy sets, $(\mu_1, \mu_2, \ldots, \mu_p)$, where $p$ denotes the number of intervals that determine the fuzzy sets. We construct $p$ new tuples by replacing $h_i$ with the numeric values related to the intervals defined before,
$$h_i^{(k)} = m_k, \quad k = 1, 2, \ldots, p,$$
where $m_k$ denotes the average value of the $k$th interval, and the weight of the $k$th new tuple is calculated as
$$w^{(k)} = w \cdot \mu_k.$$
Step 4: for each newly constructed tuple $(h^{(k)}, w^{(k)})$, if its weight $w^{(k)} > 0$ and $h^{(k)}$ still contains fuzzy sets, add it into $T_{\mathrm{fuzzy}}$; otherwise, if $w^{(k)} > 0$ and $h^{(k)}$ contains only numeric values, add it into $T_{\mathrm{num}}$. Then go to Step 2.
Step 5: each tuple $(h, w)$ in $T_{\mathrm{num}}$ is a high-order time series sample that contains only numeric values, and $w$ denotes its weight.
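As a rough illustration of this preprocessing (not the authors' code), the Python sketch below enumerates all numeric completions of one hybrid high-order sample. It assumes each fuzzy dimension is expanded into the midpoints of the intervals and that the tuple weight is accumulated as the product of the selected grades of membership, which is one plausible reading of the steps above; the names and toy values are hypothetical.

import numpy as np
from itertools import product

def expand_fuzzy_sample(sample, midpoints):
    """Expand one high-order sample whose dimensions are either floats or
    membership vectors into a set of purely numeric, weighted tuples.

    sample    : list whose entries are either a float or a 1-D membership array
    midpoints : midpoints m_k of the p intervals that define the fuzzy sets
    returns   : list of (numeric_vector, weight) with weight > 0
    """
    choices = []
    for dim in sample:
        if np.isscalar(dim):                      # already a numeric value
            choices.append([(float(dim), 1.0)])
        else:                                     # fuzzy dimension: one branch per interval
            choices.append([(midpoints[k], mu) for k, mu in enumerate(dim) if mu > 0.0])
    expanded = []
    for combo in product(*choices):
        values = np.array([v for v, _ in combo])
        weight = float(np.prod([w for _, w in combo]))  # accumulate memberships
        expanded.append((values, weight))
    return expanded

# toy usage with 3 intervals: the second dimension is ambiguous (a fuzzy set)
mids = np.array([0.5, 1.5, 2.5])
sample = [1.4, np.array([0.0, 1.0, 0.5]), 2.1]
for vec, w in expand_fuzzy_sample(sample, mids):
    print(vec, round(w, 3))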

After this preprocessing, the high-order fuzzy time series is transformed into a new weighted high-order time series set that consists of only numeric values. We apply this processing to each sample of the high-order fuzzy time series set and combine all of the obtained weighted high-order time series sets into one set $D_w$. However, it is hard to work with the weighted dataset directly, which is in effect a kind of unbalanced sample set. A common solution is to apply SMOTE [41] to deal with the undersampling. SMOTE regenerates a set of high-order time series $D'$ from the weighted high-order time series set $D_w$. A particular sample $d_i$ in $D_w$ with weight $w_i$ has several corresponding samples in $D'$, and
$$\frac{n_i}{N} = \frac{w_i}{\sum_j w_j},$$
where $n_i$ denotes the number of samples in $D'$ corresponding to $d_i$ and $N$ denotes the number of samples in $D'$.

The regenerated high-order time series set $D'$ has the same sample distribution as the weighted high-order time series set $D_w$ and can be readily processed by the multiple linear regression model and the ANN.
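The sketch below is a deliberately simplified stand-in for this regeneration step: it resamples each weighted sample with probability proportional to its weight so that the resulting unweighted set preserves the original sample distribution. The actual model uses SMOTE [41], which synthesizes interpolated samples rather than replicating existing ones; the function name and toy data are hypothetical.

import numpy as np

def regenerate_unweighted(samples, weights, target_size, rng=None):
    """Turn a weighted sample set into an unweighted one of size ~target_size,
    with each sample's frequency proportional to its weight (simplified
    stand-in for the SMOTE-based regeneration described above)."""
    rng = np.random.default_rng(rng)
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()
    idx = rng.choice(len(samples), size=target_size, p=probs, replace=True)
    return [samples[i] for i in idx]

# toy usage
samples = [np.array([1.4, 1.5, 2.1]), np.array([1.4, 2.5, 2.1])]
weights = [1.0, 0.5]
balanced = regenerate_unweighted(samples, weights, target_size=6, rng=0)
print(len(balanced))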

3.2. A High-Order MLRM-Based Time Series Clustering Algorithm

The proposed model makes forecasts by a weighted summation of several linear models' outputs; this section describes the method used to construct these linear models.

Forecasting a high-order time series is the matter of forecasting future values of a process from a certain number of observed time points. Suppose that, for a given integer $t$, we have observed $x_{t-L+1}, x_{t-L+2}, \ldots, x_t$, where $L$ is the number of observed time points of the high-order time series, and we wish to forecast $x_{t+1}$. The forecasting model of the time series can be described as
$$x_{t+1} = f(x_{t-L+1}, x_{t-L+2}, \ldots, x_t),$$
where $f$ is a map function. The multiple linear regression model is one of the most popular map functions.
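A minimal numpy sketch of this high-order linear map is given below; ordinary least squares stands in for whatever regression routine the authors use, and the toy series is hypothetical.

import numpy as np

def fit_high_order_mlr(series, L):
    """Fit x_{t+1} ~ a_0 + a_1*x_{t-L+1} + ... + a_L*x_t by least squares."""
    X = np.array([series[t - L:t] for t in range(L, len(series))])
    y = np.array(series[L:])
    X = np.hstack([np.ones((len(X), 1)), X])        # intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(coef, window):
    """Forecast the next value from the last L observations."""
    return coef[0] + np.dot(coef[1:], window)

series = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.8, 11.0]
coef = fit_high_order_mlr(series, L=3)
print(predict_next(coef, np.array(series[-3:])))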

Among forecasting models that adopt the multiple linear regression model, some classify time series into several groups by performing a clustering algorithm [42] that usually applies a similarity defined by a shape-based distance (e.g., Euclidean distance). Since the time series in the same group are similar in shape to each other, their multiple linear regression models should also be similar. However, similarity of the multiple linear relationship is not completely equivalent to similarity of shape, which means that similar time series may lead to different future values and vice versa. For this reason, we propose a high-order MLRM-based time series clustering algorithm.

With the preprocessing described in Section 3.1, we transform the set of high-order fuzzy time series into a set of high-order time series. Let $D' = \{d_1, d_2, \ldots, d_N\}$ be the set of high-order time series obtained, where each $d_j = (x_1^{(j)}, x_2^{(j)}, \ldots, x_L^{(j)}, y^{(j)})$ consists of $L$ observed values and the value $y^{(j)}$ to be forecast. The given dataset should be classified into $K$ clusters.

Suppose $R = \{r_1, r_2, \ldots, r_K\}$ is a set of multiple linear regression models, where the multiple linear regression model $r_i$ corresponds to cluster $c_i$. The distance between cluster $c_i$ and a high-order time series $d_j$ is calculated as
$$\operatorname{dis}(c_i, d_j) = \left( y^{(j)} - r_i\big(x_1^{(j)}, x_2^{(j)}, \ldots, x_L^{(j)}\big) \right)^2.$$

The distance measures the fitness of the multiple linear regression model $r_i$ to the high-order time series $d_j$. In other words, a small value of $\operatorname{dis}(c_i, d_j)$ means that the multiple linear relationship of the time series $d_j$ is similar to the multiple linear regression model $r_i$.

The clustering algorithm groups the time series into clusters according to the distance defined in equation (6) so that the multiple linear relationships of the time series within a cluster are highly similar. Thus, the proposed clustering algorithm aims at minimizing the objective function $J$, given by
$$J = \sum_{i=1}^{K} \sum_{d_j \in C_i} \operatorname{dis}(c_i, d_j),$$
where $C_i$ denotes the set of time series classified into the $i$th cluster, and $d_j \in C_i$ indicates that the time series $d_j$ is classified into cluster $c_i$.

Figure 2 shows the process of high-order MLRM-based time series clustering algorithm, and the details are described as follows.

3.2.1. Initialization

As discussed above, there are $N$ high-order time series in the dataset $D'$ and $K$ clusters to obtain. Since it is not easy to determine whether a pair of time series is similar in linear relationship or not, it is hard to give a good initialization of the clusters. Besides, in order to initialize the linear model of each cluster with the multiple linear regression algorithm, enough time series must belong to it. Thus, we divide $D'$ into $K$ clusters equally and randomly, so each cluster contains about $N/K$ time series.

Since each cluster has a corresponding multiple linear regression model, $K$ multiple linear regression models need to be constructed. Cluster $c_i$ consists of a set of time series, and each series has an individual weight. In order to adopt multiple linear regression, we apply the weighted multiple linear regression model described in [43]. The cluster $C_i$ can be partitioned into a matrix of explanatory variables $X_i$ and a vector of response variables $Y_i$, obtained as
$$X_i = \begin{pmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_L^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(N_i)} & x_2^{(N_i)} & \cdots & x_L^{(N_i)} \end{pmatrix}, \qquad Y_i = \begin{pmatrix} y^{(1)} \\ \vdots \\ y^{(N_i)} \end{pmatrix},$$
where the $j$th row corresponds to the $j$th time series belonging to cluster $c_i$ and $N_i$ is the number of time series belonging to cluster $c_i$.

According to the given $X_i$ and $Y_i$, the multiple linear regression model of cluster $c_i$ is initialized as
$$r_i^{(0)} = \mathrm{MLR}(X_i, Y_i),$$
where $\mathrm{MLR}(\cdot)$ is the regular process of multiple linear regression.

Hence, the set of multiple linear regression models can be initialized as $R^{(0)} = \{r_1^{(0)}, r_2^{(0)}, \ldots, r_K^{(0)}\}$, and each multiple linear regression model relates to one of the constructed clusters.
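A possible sketch of the weighted fit used for this initialization is given below. Scaling the rows by the square root of the sample weights is one standard way to implement weighted multiple linear regression; it is not necessarily the exact formulation of [43], and the toy data are hypothetical.

import numpy as np

def weighted_mlr(X, y, w):
    """Weighted multiple linear regression via sqrt-weight row scaling.
    X: (n, L) explanatory variables, y: (n,) responses, w: (n,) sample weights."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])    # add intercept column
    sw = np.sqrt(np.asarray(w, dtype=float))
    coef, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return coef                                      # [intercept, a_1, ..., a_L]

# toy cluster: 4 samples of order L=2 with individual weights
X = np.array([[1.0, 1.2], [1.2, 1.1], [0.9, 1.0], [1.1, 1.3]])
y = np.array([1.3, 1.0, 1.1, 1.4])
w = np.array([1.0, 0.5, 0.75, 1.0])
print(weighted_mlr(X, y, w))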

Initialize a distance matrix that has $K$ rows and $N$ columns. Its entry $\operatorname{dis}_{ij}^{(0)}$ is the initial distance between cluster $c_i$ and sample $d_j$ and is initialized as
$$\operatorname{dis}_{ij}^{(0)} = \left( y^{(j)} - r_i^{(0)}\big(x_1^{(j)}, \ldots, x_L^{(j)}\big) \right)^2,$$
where $r_i^{(0)}$ represents the initial multiple linear model corresponding to cluster $c_i$.

3.2.2. Update of the Distance Matrix and Classification of Time Series

During the $m$th round of the iteration, assume that $R^{(m-1)} = \{r_1^{(m-1)}, \ldots, r_K^{(m-1)}\}$ is the set of multiple linear regression models obtained in the previous round. The distance matrix is updated according to $R^{(m-1)}$. Following the distance defined in equation (6), the entries are updated by
$$\operatorname{dis}_{ij}^{(m)} = \left( y^{(j)} - r_i^{(m-1)}\big(x_1^{(j)}, \ldots, x_L^{(j)}\big) \right)^2,$$
where $r_i^{(m-1)}$ represents the multiple linear model corresponding to cluster $c_i$ in the $m$th iteration.

Each cluster's time series set is then emptied, and each time series is reclassified into the cluster, constructed in the previous round, that yields the minimum distance. For example, given a time series $d_j$, $\operatorname{dis}_{ij}^{(m)}$ represents the distance between $d_j$ and cluster $c_i$, which can be read from the distance matrix defined by equation (11). By comparing $\operatorname{dis}_{1j}^{(m)}, \operatorname{dis}_{2j}^{(m)}, \ldots, \operatorname{dis}_{Kj}^{(m)}$, $d_j$ is added to the time series set of the cluster with the smallest distance. In other words, $d_j$ is reclassified into $C_i$ only when
$$\operatorname{dis}_{ij}^{(m)} = \min\big(\operatorname{dis}_{1j}^{(m)}, \operatorname{dis}_{2j}^{(m)}, \ldots, \operatorname{dis}_{Kj}^{(m)}\big),$$
where $\min(\cdot)$ represents the minimum function.

3.2.3. Update of the Set of Multiple Linear Regression Models

After all of the time series have been reclassified, the multiple linear models of the clusters should also be updated. For cluster $c_i$ in the $m$th round, if no time series belongs to it, continue with the next cluster; otherwise, the corresponding multiple linear regression model is recalculated from all of the time series that are reclassified into $C_i$ by the method described in the initialization step. That is, the multiple linear regression model is rebuilt by partitioning $C_i$ into a matrix of explanatory variables and a vector of response variables and applying multiple linear regression.

After all of the multiple linear regression models have been updated, if there are empty clusters that were skipped, new models should be generated to replace them. Suppose that there are $K_e$ empty clusters. First, each nonempty cluster is evaluated. The evaluation result of cluster $c_i$ is calculated as
$$E_i = \frac{1}{|C_i|} \sum_{d_j \in C_i} \operatorname{dis}(c_i, d_j).$$

A small $E_i$ suggests that the time series within $C_i$ are highly similar to each other.

In order to generate new models, we choose the $K_e$ worst-performing clusters, whose evaluation results are larger than the others'. For each of these clusters, half of the time series belonging to it are randomly selected, and a new multiple linear regression model is generated from the cluster formed by them. In this way, $K_e$ new models are obtained to replace the old models of the empty clusters.

If the number of nonempty clusters is less than $K_e$, which means we cannot find $K_e$ worst-performing clusters, let $K_e$ equal the number of nonempty clusters and generate the new models by the process described above.

3.2.4. Stopping Criterion

If the set of multiple linear regression models $R^{(m)}$ shows no difference from $R^{(m-1)}$, or the process reaches the given maximum number of iterations, the iteration ends. $R^{(m)}$ is the final output of the proposed clustering algorithm.

Finally, we obtain a set of multiple linear regression models $R = \{r_1, r_2, \ldots, r_K\}$, and each of them relates to a cluster.
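Pulling the pieces together, the following sketch shows one plausible implementation of the MLRM-based clustering loop (random equal initialization, squared prediction error as the distance, per-cluster refitting). It omits the empty-cluster repair step for brevity, uses unweighted least squares, and is not the authors' code; names and toy data are hypothetical.

import numpy as np

def fit_mlr(X, y):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def mlr_predict(coef, X):
    return coef[0] + X @ coef[1:]

def mlrm_clustering(X, y, K, max_iter=50, rng=0):
    """Cluster high-order samples by similarity of their linear relationship.
    X: (N, L) observed windows, y: (N,) next values."""
    rng = np.random.default_rng(rng)
    N = len(X)
    labels = rng.permutation(np.arange(N) % K)        # equal, random initialization
    models = [fit_mlr(X[labels == i], y[labels == i]) for i in range(K)]
    for _ in range(max_iter):
        # distance of every sample to every cluster = squared forecast error
        dist = np.stack([(y - mlr_predict(m, X)) ** 2 for m in models])  # (K, N)
        new_labels = dist.argmin(axis=0)
        if np.array_equal(new_labels, labels):        # stopping criterion
            break
        labels = new_labels
        for i in range(K):
            if np.any(labels == i):                   # skip empty clusters here
                models[i] = fit_mlr(X[labels == i], y[labels == i])
    return models, labels

# toy usage: two regimes with different linear relationships
rng = np.random.default_rng(1)
X1 = rng.normal(size=(40, 3)); y1 = X1 @ np.array([0.2, 0.3, 0.5]) + 0.1
X2 = rng.normal(size=(40, 3)); y2 = X2 @ np.array([-0.4, 0.8, 0.1]) - 0.2
models, labels = mlrm_clustering(np.vstack([X1, X2]), np.concatenate([y1, y2]), K=2)
print(np.bincount(labels))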

3.3. A New Forecasting Model Based on Fuzzy Time Series and High-Order MLRM-Based Time Series Clustering

After the clustering described above, we obtain a set of linear models. In order to make a forecast, we need to calculate the weighted summation of these linear models' outputs. This section describes a forecasting model that adopts ANNs to give the weights of the linear models.

Fuzzy time series models usually extract fuzzy logic relationships to build the relationships between the current state and the next state, and the state transition matrix is the core of the model. In the MLRM-based forecasting algorithm, time series are clustered as in other fuzzy time series models, but the model is built without using a state transition matrix. Instead, multiple linear regression models are applied to map the variables of the current state into the space of the next state.

The proposed fuzzy time series model works on the high-order time series built as described in Section 3.1, and it contains four steps like the other existing fuzzy time series models explained in Section 2. When making forecasts, existing fuzzy time series models based on the linear model [31] employ grades of membership obtained by fuzzy C-means clustering. However, the distance defined by equation (6) depends on the numeric value to forecast. Since we cannot obtain this numeric value before forecasting, the distance cannot be calculated. Thus, we build a model for each multiple linear regression model to calculate the grade of membership using only the observed data.

Suppose the set of multiple linear regression models $R$, calculated as described in Section 3.2, contains $K$ models, and $r_i$ is the model corresponding to the $i$th cluster. Let the membership function of the $i$th cluster be $\mu_i(\cdot)$. Then, for a high-order time series $x = (x_1, x_2, \ldots, x_L)$ obtained by the methods described in Section 3.1, the forecast result $\hat{y}$ is calculated by weighted summing of the results of all multiple linear regression models according to the grades of membership:
$$\hat{y} = \sum_{i=1}^{K} \mu_i(x) \cdot r_i(x),$$
where $\cdot$ is the algebraic product and $\mu_i(x)$ is the grade of membership between $x$ and the $i$th cluster.
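A sketch of this forecast combination is given below, assuming each membership network is exposed as a callable that maps the observed window to a membership grade; the normalization of the grades is my own addition for numerical safety and may not appear in the original formulation, and all names and toy values are hypothetical.

import numpy as np

def combined_forecast(window, models, membership_fns):
    """Weighted sum of the K linear models' outputs, weighted by the grades of
    membership produced by the K membership networks for this window."""
    outputs = np.array([m[0] + np.dot(m[1:], window) for m in models])
    grades = np.array([fn(window) for fn in membership_fns])
    grades = grades / grades.sum()       # assumption: normalize the membership grades
    return float(np.dot(grades, outputs))

# toy usage with two hand-made linear models and dummy membership functions
models = [np.array([0.1, 0.2, 0.3, 0.5]), np.array([-0.2, -0.4, 0.8, 0.1])]
membership_fns = [lambda w: 0.7, lambda w: 0.3]
print(combined_forecast(np.array([1.0, 0.9, 1.1]), models, membership_fns))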

Given the time series, the forecasting model aims at finding a set of membership functions $\{\mu_1, \mu_2, \ldots, \mu_K\}$ such that the root mean squared error (RMSE) of the prediction results is minimized:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \big(\hat{y}_j - y_j\big)^2},$$
where $y_j$ and $\hat{y}_j$ are the actual and forecast values of the $j$th sample and $n$ is the number of samples.

To build the membership function models, we construct an ANN for each cluster. Then, the forecasting model can be described as in Figure 3.

3.3.1. Learning Algorithm of Forecasting Model Based on High-Order MLRM-Based Time Series Clustering

A traditional ANN usually adopts supervised learning, which uses a set of example pairs $(X, Y)$ and aims at finding a function $f$ in the allowed class of functions that matches the examples. Generally, the cost function is related to the mismatch between the mapping $f$ and the data. Therefore, the expected output $Y$ of the ANN is necessary.

In the forecasting model based on high-order MLRM-based time series clustering, each ANN corresponds to a membership function. Since we cannot give the expected grades of membership between the samples and the multiple linear regression models, the expected outputs of the ANNs are missing. In other words, we cannot train each ANN individually. Thus, we designed a learning algorithm for the whole forecasting model.

The learning algorithm can be divided into two phases: propagation and backpropagation.

During propagation, for each sample $(x_j, y_j)$, each ANN takes the anterior $L$ dimensions of the sample as inputs and generates its own output value as a grade of membership. After all grades of membership are obtained, the forecasting model calculates the prediction result $\hat{y}_j$ by equation (14). The cost of the forecasting model is defined by the squared error function
$$E = \frac{1}{2} \sum_{j} e_j^2, \qquad e_j = \hat{y}_j - y_j,$$
where $e_j$ denotes the forecasting error of the $j$th sample.

After the propagation process, the forecasting error propagates backward through the model to adjust the parameters. The derivative of the forecasting error with respect to a weight $w$ of the $i$th ANN can be expressed as
$$\frac{\partial E}{\partial w} = \sum_{j} e_j \frac{\partial \hat{y}_j}{\partial w}.$$

For a given $j$, since $\hat{y}_j = \sum_{k=1}^{K} \mu_k(x_j) r_k(x_j)$ and only $\mu_i$ depends on $w$, the derivative part of equation (17) can be simplified to
$$\frac{\partial \hat{y}_j}{\partial w} = r_i(x_j) \frac{\partial \mu_i(x_j)}{\partial w}.$$

Then, equation (17) can be expressed as
$$\frac{\partial E}{\partial w} = \sum_{j} e_j \, r_i(x_j) \frac{\partial \mu_i(x_j)}{\partial w}.$$

Treating all ANNs as equal, we suppose a constant $\eta$ as the learning ratio of the weighted-sum layer of our model. Let
$$\delta_{ij} = \eta \, e_j \, r_i(x_j);$$
then, the desired adjustment of the $i$th membership output for the $j$th sample can be obtained as
$$\Delta \mu_i(x_j) = -\delta_{ij}.$$

Thus, for the ANN corresponding to the $i$th multiple linear regression model, by using the results of equations (19) and (21), the cost is defined as
$$E_i = \frac{1}{2} \sum_{j} \delta_{ij}^2,$$
which suggests the error of the $i$th ANN.

Since the error of each ANN has been obtained, the traditional backpropagation algorithm [44] can be executed. During the backpropagation of each ANN, all neurons update their weights and offsets according to the gradients. After that, another sample is used to train the model unless the stopping criterion is satisfied.
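To make the error allocation tangible, the sketch below applies the chain-rule step above under the assumptions stated earlier (forecast = membership-weighted sum of linear model outputs, squared-error cost): the gradient of the cost with respect to the $i$th membership grade is the forecast error scaled by the $i$th linear model's output, and that scaled quantity is what each ANN would backpropagate. Names, the learning ratio handling, and toy data are hypothetical.

import numpy as np

def per_ann_error_signals(window, target, models, grades, lr=0.1):
    """Split the overall forecast error into one error signal per membership ANN.

    window : the observed L values fed to every ANN
    target : the true next value
    models : list of K coefficient vectors [intercept, a_1, ..., a_L]
    grades : current membership outputs mu_i(window) of the K ANNs
    lr     : learning ratio of the weighted-sum layer (a free constant here)
    """
    outputs = np.array([m[0] + np.dot(m[1:], window) for m in models])
    forecast = float(np.dot(grades, outputs))
    e = forecast - target                       # overall forecasting error
    # dE/d(mu_i) = e * r_i(window); each ANN backpropagates lr * that quantity
    return lr * e * outputs, forecast

grades = np.array([0.6, 0.4])
models = [np.array([0.1, 0.2, 0.3, 0.5]), np.array([-0.2, -0.4, 0.8, 0.1])]
signals, pred = per_ann_error_signals(np.array([1.0, 0.9, 1.1]), 1.2, models, grades)
print(signals, pred)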

The pseudocode for the learning algorithm of the proposed forecasting model is presented as follows (Algorithm 1):

Input:
 The multiple linear regression models $R = \{r_1, \ldots, r_K\}$;
 The training set $D' = \{(x_j, y_j)\}_{j=1}^{n}$;
Output:
 The forecasting model with ANNs $\{\mathrm{ANN}_1, \ldots, \mathrm{ANN}_K\}$;
(1) Initialize all networks $\mathrm{ANN}_1, \ldots, \mathrm{ANN}_K$
(2) Normalize $D'$
(3) Repeat
(4)  For all samples $(x_j, y_j) \in D'$ do
(5)   For all $i = 1, \ldots, K$ do
(6)    Calculate the output value $\mu_i(x_j)$ of $\mathrm{ANN}_i$
(7)   End for
(8)   Compute the forecast $\hat{y}_j$ by equation (14) and the error $e_j = \hat{y}_j - y_j$
(9)   Compute the cost defined in equation (16)
(10)  For all $i = 1, \ldots, K$ do
(11)   Calculate the partial derivative defined in equation (19)
(12)  End for
(13)  Compute $\delta_{ij}$ defined in equation (20)
(14)  For all $i = 1, \ldots, K$ do
(15)   Take $\delta_{ij}$ as the output-layer error of $\mathrm{ANN}_i$
(16)   Compute the gradients for all weights and offsets
(17)   Update the network weights and offsets
(18)  End for
(19) End for
(20) Until the RMSE is small enough or the model has fallen into a local minimum
3.4. A Case of Forecasting the TAIEX

TAIEX data are frequently used for the evaluation of FTS algorithms. This section presents a detailed example on the TAIEX data of 1999 to explain our model. There are 266 samples in this dataset; 221 samples are used for training and the others for testing. The proposed forecasting model proceeds as follows:

Step 1. In order to explain the details of the proposed model, we initialize the number of clusters $K$, the number of observed time points $L$ of a high-order time series, and the time point to predict.

Step 2. Normalize the data by calculating the daily percentage change of the closing price, which forms the universe of discourse. Let $p_{\max}$ and $p_{\min}$ be the maximum and the minimum daily change. Then, the scope of the universe can be represented as $U = [p_{\min}, p_{\max}]$. The percentage change is calculated by
$$p_t = \frac{x_t - x_{t-1}}{x_{t-1}} \times 100\%,$$
where $x_t$ is the closing price on day $t$.
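A short sketch of this normalization step is given below, assuming the daily percentage change is computed from consecutive closing prices; the toy prices are hypothetical.

import numpy as np

def to_percentage_change(closes):
    """Daily percentage change of the closing price: 100 * (x_t - x_{t-1}) / x_{t-1}."""
    closes = np.asarray(closes, dtype=float)
    return 100.0 * (closes[1:] - closes[:-1]) / closes[:-1]

closes = [8000.0, 8080.0, 8030.0, 8110.0]
pct = to_percentage_change(closes)
print(pct, pct.min(), pct.max())   # [pct.min(), pct.max()] bounds the universe U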

Step 3. Transform the time series of percentage changes, in both the training set and the testing set, into high-order vectors by a sliding window:
$$(p_{t-L+1}, p_{t-L+2}, \ldots, p_t, p_{t+1}), \quad t = L, L+1, \ldots$$

Step 4. For the scope of the universe $U$, apply the equal-interval partition described in [21]. Let the number of intervals be $p$. With the obtained intervals $u_1, u_2, \ldots, u_p$, fuzzy sets $A_1, A_2, \ldots, A_p$ are defined so that $A_k$ has grade of membership 1 in $u_k$, 0.5 in the adjacent intervals, and 0 elsewhere. Furthermore, if there are missing values, they can be represented by a special fuzzy set $A_0$ that assigns an equal grade of membership to every interval.
Then, each high-order time series vector can be fuzzified by replacing each value $p_t$ with the fuzzy set $A_k$ in which it has the maximum grade of membership, where $A_k$ denotes the fuzzy set corresponding to $p_t$.

Step 5. In order to employ the linear model, we transform the fuzzified vectors into numeric vectors by the method explained in Section 3.1. Since each fuzzy set can be replaced by any of the $p$ intervals together with the grades of membership defined in equation (25), one fuzzified vector can generate multiple numeric time series vectors. For each generated vector, the grade of membership related to it is calculated by accumulating the grades of membership related to the chosen intervals in each dimension. After the transformation, we obtain a set of tuples, each consisting of a vector of interval values and a weight calculated in this way. We then collect the tuples with nonzero weight and form a weighted high-order time series set $D_w$.

Step 6. After that, we obtain the weighted numeric vector set $D_w$. The grade of membership represents the weight of each vector in the training set, and in this case the weights vary only within a limited range, since this dataset does not contain missing values. Then, to form a new training set of high-order time series from $D_w$, we apply the SMOTE defined in [41]. We also partition the resulting set into two parts, a training set and a testing set.

Step 7. Based on the clustering algorithm described in Section 3.2, cluster the high-order time series vectors in the training set and calculate the corresponding linear models of the clusters. The final output is the set of clusters and their corresponding linear models $R = \{r_1, r_2, \ldots, r_K\}$.

Step 8. With the training set and the linear models $R$, a forecasting model is trained by the methods described in Section 3.3. $K$ ANNs are built during the training process, and each takes the anterior $L$ dimensions of a sample as input and gives the grade of membership between the sample and the corresponding multiple linear regression model as output.

Step 9. Make a forecast for each sample in the testing set. The forecast value of a sample $x$ is obtained by
$$\hat{y} = \sum_{i=1}^{K} \mu_i(x) \cdot r_i(x).$$
Note that if a sample in the testing set contains uncertainty, it should first be transformed into a weighted high-order time series set by the method explained in Section 3.1. Then, a forecast is made for each weighted high-order time series as in formula (29), and the final forecasting result is obtained by the weighted summation of these forecast results according to the weights of these high-order time series.

Step 10. The forecasting results are obtained by denormalizing the results of Step 9.
The number of clusters $K$ is not chosen arbitrarily and should be computed. The optimal number of clusters is found simply by running the algorithm for a range of values of $K$ and choosing the $K$ corresponding to the minimum testing error. During the search for the optimal $K$, the iteration should be stopped if redundant clusters appear during Step 7.
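This selection of $K$ can be sketched as a simple search over candidate values, assuming a routine that trains the whole model for a given $K$ and returns its testing RMSE is available; the `train_and_evaluate` argument and the stand-in evaluation function below are hypothetical placeholders.

import numpy as np

def select_k(train_and_evaluate, candidates):
    """Run the full model for each candidate K and keep the one with the
    minimum testing RMSE. `train_and_evaluate` is assumed to train the model
    with the given K and return its RMSE on the testing set."""
    rmses = {k: train_and_evaluate(k) for k in candidates}
    best_k = min(rmses, key=rmses.get)
    return best_k, rmses

# toy stand-in for the real training routine: a convex function of K
fake_eval = lambda k: 30.0 / np.sqrt(k) + 1.5 * k
best_k, rmses = select_k(fake_eval, candidates=range(2, 12))
print(best_k)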

4. Experimental Results and Analysis

To demonstrate the performance of the proposed model, we conducted experiments on the dataset of historical closing prices of the TAIEX [45]. For each year, we used the historical data from January to October to construct the training set and the data from November to December as the testing set. Besides, the structure of the ANN, including the number of layers, the number of neurons, and the type of transfer function, was decided by experimenting on the training set to give the best performance. The main parameters are listed in Table 1.

First of all, we compared our high-order MLRM-based time series clustering algorithm with other clustering algorithms, namely, K-means clustering, hierarchical clustering, and density-based spatial clustering of applications with noise (DBSCAN). For the results of each clustering algorithm, we applied multiple linear regression to construct a linear model for each cluster. The cost of a clustering result is then calculated by summing all the linear models' costs, which can be expressed as
$$\mathrm{cost} = \sum_{i=1}^{K} \mathrm{cost}_i,$$
where $K$ is the number of clusters and $\mathrm{cost}_i$ denotes the cost, defined by the common linear regression model, of the $i$th cluster. With fixed $K$ (for the proposed algorithm, K-means clustering, and hierarchical clustering) and fixed $L$, we carried out an experiment on the TAIEX data from 1999 to 2004, and the results are given in Figure 4. Since a smaller cost suggests a better performance, our high-order MLRM-based time series clustering algorithm outperforms the other three algorithms. In particular, a small cost means that the clustering result is well suited to the linear model, so our clustering algorithm is more suitable for the proposed forecasting model.

For a given order $L$, we carried out an experiment to choose a suitable number of ANNs $K$ by using the TAIEX data of year 1999. During the experiment, the RMSE of the forecasting results was calculated for each $K$. Since the model built with ANNs is stochastic, the forecast was repeated 10 times for each $K$ to calculate the average RMSE. Figure 5 shows the relationship between the number of ANNs $K$ and the average RMSE of the forecasting results. The results suggest that the RMSE decreases as the number of ANNs $K$ increases. However, the RMSE stabilizes once $K$ exceeds a certain value, since redundant clusters appear.

In a similar way, we carried out an experiment to decide the order of the high-order time series with $K$ fixed, using the TAIEX data of year 1999. The relationship between the order $L$ and the average RMSE is depicted in Figure 6. The RMSE of the forecasting results varies little once the order of the high-order time series exceeds a certain value.

We also compared the running time of the proposed model with different $L$ and $K$. For each $K$ and $L$, the proposed model was run 10 times and the average running time was calculated. The results are given in Figure 7. The running time has a linear relationship with the number of ANNs $K$ and an exponential relationship with the order of the high-order time series $L$. Thus, the following experiments used the parameter values chosen above, which ensure good forecasting accuracy as well as acceptable running time.

Besides, we forecast the TAIEX, with the data from 1990 to 2004, using the proposed model and other existing FTS-based forecasting models [4-8, 10-12]. All of the RMSEs in Tables 2 and 3 are taken from [11]. Table 2 shows the comparison of the RMSE and the average RMSE between the proposed model and other existing FTS methods on the TAIEX historical data from 1990 to 1999, and Table 3 shows the comparison on the TAIEX historical data from 1999 to 2004. The proposed method gives a much more accurate forecast than the other methods, with an average RMSE of 42.91 on the TAIEX data from 1990 to 1999 and 36.91 on the data from 1999 to 2004. Since the forecasting results depend on the results of the clustering and the ANNs, and both of these procedures are stochastic, the forecasting results vary between runs. Thus, we ran our model 100 times on the TAIEX data from 1990 to 2004 and calculated the average, variance, and standard deviation of the forecasting RMSEs. The results are displayed in Table 4.

Moreover, since most of the IFS-based [16, 18] and HFS-based [17, 19] models are evaluated on the dataset of the State Bank of India (SBI) at the BSE, we compared our model with these models using the market prices of the SBI share at the BSE. As in the experiment in [19], we took one sample per month as a testing sample and the others as training samples. Figure 8 shows the forecasting results on the testing set. Our proposed model shows better forecasting accuracy than the other FTS models. To make the comparison clearer, we repeated the experiment 10 times and calculated the evaluation metrics defined in [19]. The forecasting results given in Table 5 suggest that our model gives the best values of these error metrics, which indicates the highest forecasting accuracy.

In addition, we compared our model with some nonfuzzy methods, including Askari's CFTS model [31], an SVM-based model [46], and an LSTM-based model [47], by forecasting the TAIEX data from 1999 to 2004. The results are presented in Table 6. Our model and the CFTS model are much better than the others, which indicates that the linear model is more suitable for this problem. Furthermore, our model shows the best performance, with an RMSE of 36.40, slightly better than the CFTS model's 40.72.

Finally, we tested the robustness by using the data from 1999 to 2004. We randomly generated deviations within a given range for each value of the time points in the training set. Then, we compared the change of RMSE with the other nonfuzzy methods [31, 46, 47]. The change of RMSE is evaluated as
$$\Delta_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}' - \mathrm{RMSE}}{\mathrm{RMSE}},$$
where $\mathrm{RMSE}$ denotes the forecasting RMSE on the original dataset and $\mathrm{RMSE}'$ denotes the forecasting RMSE on the modified dataset. Figure 9 shows that the proposed model is less sensitive to fluctuations of the training set than the other models, which suggests that the proposed model has better robustness (Figures 9 and 10).

Furthermore, we randomly removed the values of some time points of the training set. For the nonfuzzy models, the removed values were replaced with the average value of all time points in the training set. For the fuzzy time series model, these values were represented by equal grades of membership over the intervals of the universe of discourse. This experiment was also executed on the data of year 2002. Figure 10 shows that the missing values have little influence on the forecasting results of the proposed model. The proposed model also shows better robustness to missing values than the other two models.

5. Conclusion

This paper proposes a robust fuzzy time series forecasting model based on multiple linear regression and time series clustering. It makes three main contributions.

First, we apply a preprocessing step to transform the high-order fuzzy time series set into a weighted high-order time series set, which consists of numeric values. Furthermore, we employ SMOTE to generate a high-order time series set from the weighted set. After preprocessing, the generated dataset can be processed by the multiple linear regression model and the ANN.

Second, to build the multiple linear models of the dataset, a novel high-order time series clustering algorithm is employed to cluster data by their linear relationships instead of their shapes, which constructs more suitable linear models than other cluster-based models. The proposed clustering algorithm is compared with three other clustering algorithms and builds the most suitable linear models, yielding the lowest cost.

Finally, we design a forecasting model based on ANNs, which calculate the weights of the related multiple linear models, together with its learning algorithm.

The proposed forecasting model is compared with other FTS-based models on the TAIEX and the BSE, and it gives better forecasting accuracy than the others. Also, compared with forecasting models that are not FTS-based, the results suggest that the proposed forecasting model can handle incomplete and imprecise data and has better robustness than the others.

Appendix

Proof of the convergence of high-order MLRM-based time series clustering algorithm

Proof. Clustering the high-order time series is equivalent to minimizing the cost function given by formula (4). Since each time series belongs to exactly one cluster, the cost function can also be presented as
$$J = \sum_{j=1}^{N} \operatorname{cost}(d_j),$$
where $\operatorname{cost}(d_j)$ denotes the cost of time series $d_j$, which equals the distance between $d_j$ and the cluster that $d_j$ belongs to.
For the $m$th round of the iteration, assume $J^{(m)}$ is the cost after the $m$th iteration, and $\operatorname{cost}^{(m)}(d_j)$ is the corresponding cost for $d_j$.
During the process of updating the distance matrix and reclassifying the time series, each time series is classified into the cluster that yields the minimum distance, which means
$$\operatorname{cost}'(d_j) = \min_{1 \le i \le K} \operatorname{dis}_{ij}^{(m)} \le \operatorname{cost}^{(m)}(d_j),$$
where $\operatorname{dis}_{ij}^{(m)}$ is the distance between time series $d_j$ and cluster $c_i$ in the $m$th iteration and $\operatorname{cost}'(d_j)$ is the new cost for $d_j$ after the update. Thus,
$$J' = \sum_{j=1}^{N} \operatorname{cost}'(d_j) \le J^{(m)},$$
where $J'$ denotes the new cost after the update. Hence, the cost does not increase after the process of updating the distance matrix and reclassifying the time series.
During the process of updating the set of multiple linear regression models, for each cluster, the linear regression model is fitted so as to minimize the cost function
$$\sum_{d_j \in C_i} \operatorname{dis}(c_i, d_j).$$
Letting $\operatorname{dis}'$ be the distance matrix after the former process and $\operatorname{dis}^{(m+1)}$ be the distance matrix after the linear regression, the cost after the $(m+1)$th iteration can be expressed as
$$J^{(m+1)} = \sum_{i=1}^{K} \sum_{d_j \in C_i} \operatorname{dis}_{ij}^{(m+1)} \le \sum_{i=1}^{K} \sum_{d_j \in C_i} \operatorname{dis}'_{ij} = J'.$$

Finally, we conclude that the high-order MLRM-based time series clustering algorithm converges, since $J^{(m+1)} \le J' \le J^{(m)}$ according to formulas (A.3) and (A.5) and the cost is bounded below by zero.

Data Availability

All data created during this research is openly available from https://finance.yahoo.com/quote/LV30.TW and https://finance.yahoo.com/quote/SBIN.NS.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (grant number 61531013) and the National Science and Technology Major Project of China (grant number 2018ZX03001016).