Abstract

This paper addresses the prediction of time series with missing data. We propose a novel forecasting model that treats the modeling of an incomplete time series as max-margin classification of data with absent features, and we employ the optimal classification hyperplane to predict future values. In contrast to the traditional process for predicting incomplete time series, which fills in the missing data in advance, our method solves the problem directly. In addition, we introduce an imputation method to estimate the missing data in the historical series. Experimental results validate the effectiveness of our model in both prediction and imputation.

1. Introduction

The subject of time series prediction has sparked considerable research activity, ranging from short-range-dependent series to long-range-dependent series [1–4], and from conventional time series to fractal time series [5, 6]. Traditional prediction technologies, such as neural networks (NNs) [7] and support vector regression (SVR) [8], are designed for complete time series. However, the time series we encounter in real life often contain missing data due to malfunctioning sensors, human factors, and other reasons. When dealing with the prediction of an incomplete time series, the traditional process consists of two steps: the first step is to recover the incomplete time series with an imputation model, and the second step is to estimate the prediction model on the recovered series as if it were complete. This process is shown in Figure 1(a). It may require a large amount of computation, and inaccurate imputation introduces deviations into the prediction. In this paper, we propose a novel prediction model built directly from the incomplete time series, as shown in Figure 1(b).

The issue of modeling an incomplete time series is interpreted in this paper as classification of data with missing features, and we use the optimal hyperplane of the classification to determine the prediction values. A similar approach has been applied to the prediction of complete data [9]. In addition, our model also serves as an imputation method, that is, a way of estimating the missing data in the historical series. Several works have been carried out on the imputation of missing data; in the following, these methods are separated into two groups.

(1) Statistical Methods. Examples include the maximum likelihood (ML) algorithm [10], the expectation maximization (EM) algorithm [11], Multiple Imputation (MI) [12], and so forth. Grounded in statistical theory, the ML and EM algorithms need to know the distribution of the data and often have high computational complexity. Multiple Imputation, which imputes the missing data M times, assumes that the missing data are Missing At Random (MAR).

(2) Machine Learning Methods. Examples include mean imputation, K-Nearest Neighbor (KNN) [13], the Varies Windows Similarity Measure (VWSM) [14], and Regional Gradient Guided Bootstrapping (RGGB) [15]. The KNN algorithm finds a plausible value for a missing datum by measuring distances to its neighbors. The VWSM algorithm fills missing data using complete subsequences from similar cycles. The RGGB algorithm imputes missing data by estimating the slopes of the boundary regions around the gap. According to the results of [15], RGGB outperforms other traditional imputation methods such as MI and VWSM; however, it cannot be used to analyze data with high fluctuations.

Compared with traditional imputation methods, our model can select different samples to calculate each absent datum in the historical series, which improves the accuracy of imputing the missing data.

The rest of this paper is organized as follows. Section 2 introduces the establishment of our model. The theory of max-margin classification of data with absent features is reviewed briefly in Section 3. The solution and algorithm of our model are discussed in Section 4. Section 5 follows with the experiments, in which the prediction and imputation performance of our model is tested in detail. Finally, conclusions are presented in Section 6.

2. Presentation of Our Model

We start by formalizing the problem of incomplete time series. Assume that a time series with missing data is given as $x_1, x_2, \ldots, x_N$, where the marker $*$ represents a missing datum. The sample set $S$ of the incomplete time series can be formulated as

$X_i = (x_i, x_{i+1}, \ldots, x_{i+d})$, $\quad i = 1, 2, \ldots, N - d$,

where $d$ denotes the embedding dimension.

Prediction technologies usually establish regression models of the form

$x_{i+d} = f(x_i, x_{i+1}, \ldots, x_{i+d-1})$,

where $x_{i+d}$ acts as the prediction target. In order to predict the value of $x_{N+1}$, the last $d$ values $(x_{N-d+1}, \ldots, x_N)$ must be the input data of the model.

The implementation process of our model starts by dividing the sample set $S$ into two parts: a training set $S_{tr}$ and an imputing set $S_{im}$. The prediction targets of the training samples in $S_{tr}$ are existing values, while those of the imputing samples in $S_{im}$ are missing values. The training set is used to construct our prediction model; the role of the imputing set is to estimate the missing values.
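To make this construction concrete, the following minimal Python sketch (an illustration on our part, not the authors' original code; the function names and the use of np.nan as the missing-data marker are our assumptions) builds the sample set from an incomplete series and splits it by whether the prediction target is present:

import numpy as np

def build_sample_sets(series, d):
    # Embed an incomplete series (missing entries = np.nan) into
    # overlapping windows of length d + 1, then split them by whether
    # the prediction target (the last component) is present.
    windows = np.array([series[i:i + d + 1]
                        for i in range(len(series) - d)])
    target_missing = np.isnan(windows[:, -1])
    S_tr = windows[~target_missing]   # targets exist: training set
    S_im = windows[target_missing]    # targets absent: imputing set
    return S_tr, S_im

# toy usage: a short series with two missing points
x = np.array([0.1, 0.4, np.nan, 0.9, 0.7, np.nan, 0.3, 0.2])
S_tr, S_im = build_sample_sets(x, d=3)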

We construct two classes of incomplete data, $D^{+}$ and $D^{-}$, from the training set, which can be expressed as

$D^{+} = \{ (x_i, \ldots, x_{i+d-1}, x_{i+d} + \varepsilon) \}$ with label $+1$,
$D^{-} = \{ (x_i, \ldots, x_{i+d-1}, x_{i+d} - \varepsilon) \}$ with label $-1$,

where $\varepsilon$ is the fitting error. Being the optimal hyperplane of the classification of $D^{+}$ and $D^{-}$, $f(X) = 0$ is obtained by the theory of max-margin classification of data with missing features. For a small $\varepsilon$, predicting samples must fall on the hyperplane determined by the training set; thus the prediction values can be calculated by solving $f(X) = 0$ for the unknown component.
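As a sketch of this class construction (the value of eps and the label convention are assumptions we make for illustration), the two shifted classes could be formed as follows, continuing the sketch above:

import numpy as np

def build_classes(S_tr, eps):
    # Shift the target component by +/- eps to obtain the two
    # classes D+ (label +1) and D- (label -1).
    D_plus = S_tr.copy();  D_plus[:, -1] += eps
    D_minus = S_tr.copy(); D_minus[:, -1] -= eps
    X = np.vstack([D_plus, D_minus])
    y = np.hstack([np.ones(len(D_plus)), -np.ones(len(D_minus))])
    return X, y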

This model can also be used to estimate the missing data of an incomplete time series. The imputing samples, taken from the sample set, also fall on the hyperplane; therefore the missing values can be estimated in the same way as the prediction values. The implementation process of our model is shown in Figure 2.

In the process of imputation, each missing datum can be estimated from all the samples in $S$ that contain it, not just from the imputing sample in $S_{im}$. Assume that $x_t$ is absent; the number of different samples we can use to compute the value of $x_t$ is $n_t$, where $n_t$ is equal to the frequency of $x_t$ in $S$, that is, the number of windows $X_i$ containing position $t$ (at most $d + 1$).
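For illustration, the count of embedding windows that contain a given index t (a small helper of our own, consistent with the windows of length d + 1 defined above) can be computed as:

def num_covering_windows(t, n, d):
    # Number of windows (x_i, ..., x_{i+d}), 0 <= i <= n - d - 1,
    # that contain position t of a series of length n (at most d + 1).
    lo = max(0, t - d)
    hi = min(t, n - d - 1)
    return max(0, hi - lo + 1)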

3. Max-Margin Classification of Data with Missing Features

In the previous discussion, the issue of modeling an incomplete time series was interpreted as classification of data with missing features. In this section, we briefly review the theory of max-margin classification of data with missing features proposed by Chechik et al. [16].

Assume a set of samples $\{(x_i, y_i)\}_{i=1}^{n}$ with missing features, where $y_i \in \{-1, +1\}$ denotes the binary class label of $x_i$. Each sample $x_i$ is characterized by a subset of features $F_i$ from a full set $\mathcal{F}$.

The problem of classification can be interpreted as finding an optimal hyperplane within the max-margin framework. In the case of classification of incomplete data, the instance margin, which treats the margin of each instance in its own relevant subspace, is defined as

$\rho_i(w) = \frac{y_i \, (w_{(i)} \cdot x_i)}{\| w_{(i)} \|}$,  (3.1)

where $w_{(i)}$ is a vector obtained by taking the entries of $w$ that are relevant to $x_i$. Considering the geometric margin to be the minimum over all instance margins, we arrive at the optimization problem

$\max_{w} \min_{i} \frac{y_i \, (w_{(i)} \cdot x_i)}{\| w_{(i)} \|}$.  (3.2)

Define the scaling coefficients $s_i = \| w \| / \| w_{(i)} \|$, and rewrite (3.2) as

$\max_{w} \min_{i} \frac{y_i \, s_i \, (w \cdot x_i)}{\| w \|}$.  (3.3)

For a given set of $s_i$, we can solve the constrained optimization problem

$\min_{w, b, \xi} \; \frac{1}{2} \| w \|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i s_i (w \cdot x_i + b) \geq 1 - \xi_i, \; \xi_i \geq 0$,  (3.4)

in whose dual the inner product $\langle x_i, x_j \rangle$ is taken only over features that are valid for both $x_i$ and $x_j$. The nonlinear classification is solved by using kernels. Thus we obtain the optimal separating hyperplane of the classification of data with missing features, which is expressed as

$f(x) = \sum_i \alpha_i \, y_i \, s_i \, K(x_i, x) + b = 0$,  (3.5)

where $b$ is set as in support vector machines [17, 18].
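A minimal sketch of this procedure, assuming a linear kernel, zero-filled missing features (which realizes the restricted inner product $w \cdot x_i$ over $F_i$), and a simple alternation between solving (3.4) and updating the scaling coefficients $s_i$; the solver (cvxpy), the number of iterations, and the default parameters are our choices, not prescribed by [16]:

import numpy as np
import cvxpy as cp

def max_margin_absent(X, y, C=10.0, n_iter=5):
    # X: (n, d) array with np.nan marking absent features; y in {-1, +1}.
    mask = ~np.isnan(X)              # valid-feature indicator per sample
    Xz = np.nan_to_num(X)            # zero-fill: realizes w . x_i over F_i
    n, d = X.shape
    s = np.ones(n)                   # scaling coefficients, s_i = 1 at start
    for _ in range(n_iter):
        w = cp.Variable(d)
        b = cp.Variable()
        xi = cp.Variable(n, nonneg=True)
        margins = cp.multiply(y * s, Xz @ w + b)   # y_i s_i (w . x_i + b)
        prob = cp.Problem(
            cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
            [margins >= 1 - xi])
        prob.solve()
        wv = w.value
        # update s_i = ||w|| / ||w_(i)|| on each instance's subspace
        sub_norms = np.sqrt((mask * wv ** 2).sum(axis=1))
        s = np.linalg.norm(wv) / np.maximum(sub_norms, 1e-12)
    return wv, b.value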

4. Solution and Algorithm

In our model, the hyperplane of the classification of data with missing features is used to compute the estimated values. Both predicting samples and imputing samples satisfy (3.5). In this section we introduce the solution and algorithm of our model.

4.1. Analytical Solution

Suppose a test sample $X$ in which one component, $\hat{x}$, is the value to be estimated. In this paper we use the kernel function $K(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^2$. Replacing the kernel of (3.5), we obtain

$\sum_i \alpha_i y_i s_i \left( \langle x_i, X \rangle + 1 \right)^2 + b = 0$.  (4.1)

The simplification of (4.1) is the quadratic equation

$A \hat{x}^2 + B \hat{x} + C = 0$,  (4.2)

where the coefficients $A$, $B$, and $C$ collect the terms of (4.1), and the inner product $\langle x_i, X \rangle$ is taken only over features that are valid for both $x_i$ and $X$. Equation (4.2) is a quadratic equation in $\hat{x}$ and can be solved easily.
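Under the quadratic kernel assumed above, the coefficients of (4.2) can be collected explicitly. The following sketch (the function and variable names, and the placement of the unknown in the last coordinate, are our assumptions for illustration) solves for the estimate with numpy:

import numpy as np

def solve_quadratic(sv, coef, b, x_known):
    # sv: (m, d+1) support vectors with np.nan marking absent features;
    # coef[i] = alpha_i * y_i * s_i.  The test sample is
    # X = (x_known..., xhat) with the unknown in the last slot.
    A = B = Cc = 0.0
    for v, c in zip(sv, coef):
        valid = ~np.isnan(v[:-1]) & ~np.isnan(x_known)
        base = np.dot(v[:-1][valid], x_known[valid]) + 1.0
        m = 0.0 if np.isnan(v[-1]) else v[-1]   # skip slot if absent in x_i
        # c * (base + m * xhat)^2 contributes to all three coefficients
        A += c * m * m
        B += c * 2.0 * m * base
        Cc += c * base * base
    roots = np.roots([A, B, Cc + b])
    return roots[np.isreal(roots)].real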

4.2. Numerical Solution

Sometimes an analytical solution is meaningless or does not exist, and we need to obtain a numerical solution of our model by iterative algorithms [19, 20]. Still taking the test sample $X$ as an example, and supposing we use Newton's method, the objective function of our model can be represented as

$g(\hat{x}) = \sum_i \alpha_i y_i s_i K(x_i, X(\hat{x})) + b$.  (4.3)

The iterative equation of $\hat{x}$ can be expressed as

$\hat{x}_{k+1} = \hat{x}_k - \frac{g(\hat{x}_k)}{g'(\hat{x}_k)}$.  (4.4)

In this way, the estimated values are calculated by our model effectively. The numerical solution is more complicated, but it is applicable in every case.
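A sketch of the iteration (4.4) with a central-difference derivative is given below; the tolerance, step size, and starting point are our assumptions, and g is the hyperplane function (4.3) supplied by the caller:

def newton_estimate(g, x0, tol=1e-8, max_iter=50, h=1e-6):
    # Newton's method for g(xhat) = 0 with a numerical derivative.
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        dg = (g(x + h) - g(x - h)) / (2.0 * h)   # central difference
        if abs(dg) < 1e-12:
            break                                 # flat region: give up
        x_new = x - gx / dg
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x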

In conclusion, we have introduced the establishment and solution of our model. The key idea is to first identify a hyperplane of classification of data with missing features by incomplete time series. Then, the hyperplane is used to calculate the estimation values in predicting and imputing samples. Figure 3 provides the algorithms of our model for prediction and imputation.

5. Experiments

To check the validity of our model, four experiments are conducted in this section. First, the prediction performance of our model is evaluated in test A. Given that conventional imputation methods usually perform differently when an incomplete time series is missing data discretely versus continuously, we examine the imputation performance of our model under the two missing modes in tests B and C, respectively. The performance of our model is compared with that of RGGB and two other classical imputation methods, Mean and KNN. Finally, we verify the prediction performance on incomplete time series imputed by the different models in test D.

The time series used in the experiments are the Mackey-Glass time series and the Henon time series. The Mackey-Glass time series is generated by the chaotic delay differential equation

$\frac{dx(t)}{dt} = \frac{a \, x(t - \tau)}{1 + x^{10}(t - \tau)} - b \, x(t)$,

where the delay parameter $\tau$ is set to 17. The sampling interval of $t$ is 5. The Henon time series is generated by the nonlinear map

$x_{n+1} = 1 + y_n - a x_n^2$, $\quad y_{n+1} = b x_n$.
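For reproducibility, the two benchmark series could be generated as follows (a sketch: the Euler step size, the initial history, and the parameter values a and b are our assumptions; the paper only fixes τ = 17 and the sampling interval of 5):

import numpy as np

def mackey_glass(n, tau=17, a=0.2, b=0.1, dt=1.0, sample_every=5):
    # Euler integration of dx/dt = a*x(t-tau)/(1 + x(t-tau)^10) - b*x(t),
    # sampled every 5 steps as in the text; a, b are assumed values.
    hist = [1.2] * (tau + 1)                    # constant initial history
    for _ in range(n * sample_every):
        x_tau, x_now = hist[-tau - 1], hist[-1]
        hist.append(x_now + dt * (a * x_tau / (1 + x_tau ** 10) - b * x_now))
    return np.array(hist[tau + 1::sample_every][:n])

def henon(n, a=1.4, b=0.3):
    # Henon map x_{n+1} = 1 + y_n - a*x_n^2, y_{n+1} = b*x_n
    # (a, b are assumed canonical values, not stated in the paper).
    x, y, out = 0.0, 0.0, []
    for _ in range(n):
        x, y = 1.0 + y - a * x * x, b * x
        out.append(x)
    return np.array(out)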

By contrast, the Henon time series has higher volatility. The embedding dimension $d$ of the sample set and the parameters $C$ and $\varepsilon$ of our model are fixed across all tests. The value of $K$ in KNN is set to 5.
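The KNN baseline could be reproduced with scikit-learn's KNNImputer applied to the embedded sample matrix; this is our stand-in for the KNN variant of [13], not necessarily the exact algorithm used there:

import numpy as np
from sklearn.impute import KNNImputer

# sample matrix of embedded windows, np.nan marking missing entries
windows = np.array([[0.1, 0.4, np.nan],
                    [0.4, np.nan, 0.9],
                    [np.nan, 0.9, 0.7],
                    [0.9, 0.7, 0.3],
                    [0.7, 0.3, 0.2],
                    [0.3, 0.2, 0.6]])
imputer = KNNImputer(n_neighbors=5)   # K = 5 as in the text
windows_filled = imputer.fit_transform(windows)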

MSE (Mean Squared Error) and MAE (Mean Absolute Error) are used to evaluate the performance in the experiments. All results are obtained by repeating the algorithms 10 times.
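The two error measures amount to the following (averaged over the 10 repetitions):

import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def mae(y_true, y_pred):
    # Mean Absolute Error
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))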

5.1. Prediction of Incomplete Time Series

In this test, 115 consecutive data points of the Mackey-Glass time series, with missing levels from 3% to 18%, are used to construct the initial sample set, and the next 65 data points are used for testing the prediction performance of our model. The prediction results are shown in Figure 4.

From Figure 4 we can see that, as the missing level increases, larger deviations of the prediction results inevitably occur because the number of training samples decreases. In practice, however, we can use an acceptable limit of error as the basis for judgment. For example, set the acceptable limit of error equal to the minimum scale of the Mackey-Glass time series. Under this criterion, our model performs well even when the missing level reaches 17%. Figure 5 shows an example of the prediction performance of our model on the Mackey-Glass time series at missing levels of 3% and 17%. From Figure 5, our model roughly predicts the future time series even when 17% of the historical data are absent; compared with the performance at the 3% missing level, only some details are lost. Similar results are obtained on the Henon time series, as shown in Figures 6 and 7.

5.2. Imputation of Incomplete Time Series with Discrete Missing Data

The same 115 consecutive data points of the Mackey-Glass and Henon time series, now with discretely missing data, are used as the experimental data in this test. The imputation results of the different models are shown in Tables 1 and 2.

From Tables 1 and 2 we can see that the imputation performance of our model is similar to that of KNN and RGGB on the Mackey-Glass time series. However, on the Henon time series our model outperforms the other three methods at every missing level. An example of the imputation performance of our model on the Henon time series at a missing level of 10% is shown in Figure 8.

Figure 8 shows that our model imputes most of the missing data in the Henon time series effectively. Compared with the other methods, the performance of our model is not sensitive to the fluctuation of the time series.

5.3. Imputation of Incomplete Time Series with Continuous Missing Data

We evaluate the performance of the different imputation methods on incomplete time series with continuously missing data in the same way. The maximum length of a run of continuous missing data is fixed. The imputation results are shown in Tables 3 and 4.
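To emulate this missing mode, one can knock out runs of bounded length at a given missing level. A small helper is sketched below; the run-placement strategy and the random seed are assumptions we make for illustration:

import numpy as np

def drop_runs(series, level, max_run, rng=np.random.default_rng(0)):
    # Set randomly placed runs (length <= max_run) to np.nan until
    # roughly `level` of the series is missing.
    x = series.astype(float).copy()
    target = int(level * len(x))
    while np.isnan(x).sum() < target:
        run = rng.integers(1, max_run + 1)
        start = rng.integers(0, len(x) - run + 1)
        x[start:start + run] = np.nan
    return x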

Tables 3 and 4 indicate that our model outperforms the other three methods on both the Mackey-Glass and Henon time series when the data are missing continuously. Compared with Tables 1 and 2, no significant difference is observed between the two missing modes for our model, while the other methods perform better in the discrete mode. Figure 9 shows an example of the imputation performance of our model on the Mackey-Glass time series at a missing level of 10%.

There are three runs of continuous missing data in Figure 9, and our method imputes the first two effectively. Based on the above observations, we conclude that our method performs better than the other traditional imputation technologies.

5.4. Prediction after Imputation

The prediction performance on the incomplete time series imputed by the different models in tests B and C is evaluated in this test. We again use the next 65 data points to test the prediction performance. The error-tolerant BP algorithm is used to build the prediction model. The prediction results are shown in Figures 10 and 11.

From Figures 10 and 11 we can see that the prediction performance on the Mackey-Glass and Henon time series imputed by our model is superior to that obtained with the other imputation algorithms.

6. Conclusions

Learning and prediction from incomplete data remain pervasive problems, although extensive studies have been conducted to improve the efficiency of data acquisition and transmission [21, 22]. We have proposed a new prediction model for incomplete time series. The experiments conducted in this paper confirm that our model can be successfully applied to the prediction of incomplete time series up to moderate missing levels while keeping the error within an acceptable limit. In addition, the imputation performance of our model is superior to that of other imputation methods and is insensitive to the fluctuation of the time series. Future work may focus on applications of the model in relevant fields [23, 24] and real-life problems.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants nos. 60573125, 90820306, and 60873264. The authors would like to thank the anonymous reviewers of MPE for helpful suggestions and corrections.