Intelligent Collaborative Decision Making Models, Methods, and Tools (Special Issue)
A Cycle Deep Belief Network Model for Multivariate Time Series Classification
Multivariate time series (MTS) data are an important class of temporal data objects and can be obtained easily. However, MTS classification is difficult because of the complexity of the data type. In this paper, we propose a Cycle Deep Belief Network (Cycle DBN) model to classify MTS. The model exploits the representation learning ability of DBN together with the correlation between successive observations in a time series. The experimental results show that this model outperforms four other algorithms: DBN, KNN_ED, KNN_DTW, and RNN.
Time series data are sequences of real-valued signals measured at successive time intervals. They can be divided into two kinds: univariate time series and multivariate time series (MTS). A univariate time series contains one variable, while an MTS has two or more variables. MTS are a particularly important type of time series because they are widely used in many areas, such as speech recognition, medical and biological measurement, financial and market data analysis, telecommunication and telemetry, sensor networking, motion tracking, and meteorology.
As the availability of MTS data increases, the problem of MTS classification has recently attracted great interest in the literature. MTS classification is a supervised learning procedure that aims to label a new multivariate series instance according to a classification function learned from the training set. However, whereas the features in traditional classification problems are independent of their relative positions, the features in a time series are highly correlated. Traditional classification algorithms treat each feature as an independent attribute, so applying them to MTS loses some important information. Many techniques have been proposed for time series classification. A boosting-based method has been presented for multivariate time series classification. Another study proposed a DTW-based decision tree to classify time series and reported an error rate of 4.9%. A multilayer perceptron neural network has been applied to the control chart problem, with a best error rate of 1.9%. Hidden Markov Models have been used on the PCV-ECG classification problem and achieved 98% accuracy. Support vector machines combined with a Gaussian elastic metric kernel have been used for time series classification, and the dynamics of recurrent neural networks (RNNs) for time series classification have also been studied. However, the simple combination of the one-nearest-neighbor classifier with DTW distance is claimed to be exceptionally difficult to beat.
Deep Belief Network (DBN) is a type of deep neural network with multiple hidden layers, introduced by Hinton et al. along with a greedy layer-wise learning algorithm. The Restricted Boltzmann Machine (RBM), a probabilistic model, is the building block of DBN. DBN and RBM have attracted increasing attention from researchers and have been applied with excellent performance to many problems, such as classification, dimensionality reduction, and information retrieval. Taylor et al. proposed the conditional RBM, an extension of the RBM, and applied it to human motion sequences. Chao et al. evaluated the performance of DBN as a forecasting tool for predicting exchange rates. Längkvist et al. applied DBN to sleep stage classification and evaluated its performance. Their results show that DBN, either with features (feat-DBN) or using the raw data (raw-DBN), performed better than feat-GOHMM: feat-DBN achieved 72.2% accuracy and raw-DBN achieved 67.4%, while feat-GOHMM achieved only 63.9%.
Raw-DBN does not need to extract features before classifying the sleep data, and the algorithm is easy to implement. However, it neglects the temporal information in time series data, and its performance is not satisfactory. This paper proposes a Cycle DBN model for time series classification. Because it is built on DBN, the model retains the feature learning ability of DBN, and it also takes the characteristics of time series data into account.
The remainder of the paper is organized as follows. Section 2 reviews the background material. In Section 3, we detail the Cycle DBN model for multivariate time series. Section 4 evaluates the performance of Cycle DBN on real-world data sets: sleep recordings, the PAMAP2 activity data set, and ten data sets from the UCR archive. Section 5 concludes the paper.
2. Background Material
A time series is a sequence of observations over a period of time. Formally, a univariate time series $X = (x_1, x_2, \ldots, x_T)$ is an ordered set of $T$ real-valued numbers, and $T$ is called the length of the time series. Multivariate time series are more common in real life and more complex, since they have two or more variables. An MTS $\mathbf{X}$ is defined as a finite sequence of $m$ univariate time series:
$$\mathbf{X} = (X_1, X_2, \ldots, X_m).$$
The MTS $\mathbf{X}$ has $m$ variables, and the component corresponding to the $j$th variable is a univariate time series of length $T$:
$$X_j = (x_{j1}, x_{j2}, \ldots, x_{jT}), \quad j = 1, \ldots, m.$$
In this paper, we use boldface characters for MTS and regular fonts for univariate time series.
The time series classification problem is a supervised learning procedure. First, we learn a function $f$ from a given training set $D = \{(\mathbf{X}_i, y_i)\}_{i=1}^{N}$. The training set includes $N$ samples, each consisting of an input $\mathbf{X}_i$ paired with its corresponding label $y_i$. Then we can assign a label to a new time series instance using the function $f$ learned from the training set.
A Deep Belief Network (DBN) consists of an input layer, a number of hidden layers, and finally an output layer. The top two layers have undirected, symmetric connections between them. The lower layers receive top-down, directed connections from the layer above.
The training of a DBN consists of two phases. Each pair of consecutive layers in a DBN is treated as a Restricted Boltzmann Machine with visible units $v$ and hidden units $h$. There are full connections between the visible layer and the hidden layer, but no visible-to-visible or hidden-to-hidden connections (see Figure 1). The visible and hidden units are connected by a weight matrix $W$, and the two layers have a visible bias vector $b$ and a hidden bias vector $c$, respectively. In the first phase, each RBM is trained independently, one after another, and the trained RBMs are stacked on top of each other. This procedure is called pretraining. In the second phase, a backpropagation (BP) network is added on top of the DBN; it receives the output of the highest RBM as its input, and supervised learning is performed. This procedure is called fine-tuning because the parameters of the DBN are adjusted with the error backpropagation algorithm.
From the above analysis, we can conclude that the most important part of training a DBN is the training of each RBM.
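Since there are no hidden-to-hidden or visible-to-visible connections in the RBM, the probability that hidden unit $h_j$ is activated given the visible vector $v$ and the probability that visible unit $v_i$ is activated given the hidden vector $h$ are
$$P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_i W_{ij} v_i\Big),$$
$$P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j W_{ij} h_j\Big),$$
where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid function, $W_{ij}$ is the weight between $v_i$ and $h_j$, and $b_i$ and $c_j$ are the visible and hidden biases. The parameters are trained with the Contrastive Divergence (CD) approximation to the log-likelihood gradient, and the learning rule for the weights is
$$\Delta W_{ij} = \epsilon\left(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}}\right),$$
where $\epsilon$ is the learning rate, $\langle \cdot \rangle_{\text{data}}$ is the expectation over the training set, and $\langle \cdot \rangle_{\text{recon}}$ is the expectation under the distribution of reconstructions.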
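To illustrate the CD training described above, here is a minimal pure-Python sketch of one CD-1 step for a binary RBM. It is not the authors' implementation: the class name `RBM`, the initialization (small Gaussian weights, zero biases), the learning rate, and the use of reconstruction probabilities instead of binary samples for the visible layer are our assumptions. The bias updates follow the same data-minus-reconstruction pattern as the weight rule.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1); a sketch."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # W[i][j] connects visible unit i to hidden unit j (assumed init: N(0, 0.01)).
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def p_h_given_v(self, v: Vector) -> Vector:
        # P(h_j = 1 | v) = sigmoid(c_j + sum_i W_ij v_i)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def p_v_given_h(self, h: Vector) -> Vector:
        # P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0: Vector, lr: float = 0.1) -> float:
        """One CD-1 step on a single training vector; returns squared reconstruction error."""
        ph0 = self.p_h_given_v(v0)                                  # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        pv1 = self.p_v_given_h(h0)                                  # reconstruction probabilities
        ph1 = self.p_h_given_v(pv1)                                 # negative phase
        for i in range(len(v0)):
            for j in range(len(ph0)):
                # <v_i h_j>_data - <v_i h_j>_recon
                self.W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        for i in range(len(v0)):
            self.b[i] += lr * (v0[i] - pv1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((x - p) ** 2 for x, p in zip(v0, pv1))
```

Stacking such RBMs, each trained on the hidden activation probabilities of the one below, gives the greedy pretraining phase of a DBN.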
3. Cycle_DBN for Time Series Classification
Längkvist et al. applied DBN to time series classification and obtained promising results. The standard DBN optimizes the posterior probability $P(y_t \mid \mathbf{x}_t)$ of the class label given only the current input $\mathbf{x}_t$. However, time series differ from other kinds of data in that successive observations are correlated. Applying DBN to time series classification without modification is therefore unsuitable, because it neglects this temporal information.
Based on the above discussion, this paper proposes a Cycle DBN model for time series classification, as shown in Figure 4. The model inherits the powerful feature representation ability of DBN and exploits the correlation between successive observations in the time series, which makes it well suited to time series classification.
In this model, $\mathbf{x}_t$ is the input at time step $t$ and $\mathbf{o}_t$ is the corresponding output of the DBN. Since our purpose is classification, we add a softmax function on the top layer, and $y_t$ is the corresponding label. After the DBN produces the label $y_{t-1}$ at time $t-1$, $y_{t-1}$ is treated as one item of the DBN input at the next step: at time $t$, the inputs of the DBN include not only $\mathbf{x}_t$ but also $y_{t-1}$, the output of the DBN at time $t-1$.
The training procedure of Cycle_DBN is similar to that of the traditional DBN and consists of two phases; the only difference is that the output at time $t-1$ is fed back to Cycle_DBN as one of the inputs at time $t$. The first phase is unsupervised training to initialize the parameters of the DBN. After unsupervised learning, we add a softmax function on the top layer and perform supervised training.
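The feedback loop at prediction time can be sketched as follows, under stated assumptions. `toy_model` is a hypothetical stand-in for a trained DBN with a softmax top layer; the one-hot encoding of the previous label, the all-zero vector used before the first step, and feeding back the predicted (rather than the true) label at test time are our assumptions, since the text does not specify them.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label: int, num_classes: int) -> Vector:
    v = [0.0] * num_classes
    v[label] = 1.0
    return v


def cycle_predict(xs: Sequence[Vector], model: Callable[[Vector], Vector],
                  num_classes: int) -> List[int]:
    """Label x_1..x_T in order, feeding the previous predicted label back as input."""
    prev = [0.0] * num_classes  # assumption: no label information before t = 1
    labels: List[int] = []
    for x in xs:
        probs = model(list(x) + prev)  # input at time t is [x_t ; one_hot(y_{t-1})]
        y = max(range(num_classes), key=lambda k: probs[k])
        labels.append(y)
        prev = one_hot(y, num_classes)
    return labels


def toy_model(inp: Vector) -> Vector:
    # Hypothetical stand-in for a trained DBN with a softmax top layer:
    # one input feature plus a 2-class feedback vector.
    x, prev = inp[0], inp[1:]
    return softmax([-x + 2.0 * prev[0], x + 2.0 * prev[1]])


print(cycle_predict([[-3.0], [0.5], [3.0], [-0.5]], toy_model, 2))  # [0, 0, 1, 1]
```

Judged on $\mathbf{x}_t$ alone, the toy model would label the second and fourth inputs 1 and 0; with the fed-back label it keeps the previous class when the new evidence is weak, which is the kind of temporal consistency Cycle_DBN is intended to exploit.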
4. Experimental Evaluation
In this section, we conduct extensive experiments to evaluate the classification performance of the proposed model Cycle_DBN and compare it against traditional DBN, NN_ED, NN_DTW, and recurrent neural networks (RNN).
The $k$-NN algorithm is one of the best-known classification algorithms; it is simple to understand but performs well in practice. A test object is classified according to its distances to the objects in the training set and is assigned to the class to which the majority of its $k$ nearest neighbors belong. We choose $k = 1$ in our experiments, in which case the algorithm is simply called the nearest neighbor algorithm. In NN_ED, we use the Euclidean distance to measure the similarity between two instances.
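A minimal sketch of the 1-NN classifier with Euclidean distance (NN_ED), assuming each MTS is stored as a list of $m$ variables, each a list of $T$ values as in Section 2, and that all instances have the same shape. The function names `euclidean` and `nn_classify` are illustrative.

```python
import math
from typing import Hashable, Optional, Sequence, Tuple

MTS = Sequence[Sequence[float]]  # m variables, each a univariate series of length T


def euclidean(a: MTS, b: MTS) -> float:
    """Euclidean distance between two MTS of identical shape (m x T)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("Euclidean distance needs MTS of identical shape")
    return math.sqrt(sum((p - q) ** 2
                         for ra, rb in zip(a, b)
                         for p, q in zip(ra, rb)))


def nn_classify(query: MTS, train: Sequence[Tuple[MTS, Hashable]],
                dist=euclidean) -> Optional[Hashable]:
    """1-NN: return the label of the training instance closest to the query."""
    best_label, best_d = None, math.inf
    for series, label in train:
        d = dist(query, series)
        if d < best_d:
            best_d, best_label = d, label
    return best_label


train = [([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], "a"),
         ([[5.0, 5.0, 5.0], [5.0, 5.0, 5.0]], "b")]
print(nn_classify([[0.0, 1.0, 0.0], [1.0, 1.0, 2.0]], train))  # a
```

Every prediction scans the whole training set, so prediction time grows linearly with its size. Passing a DTW distance function as `dist` turns the same loop into an NN_DTW classifier.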
Dynamic Time Warping (DTW) is another distance measure for time series; it was originally designed for, and is typically applied to, univariate time series. However, the time series handled in this paper are multidimensional, so a multidimensional version of DTW is needed. ten Holt et al. proposed a multidimensional DTW that uses all dimensions to find the best synchronization. In standard DTW, the local distance between two points is usually the squared difference of their values: $d(q_i, c_j) = (q_i - c_j)^2$. In multidimensional DTW, the distance between two $m$-dimensional points must be calculated instead: $d(q_i, c_j) = \sum_{k=1}^{m} (q_{i,k} - c_{j,k})^2$. In NN_DTW, we use the multidimensional DTW distance to measure the similarity between two instances.
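The multidimensional DTW distance can be sketched with the standard dynamic-programming recursion. This is an illustrative implementation under our own assumptions (no warping-window constraint, squared Euclidean local cost, no final square root), not ten Holt et al.'s code. It takes each MTS as a list of $m$ variables, as defined in Section 2, and transposes it into a sequence of $m$-dimensional points so that a single warping path is shared by all dimensions.

```python
import math
from typing import Sequence

MTS = Sequence[Sequence[float]]  # m variables, each a univariate series


def md_dtw(a: MTS, b: MTS) -> float:
    """Multidimensional DTW: one warping path shared by all m dimensions."""
    pa = list(zip(*a))  # time-major: pa[i] is the m-dimensional point at time i
    pb = list(zip(*b))
    n, k = len(pa), len(pb)
    D = [[math.inf] * (k + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            # local cost d(q_i, c_j) = sum over dimensions of squared differences
            cost = sum((x - y) ** 2 for x, y in zip(pa[i - 1], pb[j - 1]))
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][k]


# Two 2-variable series of different lengths; the second repeats its first point.
print(md_dtw([[0, 1, 2], [0, 1, 2]], [[0, 0, 1, 2], [0, 0, 1, 2]]))  # 0.0
```

Unlike the Euclidean distance, this handles series of different lengths; the cost is $O(nk)$ per pair, which makes NN_DTW slower than NN_ED.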
RNNs allow the identification of dynamic systems with an explicit model of time and memory, which makes them well suited to time series classification. In this paper, we choose Elman’s architecture, which consists of a context layer, an input layer, one hidden layer, and an output layer.
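A minimal sketch of an Elman network's forward pass for sequence classification. The context layer holds a copy of the previous hidden state; the tanh hidden activation, the softmax output computed from the final hidden state, and the hand-picked weights in the usage example are our assumptions, not the configuration used in the experiments.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def elman_forward(xs: Sequence[Sequence[float]], W_in: Matrix, W_ctx: Matrix,
                  b_h: Sequence[float], W_out: Matrix, b_out: Sequence[float]) -> List[float]:
    """Run an Elman network over a sequence and return class probabilities."""
    context = [0.0] * len(b_h)  # context layer: copy of the previous hidden state
    for x in xs:
        pre = [u + r + bias
               for u, r, bias in zip(matvec(W_in, x), matvec(W_ctx, context), b_h)]
        context = [math.tanh(z) for z in pre]  # new hidden state, copied to the context layer
    scores = [s + bias for s, bias in zip(matvec(W_out, context), b_out)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # softmax over the final hidden state
    total = sum(exps)
    return [e / total for e in exps]


probs = elman_forward(
    xs=[[1.0], [2.0]],
    W_in=[[1.0], [-1.0]], W_ctx=[[0.5, 0.0], [0.0, 0.5]], b_h=[0.0, 0.0],
    W_out=[[1.0, -1.0], [-1.0, 1.0]], b_out=[0.0, 0.0],
)
print(probs)
```

In practice the weights would be learned, for example with backpropagation through time; they are fixed here only to show how the context layer carries information between steps.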
To evaluate the performance of these methods, we test them on real-world time series datasets: a sleep recording dataset, the PAMAP2 dataset, and datasets from the UCR Time Series Classification Archive.
The performance of each classifier is reported as the error rate, defined as
$$\text{error rate} = \frac{\text{number of misclassified test instances}}{\text{total number of test instances}}.$$
4.1. Sleep Stage Classification
We first consider the problem of sleep stage classification. The data used in this paper were provided by St. Vincent’s University Hospital and University College Dublin and can be downloaded from PhysioNet at http://www.physionet.org/pn3/ucddb/.
The recordings in this data set were obtained from 25 adult subjects with suspected sleep-disordered breathing. Each recording consists of 2 EEG channels (C3-A2 and C4-A1), 2 EOG channels, and 1 EMG channel. We use only one of the two EEG signals (C3-A2) in our study. According to Rechtschaffen and Kales (R&K), sleep recordings can be divided into the following five stages: awake, rapid eye movement (REM), stage 1, stage 2, and slow wave sleep (SWS). Our goal is to find a mapping function $f$ that correctly predicts the sleep stage $y$ from an input instance $\mathbf{X}$: $f: \mathbf{X} \mapsto y$, with $y \in \{\text{awake}, \text{REM}, \text{stage 1}, \text{stage 2}, \text{SWS}\}$.
4.1.2. Experiment Setup
The raw signals of all subjects are lightly preprocessed: a 50 Hz notch filter cancels power-line disturbances, a band-pass filter is applied (0.3 to 32 Hz for EEG and EOG, 10 to 32 Hz for EMG), and the signals are then downsampled to 64 Hz.
Since the sampling rate is 64 samples per second and we set the window width to 1 second of data, each time series instance $\mathbf{X}$ is an MTS of length $T = 64$. Because each of the 64 time points in $\mathbf{X}$ has its own label, we select the label of the last time point as the label of $\mathbf{X}$.
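The windowing step can be sketched as below. The function name `make_windows` and the `stride` parameter are hypothetical: the text does not state whether consecutive windows overlap, so the default of non-overlapping 64-sample windows is an assumption (set `stride=1` for a sliding window).

```python
from typing import Hashable, List, Sequence, Tuple


def make_windows(signals: Sequence[Sequence[float]], labels: Sequence[Hashable],
                 width: int = 64, stride: int = 64) -> List[Tuple[List[List[float]], Hashable]]:
    """Cut m aligned channels into windows of `width` samples (1 s at 64 Hz).

    Each window is labeled with the label of its last sample.
    """
    if any(len(ch) != len(labels) for ch in signals):
        raise ValueError("every channel needs one label per sample")
    windows = []
    for end in range(width, len(labels) + 1, stride):
        window = [list(ch[end - width:end]) for ch in signals]  # m x width slice
        windows.append((window, labels[end - 1]))               # label of the last sample
    return windows
```

Each returned window has the $m \times T$ layout of Section 2, so it can be fed directly to the distance functions or flattened into a DBN input vector.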
In our study, we use the recordings of five subjects as the training set. To balance the classes, we randomly select 6000 records from each category, which gives 30000 records; 25000 of them are used as training samples and 5000 as validation samples. The recordings of six other subjects are used as test data. The distribution of the data set is listed in Table 1.
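A sketch of the class-balanced sampling and the train/validation split, assuming samples are `(x, label)` pairs pooled from the five training subjects. Sampling is without replacement, and the validation set is drawn at random from the balanced pool; whether the original split was stratified is not stated, so this is an assumption, as is the hypothetical function name `balanced_split`.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[object, Hashable]  # (instance, sleep-stage label)


def balanced_split(samples: Sequence[Sample], per_class: int = 6000, n_val: int = 5000,
                   seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Draw `per_class` samples from every class, then split into train/validation."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[Sample]] = defaultdict(list)
    for x, y in samples:
        by_class[y].append((x, y))
    pool: List[Sample] = []
    for y in sorted(by_class, key=str):
        # random.sample raises ValueError if a class has fewer than per_class samples
        pool.extend(rng.sample(by_class[y], per_class))
    rng.shuffle(pool)
    return pool[n_val:], pool[:n_val]
```

With five classes and the default arguments this yields 30000 records, split into 25000 for training and 5000 for validation.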
4.1.3. Experiment Result
Our goal is to compare the performance of Cycle_DBN with the original DBN, NN_ED, NN_DTW, and RNN for time series classification. Table 2 lists the error rate of each model, with the best results shown in boldface.
Compared with the other four algorithms, the proposed algorithm has the best performance. The classification accuracy of Cycle_DBN on all the test data is above 90%, and in most cases it exceeds 99%. Standard DBN has a higher classification accuracy than NN_ED, NN_DTW, and RNN. RNN performs quite poorly, with an error rate of about 50%.
4.2. Activity Classification
Our second experiment is on the PAMAP2 dataset for activity classification. This dataset can be downloaded at http://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring.
This data set records 18 activities performed by 9 subjects wearing 3 inertial measurement units (IMUs) and a heart rate (HR) monitor. Each row of the data contains 54 columns: timestamp (column 1), activityID (column 2), heart rate (column 3), IMU hand (columns 4–20), IMU chest (columns 21–37), and IMU ankle (columns 38–54). In our experiment, we select only 7 activities: lying (ID 1), sitting (ID 2), standing (ID 3), walking (ID 4), running (ID 5), cycling (ID 6), and Nordic walking (ID 7). Since the records of subject 103 and subject 109 do not contain all seven activities, we discard these two subjects. That is, we use seven subjects (subjects 101, 102, and 104 through 108) to classify the seven activities. Furthermore, the heart rate record is not used in our experiment.
4.2.2. Experiment Setup
To improve the performance of the proposed approach, we first preprocess the data. Each dimension of the time series is normalized as $x' = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of that variable, computed over the values in the same column rather than over all samples.
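A sketch of the per-column z-score normalization, assuming the data are a list of rows with one column per variable. Using the population standard deviation and mapping constant columns to 0 are our choices; the text does not say which samples the statistics were computed on (computing them on the training set and reusing them for validation and test data would be one common option). The function name `zscore_columns` is hypothetical.

```python
import math
from typing import List, Sequence


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize each column (variable) as (x - mu) / sigma using that column's statistics."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    # population standard deviation of each column
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(m)]
    # constant columns (sigma = 0) are mapped to 0 instead of dividing by zero
    return [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(m)]
            for r in rows]
```

Normalizing per column keeps sensors with large numeric ranges (for example, magnetometer readings) from dominating those with small ranges.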
For each of the seven subjects, we randomly select 1/2 of the samples as the training set, 1/6 as the validation set, and the remaining 1/3 as the test set.
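A sketch of the per-subject random split into 1/2 training, 1/6 validation, and the remaining test data. The function name `split_subject` and the fixed seed are illustrative, and integer division means the shares are approximate when a subject's sample count is not divisible by 6.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_subject(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Randomly split one subject's samples into 1/2 train, 1/6 validation, rest test."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = len(idx) // 2
    n_val = len(idx) // 6
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

Applying the function separately to each subject evaluates every subject on its own held-out data, which matches the per-subject error rates reported below.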
4.2.3. Experiment Result
We evaluate the classification accuracy of each model on these seven subjects. Table 3 compares the error rates of the models for each subject. As Table 3 shows, the classification accuracies of all five models on the seven datasets exceed 90%. Among them, our Cycle_DBN model has either the lowest error rate or one very close to the lowest for each subject. NN_ED also performs very well; note, however, that NN is a lazy (instance-based) classifier rather than a feature-based model.
Feature-based models are well known to be more efficient than lazy classification models such as NN. Although NN achieves high classification accuracy, its prediction time grows with the size of the training set, because each test instance must be compared with every training instance. The prediction time of DBN and Cycle_DBN, in contrast, depends on the network size rather than on the amount of training data. Cycle_DBN therefore offers both high classification accuracy and efficient prediction.
4.3. UCR Time Series Classification
Besides the above two data sets, we also test Cycle_DBN on ten distinct time series datasets from the UCR Time Series Classification Archive. Each of these datasets comes with a default split into training and testing sets. The only preprocessing in our experiment is normalization and the division of the data into training, validation, and testing sets.
Table 4 shows the test error rates of NN_ED, NN_DTW, RNN, DBN, and Cycle_DBN for a comprehensive comparison.
Cycle_DBN outperforms the other four methods on five of the ten datasets; NN_ED and NN_DTW both achieve the best performance on the same two datasets, and DBN achieves the best performance on two datasets. Although the performance of RNN is not outstanding, it is acceptable.
5. Conclusions
Time series classification is becoming increasingly important in a broad range of real-world applications. However, many existing methods either have limited classification accuracy or need domain knowledge to identify representative features in the data. In this paper, we proposed Cycle_DBN for the classification of multivariate time series data in general. Like DBN, Cycle_DBN uses unsupervised pretraining to discover the structure hidden in the data and to learn representations that are more suitable as input to a supervised classifier than the raw input. Compared with DBN, Cycle_DBN predicts the label at time $t$ based not only on the current input $\mathbf{x}_t$ but also on the label $y_{t-1}$ of the previous time step. We evaluated Cycle_DBN on twelve real-world datasets, and the experimental results show that it outperforms DBN, NN_ED, NN_DTW, and RNN on most datasets.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work was supported by the National Natural Science Foundation of China (nos. 51574232, 61379100, 61673196, and 61502212).
L. Chen and M. S. Kamel, “Design of multiple classifier systems for time series data,” Lecture Notes in Computer Science, pp. 216–225, 2005.
I. Batal et al., “Multivariate time series classification with temporal abstractions,” The Florida AI Research Society, vol. 22, 344 pages, 2009.
J. J. Rodriguez, C. J. Alonso, and H. Bostrom, “Boosting interval based literals,” Intelligent Data Analysis, vol. 5, no. 2, pp. 245–262, 2001.
J. J. Rodríguez and C. J. Alonso, Interval and Dynamic Time Warping-Based Decision Trees, p. 548, 2004.
A. Nanopoulos, R. Alcock, and Y. Manolopoulos, “Feature-based classification of time-series data,” in Information Processing and Technology, M. Nikos and D. N. Stavros, Eds., pp. 49–61, Nova Science Publishers, Inc., New York, USA, 2001.
D. Zhang, W. Zuo, D. Zhang, and H. Zhang, “Time series classification using support vector machine with Gaussian elastic metric kernel,” in Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR '10), IEEE, Istanbul, Turkey, August 2010.
X. Xi, E. Keogh, C. Shelton, L. Wei, and C. A. Ratanamahatana, “Fast time series classification using numerosity reduction,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 1033–1040, ACM, Pittsburgh, Pennsylvania, USA, June 2006.
H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations,” in Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09), pp. 609–616, ACM, Montreal, Quebec, Canada, June 2009.
M. Welling, M. Rosen-Zvi, and G. E. Hinton, “Exponential family harmoniums with an application to information retrieval,” in Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS '04), 2004.
G. W. Taylor, G. E. Hinton, and S. T. Roweis, “Modeling human motion using binary latent variables,” in Proceedings of the International Conference on Neural Information Processing, 2006.
G. A. ten Holt, M. J. Reinders, and E. Hendriks, “Multi-dimensional dynamic time warping for gesture recognition,” in Proceedings of the Thirteenth Annual Conference of the Advanced School for Computing and Imaging, 2007.
A. Rechtschaffen and A. Kales, A Manual of Standardized Terminology, Techniques, and Scoring Systems for Sleep Stages of Human Subjects, 1968.
Y. Chen et al., “The UCR time series classification archive,” http://www.cs.ucr.edu/~eamonn/time_series_data/, 2015.