Machine Learning in Mobile Computing: Methods and Applications
Multivariate Time Series Data Clustering Method Based on Dynamic Time Warping and Affinity Propagation
In view of the importance of the various components and asynchronous shapes of multivariate time series, a clustering method based on dynamic time warping and affinity propagation is proposed. The relationships between data objects are described from two perspectives: the global information and the local component properties of multivariate time series. The method uses dynamic time warping to measure the similarity between the original time series data and to obtain the similarity between their corresponding components. It then applies affinity propagation to cluster these similarity matrices, establishing correlation matrices for the individual components and for the whole information of the multivariate time series, respectively. In addition, we put forward a synthetic correlation matrix to better reflect the relationships between multivariate time series data. The affinity propagation algorithm is applied once more to cluster the synthetic correlation matrix, which realizes the clustering analysis of the original multivariate time series data. Numerical experimental results demonstrate that the proposed method outperforms traditional ones.
A time series is a kind of time-dependent, high-dimensional data type, which can be divided into univariate time series (UTS) and multivariate time series (MTS) according to the variable type. With the rapid development of society and the economy, such time-ordered data widely exist in finance, economics, engineering, wireless communication, and other fields. Owing to the high dimensionality of its variable attributes, MTS poses considerable difficulties for clustering, classification, prediction, pattern discovery, visualization, and other mining tasks. Therefore, research on MTS mining has gained growing interest from scholars. In particular, MTS clustering is one of the most studied issues in data mining, since feature patterns or data classifications of MTS are easily found from clustering results.
In fact, current academic research on time series clustering mainly focuses on UTS data. Generally, it concentrates on clustering whole time series, subseries, time points, etc. [6–8]. Traditional time series clustering methods include model-based clustering [9, 10], feature-based clustering [11, 12], segmentation-based clustering, and distance-based clustering [14, 15]. However, since MTS data are characterized by high dimensionality, uncertainty, and dynamics, the above methods cannot be effectively adapted to MTS clustering. To reduce the high dimensionality of MTS attributes, both principal component analysis (PCA) and wavelet analysis are commonly adopted for MTS clustering analysis. However, most clustering algorithms [18–21] cannot handle nonspherically distributed MTS data well. In the clustering process, because the lengths of most MTSs differ, it is necessary to use dynamic time warping (DTW) [22–26] to measure the similarity between their corresponding component attributes. But the importance of the component attributes has not yet received due attention. To further analyze the features of the component attributes, Li et al. applied the affinity propagation (AP) algorithm [28, 29] to adaptively cluster the component attributes, which addressed the shortcomings of other traditional methods to a certain extent. However, their calculation process for the comprehensive relationship matrix is still complex, and the quality and efficiency of MTS clustering need further improvement.
To better solve the asynchronous correlation of different time points in MTS clustering, this paper proposes a simple and high-quality clustering method DTW_AP based on the DTW and AP. DTW is used to measure the similarity of both the component attributes and the overall MTS data. After that, from the perspective of local characteristics and global information, we quickly construct the comprehensive correlation matrix of MTS data. Based on this matrix, we further use the AP algorithm to achieve MTS clustering. Numerical experimental results indicate that the proposed method can achieve better clustering results compared with other traditional methods.
The proposed method possesses advantages that current MTS clustering research does not have: (1) the similarity among the components of the MTS data is considered during the clustering process, which ensures the accuracy of MTS clustering; (2) by thoroughly considering the complicated relationships between the local component attributes and the overall information, it retains the multifaceted nature of MTS information.
The rest of this paper is organized as follows. “Preliminaries” introduces some related theoretical methods, such as the DTW and AP algorithms. In “Multivariate Time Series Clustering,” the detailed construction process of the novel method is described. In “Numerical Experiment,” the comparison analysis among the clustering methods is fulfilled in the experimental evaluation. Finally, the conclusions and future work are drawn in “Conclusions.”
2.1. Dynamic Time Warping
Euclidean distance is a common and efficient distance measure for time series, which can be denoted as
\[ d_E(X, Y) = \sqrt{\sum_{t=1}^{n} \left\| x_t - y_t \right\|^2}, \tag{1} \]
where \(X = (x_1, x_2, \ldots, x_n)\) and \(Y = (y_1, y_2, \ldots, y_n)\) are two MTSs of equal length \(n\), and the dimension of each observation \(x_t\), \(y_t\) is \(m\), \(t = 1, 2, \ldots, n\).
From (1), we know that the Euclidean distance requires two time series of equal length measured at the same time points. That is, the squared difference between \(x_t\) and \(y_t\) at the same time point \(t\) is taken as that point's contribution to the Euclidean distance. Obviously, its requirements of equal sequence length and aligned time matching, together with its sensitivity to abnormal data, make it impossible to obtain satisfactory results when measuring the similarity between time series data.
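As a concrete sketch of equation (1) (with our own variable and function names, not the authors' code), the Euclidean distance between two equal-length MTSs can be computed as follows:

```python
import numpy as np

def euclidean_distance(X, Y):
    """Euclidean distance between two equal-length multivariate series.

    X and Y have shape (n, m): n time points, m attribute components.
    Each aligned time point t contributes ||x_t - y_t||^2 to the sum,
    so the two series must have exactly the same length and dimension.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    if X.shape != Y.shape:
        raise ValueError("Euclidean distance requires equal-length series")
    return float(np.sqrt(np.sum((X - Y) ** 2)))
```

Passing series of unequal length raises an error, which is exactly the limitation that DTW removes below.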
Compared with the Euclidean distance, dynamic time warping (DTW) [22–26] is a distance measure that first calculates the distance matrix between the time series data and then uses dynamic programming to construct the cumulative cost matrix \(C\). Finally, based on the matrix \(C\), an optimal warping path \(W = (w_1, w_2, \ldots, w_K)\) is found so that
\[ \mathrm{DTW}(X, Y) = \min_{W} \sum_{k=1}^{K} d(w_k), \tag{2} \]
where \(w_k = (i, j)\), \(K\) is the number of elements in the optimal warping path \(W\), and \(\max(m, n) \le K \le m + n - 1\). In addition, \(w_k = (i, j)\) indicates that the path element consists of the data point \(x_i\) in \(X\) and the data point \(y_j\) in \(Y\).
According to the dynamic programming method, the cumulative cost matrix \(C\) can be figured out as follows:
\[ C(i, j) = d(x_i, y_j) + \min\{C(i-1, j),\; C(i, j-1),\; C(i-1, j-1)\}, \tag{3} \]
where \(1 \le i \le m\), \(1 \le j \le n\), \(C(0, 0) = 0\), and \(C(i, 0) = C(0, j) = \infty\).
Obviously, the cumulative distance is \(C(m, n)\), that is, \(\mathrm{DTW}(X, Y) = C(m, n)\). Meanwhile, a corresponding optimal warping path is also found.
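The recurrence in (3) can be sketched directly as a minimal \(O(mn)\) implementation (our own naming; the point-wise distance \(d\) is taken as the Euclidean norm, which is one common choice):

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between sequences x (m time points) and y (n time points).

    Rows may be scalars or vectors, and m and n may differ. Implements
    C(i, j) = d(x_i, y_j) + min(C(i-1, j), C(i, j-1), C(i-1, j-1))
    with C(0, 0) = 0 and the remaining border set to infinity.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    m, n = len(x), len(y)
    # point-wise distance matrix d(x_i, y_j)
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    C = np.full((m + 1, n + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            C[i, j] = d[i - 1, j - 1] + min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
    return C[m, n]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0, since the repeated point can be warped onto a single point, whereas the Euclidean distance is undefined for these unequal lengths.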
Using DTW to measure the distance between time series not only matches data points with similar shapes but also measures the similarity between time series of unequal length. In addition, it can better handle the sensitivity to abnormal data points that affects the Euclidean distance [22, 23]. However, since the time and space complexity of DTW is higher than that of the Euclidean distance, directly applying DTW to measure the similarity between high-dimensional time series data results in poor efficiency. To solve this problem, many improved methods have been proposed. Salvador and Chan presented FastDTW, based on search range limitation and data abstraction. On the basis of a penalty coefficient, Li and Du proposed an improved DTW algorithm to effectively measure the similarity of asynchronous morphological approximations. To improve the quality of DTW, Li designed a novel DTW with time weights for measuring the distance between time series.
In this paper, to reduce the impact of high dimensionality of variable attributes on the clustering, we will utilize the traditional DTW method to measure the similarity among each component of MTS.
2.2. Affinity Propagation
The classical k-means algorithm randomly chooses the initial center points and thus easily falls into local optima. To avoid this problem, Frey and Dueck proposed the affinity propagation (AP) clustering method based on graph theory. Compared with other algorithms, AP clustering does not fix the number of clusters in advance and does not choose the initial points randomly. It takes each data point as a potential representative point of a candidate class, which effectively reduces the influence of the selection of initial clustering centers on the clustering results. In addition, it imposes no symmetry requirement on the similarity matrix, and it deals with multiclass data quickly. Therefore, the AP clustering algorithm can solve non-Euclidean space problems caused by asymmetry or violations of the triangle inequality, as well as large-scale sparse matrix computing issues [28, 34]. Owing to these important properties, the AP algorithm is employed in various research fields, and its performance and efficiency have also been improved [35–37]. For instance, Li et al. proposed a new AP algorithm, the adjustable preference affinity propagation (APAP) algorithm, in which the initial value of each element's preference is independently determined on the basis of the data distribution and is automatically adjusted during the iteration process. Experiments verify its better performance compared with the standard AP algorithm.
As a clustering algorithm that depends on the exchange of information between data points, the AP algorithm locates the optimal set of representative points such that the cluster error function is minimized. Each representative point comes from the original data set, and the sum of the accumulated information between it and the other data points reaches a maximum. In this algorithm, two kinds of messages are exchanged between any two data points \(i\) and \(k\): the responsibility \(r(i, k)\) and the availability \(a(i, k)\). \(r(i, k)\) refers to the message sent from data point \(i\) to the candidate representative point \(k\), reflecting how well suited point \(k\) is to represent point \(i\). \(a(i, k)\) denotes the degree to which it is effective for data point \(i\) to choose data point \(k\) as its cluster center, and it is the message transmitted from the candidate cluster center \(k\) to data point \(i\).
Given a data set \(D = \{x_1, x_2, \ldots, x_n\}\), the detailed process of the AP clustering algorithm is described as follows.
During the AP clustering process, the purpose of setting the matrices \(R\) and \(A\) to zero in S1 is to let every point in the data set be a candidate cluster center. After initialization, each element of the matrices \(R\) and \(A\) is updated from S2 to S4. Specifically, to avoid possible oscillation, a damping coefficient \(\lambda\) is introduced in S4. S5 is the termination condition of the algorithm. The cluster center corresponding to each object and the final cluster set are obtained in S6 and S7, respectively.
In the calculation process, it is worth noting that the number of iterations is affected by the damping coefficient \(\lambda\); generally, they change in opposite directions. When the damping coefficient is large, the number of iterations is small, and vice versa. Gui and Yang found that when \(\lambda\) lies in [0.5, 1], the stability of the AP algorithm is greatly improved. To reduce the probability of oscillation, we take \(\lambda = 0.9\), as suggested by Frey and Dueck.
3. Multivariate Time Series Clustering
Compared with univariate time series (UTS), multivariate time series (MTS) have high-dimensional attribute variables, which makes it necessary to consider the similarity of these attributes in clustering. To achieve MTS clustering, traditional clustering methods usually regard each MTS as a complete data object and select an appropriate distance measure to calculate similarity. Owing to the high dimensionality of the attribute variables in MTS, the choice of distance measure strongly affects the clustering quality. To this end, this paper proposes an MTS clustering method based on DTW and AP from the viewpoint of the univariate attribute series of MTS data.
Given an MTS data set \(D = \{X_1, X_2, \ldots, X_N\}\), where \(X_p\) is the \(p\)-th MTS and \(x_p^{(j)}\) refers to its \(j\)-th attribute sequence, \(j = 1, 2, \ldots, m\). DTW is used to measure the following similarity distance between two MTSs \(X_p\) and \(X_q\):
\[ d_o(p, q) = \mathrm{DTW}(X_p, X_q). \tag{4} \]
Obviously, the warping path \(W_{pq}\) corresponding to the minimum warping distance is obtained at the same time, where \(1 \le p, q \le N\).
Since each dimension of an MTS can be viewed as a UTS, we can also compute the similarity between any two MTSs at the component level, building on the warping alignment provided by (4). That is, under the same attribute \(j\) of different MTSs, we can calculate the following UTS similarity by using DTW:
\[ d_j(p, q) = \mathrm{DTW}\left(x_p^{(j)}, x_q^{(j)}\right). \tag{5} \]
From (5), we can construct the similarity matrix \(S_j\) of the UTS data set under each attribute component \(j\).
This paper employs the AP algorithm to cluster the similarity matrices constructed by formulas (4) and (5). The clustering results reflect the relationships among MTSs under both the overall information environment and the local components. Furthermore, we obtain the relation matrices \(M_0, M_1, \ldots, M_m\) from these clustering results, where each element of a relation matrix is either 0 or 1: if two data objects in the final clustering result point to the same representative object, their relationship is denoted as 1, and otherwise as 0.
Based on the above relation matrices, we design the comprehensive relationship matrix \(M\), which aggregates the overall and component-level relation matrices.
Regarding the matrix \(M\) as a similarity matrix, we further obtain the final clustering results by means of the AP algorithm. The new clustering algorithm constructed from DTW and AP, DTW_AP, is described as follows.
Compared with traditional clustering methods based on the Euclidean distance, DTW_AP has better clustering effects. In fact, DTW can measure the similarity of time series with different lengths while remaining insensitive to sudden changes or outliers, and it also enables asynchronous similarity comparison. Meanwhile, this paper applies the AP algorithm to cluster the component attributes of MTS, so the influence of each component on the clustering is considered comprehensively. The application of the AP algorithm not only avoids the limitation imposed by the initial choice of class representative points but also places no symmetry requirement on the similarity matrix, and it processes multiclass data quickly.
With the help of the effectiveness evaluation, we find that the algorithm proposed in this paper has significantly higher clustering accuracy than other algorithms. It also achieves high accuracy on high-dimensional data, which shows its component steps to be effective and meaningful. In this paper, the AP clustering algorithm is used twice: the first time to cluster each component and the overall MTS, and the second time to cluster the comprehensive relationship matrix \(M\). Thus, the achieved clustering results are more accurate. In addition, using DTW as the distance measure makes the method applicable to MTS with missing data or different lengths, so it can be widely adopted in practical applications.
In the method DTW_AP, the AP algorithm solves the problem that the traditional methods are only suitable for data clustering with spherical distribution. Moreover, the application of DTW in the clustering process enhances the clustering effect of asynchronous morphological similarity. In addition, this paper mainly focuses on the effect of each component attribute sequence on the clustering result and achieves MTS clustering under the local component attribute sequence and the overall information environment.
4. Numerical Experiment
To verify the performance of the proposed method, we choose an effective evaluation index and compare it to several other MTS clustering methods when processing various MTS data sets. In the experiments, the computer is Lenovo T420. The CPU is Intel Core i7-2640M, the computer main frequency is 2.8 GHz, and the memory is 4.0 GB. The operation system is Windows 7 (64 bit). The programming environment is MATLAB R2012b.
4.1. Effectiveness Evaluation Index
Because application requirements differ, there has long been no uniform standard for clustering quality; one must rely on certain indicators to evaluate the effectiveness of an algorithm. Clustering validity evaluation methods are divided into external and internal ones. The former judges the validity of clustering results against known correct results: by comparing the known correct clustering with the experimental clustering, we obtain their degree of agreement, that is, the accuracy. The latter takes the opposite approach and uses the intrinsic characteristics of the original data set to evaluate the results when no reference clustering is known. Since our data sets are labeled with known clustering results, this paper chooses the former as the evaluation index.
Suppose Pre is the external index measuring the clustering effect and algorithm quality, Label is the correct result, idx is the experimental result, and \(N\) is the total number of points in the data set. The evaluation index Pre of clustering validity between Label and idx is defined as
\[ \mathrm{Pre} = \frac{1}{N} \sum_{i=1}^{N} a_i, \]
where \(a_i = 1\) if object \(i\) is correctly matched and \(a_i = 0\) otherwise.
Given that the class label produced by the AP algorithm is the index of the clustering center point, we only need to work out the class label value of the experimental results. For example, if the class label of data point 1 is 5, and the value of data point 5 in the real result is also 5, then data point 1 is clustered into the cluster of data point 5; this clustering is correct and is recorded as 1. Similarly, each correct match is marked as 1, and otherwise 0. The cumulative sum divided by \(N\) gives the value of Pre, which lies between 0 and 1. The closer the value is to 1, the more accurate the clustering result.
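One way to operationalize this matching rule (a sketch with our own naming; `idx` holds, for each object, the index of its AP exemplar):

```python
import numpy as np

def pre_index(label, idx):
    """External validity index Pre in [0, 1].

    label: ground-truth class of each object.
    idx:   experimental result, i.e. the exemplar (cluster center) index
           that AP assigned to each object.
    Object i counts as correctly clustered when its exemplar idx[i]
    belongs to the same true class as i; Pre is the correct count over N.
    """
    label, idx = np.asarray(label), np.asarray(idx)
    correct = (label[idx] == label).sum()
    return correct / len(label)
```

For instance, with `label = [0, 0, 0, 1, 1, 1]` and `idx = [0, 0, 0, 3, 3, 0]`, five of the six objects follow an exemplar of their own class, so Pre = 5/6.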
4.2. Clustering Analysis
To demonstrate the performance of the proposed method, ten MTS data sets from the UCR and UCI repositories are selected. They mainly come from the fields of industrial engineering, medicine and health, gesture behavior, language pronunciation, and so on. The detailed information of these data sets is shown in Table 1.
As shown in Table 1, the first five MTS data sets have different lengths; for instance, the time dimension of the data set AD ranges from 4 to 93. The last five MTS data sets, which come from the same industry, have 6 variables each, and their time dimension is 15; however, they contain different numbers of categories and data volumes.
Traditional time series clustering methods usually use feature representation and similarity measurement to process the data and then achieve time series clustering based on a specific distance measure and a classical clustering method. To demonstrate the clustering performance of DTW_AP, we compare it with PCA_AP, based on PCA and AP; a distance-measurement-based method; and OAP, based on AP clustering of the original data. In addition, our experiment also compares DTW_AP with permutation distribution clustering (PDC) and the component-attribute-based affinity propagation clustering for multivariate time series data (cACM). The different methods are used to carry out clustering analysis on the ten data sets. The experimental results are displayed in Table 2, where the best experimental result for PCA_AP is obtained from the leading principal components of the attribute dimensions of the MTS data set. The bias parameter Pr, which affects the number of clusters, and the damping factor \(\lambda\), which affects convergence, are set in the AP clustering algorithm to the median of the similarity matrix and to 0.9, respectively.
As shown in Table 2, the mean value obtained by DTW_AP is higher than that of all other methods on the ten data sets. Meanwhile, the standard deviation (SD) obtained by DTW_AP is also lower than those of OAP, PCA_AP, and PDC. Note that the SD computed by cACM is lower than that of DTW_AP, but its stability advantage is not especially clear. In a word, compared with other MTS clustering methods, DTW_AP is more effective and feasible, and it is clearly superior to most traditional AP clustering algorithms based on the whole original data information.
Taking the novel method as the benchmark, we compare its clustering effect with that of the other traditional methods. The scatter diagrams of the clustering results are shown in Figure 1. If a data point in the figure lies below the gray line, the novel method is better than the traditional method on that data set, and otherwise the opposite. From the visual comparison of the clustering results in Figure 1, we observe that DTW_AP achieves better clustering results than the other traditional methods on most data sets. In fact, most points in Figure 1 lie below the gray line except in Figure 1(a), which proves that the novel method has clear advantages. Figure 1(a) generally fluctuates near the gray line, with one point deviating downward from it, which also shows to a certain extent that the clustering quality of the novel method is better. In particular, from the visual comparison of the different methods on all data sets in Figure 2, we find that the ranking of clustering quality is roughly DTW_AP > cACM > OAP > PCA_AP > PDC. Meanwhile, the average effect and the variance stability of the clustering results in Table 2 also indicate that the clustering performance of DTW_AP is better than that of the other traditional methods.
In this paper, a new MTS clustering method is proposed on the basis of DTW and AP. DTW is used to calculate the similarity between the whole information of MTSs and to obtain the similarity between their component attributes. Meanwhile, the AP algorithm is utilized to construct the clustering relationships between each component attribute and the whole sequence. After that, MTS relation matrices are built from the component attributes and the global information. The numerical experimental results indicate that the novel method achieves better clustering performance than other traditional time series clustering methods. However, during the clustering process, using the AP algorithm to cluster the UTS of each component attribute is time-consuming; further work should therefore consider improving the time efficiency of the proposed method.
The datasets analyzed in this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This study was sponsored by the Huaqiao University’s Academic Project Supported by the Fundamental Research Funds for the Central Universities (14SKGC-QG17), Huaqiao University’s High Level Talent Research Start-up Funding Project (14SKBS205), Social Science Planning Project of Fujian Province (FJ2020B088), National Natural Science Foundation (71771094), and Ministry of Science & Technology, Taiwan (MOST 109-2511-H-003-049-MY3).
C. Matias and V. Miele, “Statistical clustering of temporal networks through a dynamic stochastic block model,” Journal of the Royal Statistical Society, vol. 60, no. 4, pp. 12–31, 2016.
M. Caron, P. Bojanowski, A. Joulin, and M. Douze, “Deep clustering for unsupervised learning of visual features,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 139–156, Munich, Germany, 2018.
S. Bhati, S. Nayak, K. Murty, and N. Dehak, “Unsupervised acoustic segmentation and clustering using siamese network embeddings,” in Interspeech 2019, pp. 2668–2672, Graz, Austria, 2019.
A. M. Brandmaier, “PDC: permutation distribution clustering,” Psychological Methods, vol. 18, no. 1, pp. 71–86, 2015.
H. Li, C. Wang, and X. Deng, “Multivariate time series clustering based on affinity propagation of component attributes,” Control and Decision, vol. 33, no. 4, pp. 649–656, 2018.
S. Salvador and P. Chan, “FastDTW: toward accurate dynamic time warping in linear time and space,” Intelligent Data Analysis, vol. 11, pp. 561–580, 2007.