The Scientific World Journal
Volume 2014 (2014), Article ID 562194, 12 pages
http://dx.doi.org/10.1155/2014/562194
Research Article

A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique

Faculty of Computer Science & Information Technology Building, University of Malaya, 50603 Kuala Lumpur, Malaysia

Received 4 October 2013; Accepted 2 February 2014; Published 25 March 2014

Academic Editors: H. Chen, P. Ji, and Y. Zeng

Copyright © 2014 Saeed Aghabozorgi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Linked References

  1. A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.
  2. E. Keogh and S. Kasetty, “On the need for time series data mining benchmarks: a survey and empirical demonstration,” Data Mining and Knowledge Discovery, vol. 7, no. 4, pp. 349–371, 2003.
  3. J. Lin, M. Vlachos, E. Keogh, and D. Gunopulos, “Iterative incremental clustering of time series,” in Advances in Database Technology - EDBT 2004, pp. 106–122, 2004.
  4. S. Rani and G. Sikka, “Recent techniques of clustering of time series data: a survey,” International Journal of Computer Applications, vol. 52, no. 15, pp. 1–9, 2012.
  5. C. Ratanamahatana, “Multimedia retrieval using time series representation and relevance feedback,” in Proceedings of the 8th International Conference on Asian Digital Libraries (ICADL '05), pp. 400–405, 2005.
  6. M. Vlachos, J. Lin, and E. Keogh, “A wavelet-based anytime algorithm for k-means clustering of time series,” in Proceedings of the Workshop on Clustering High Dimensionality Data and Its Applications, pp. 23–30, 2003.
  7. E. Keogh and M. Pazzani, “An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback,” in Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining, pp. 239–241, 1998.
  8. T. Oates, M. D. Schmill, and P. R. Cohen, “A method for clustering the experiences of a mobile robot that accords with human judgments,” in Proceedings of the National Conference on Artificial Intelligence, pp. 846–851, 2000.
  9. S. Hirano and S. Tsumoto, “Empirical comparison of clustering methods for long time-series databases,” in Active Mining, vol. 3430, pp. 268–286, 2005.
  10. X. Wang, K. Smith, and R. Hyndman, “Characteristic-based clustering for time series data,” Data Mining and Knowledge Discovery, vol. 13, no. 3, pp. 335–364, 2006.
  11. J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium Mathematical Statistics and Probability, vol. 1, pp. 281–297, 1967.
  12. L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, vol. 39, Wiley Online Library, 1990.
  13. P. S. Bradley, U. Fayyad, and C. Reina, “Scaling clustering algorithms to large databases,” in Proceedings of the 4th International Conference on Knowledge Discovery & Data Mining (KDD '98), pp. 9–15, 1998.
  14. C. Guo, H. Jia, and N. Zhang, “Time series clustering based on ICA for stock data analysis,” in Proceedings of the 4th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '08), pp. 1–4, 2008.
  15. V. Hautamaki, P. Nykänen, and P. Fränti, “Time-series clustering by approximate prototypes,” in Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), pp. 1–4, 2008.
  16. C. A. Ratanamahatana and V. Niennattrakul, “Clustering multimedia data using time series,” in Proceedings of the International Conference on Hybrid Information Technology (ICHIT '06), pp. 372–379, 2006.
  17. D. Tran and M. Wagner, “Fuzzy c-means clustering-based speaker verification,” in Advances in Soft Computing - AFSS 2002, vol. 2275, pp. 318–324, 2002.
  18. J. Alon, S. Sclaroff, G. Kollios, and V. Pavlovic, “Discovering clusters in motion time-series data,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 375–381, 2003.
  19. S. Aghabozorgi, T. Y. Wah, A. Amini, and M. R. Saybani, “A new approach to present prototypes in clustering of time series,” in Proceedings of the 7th International Conference of Data Mining, vol. 28, pp. 214–220, 2011.
  20. M. Ji, F. Xie, and Y. Ping, “A dynamic fuzzy cluster algorithm for time series,” Abstract and Applied Analysis, vol. 2013, Article ID 183410, 7 pages, 2013.
  21. J. W. Shavlik and T. G. Dietterich, Readings in Machine Learning, Morgan Kaufmann, 1990.
  22. A. Bagnall and G. Janacek, “Clustering time series with clipped data,” Machine Learning, vol. 58, no. 2-3, pp. 151–178, 2005.
  23. C. Biernacki, G. Celeux, and G. Govaert, “Assessing a mixture model for clustering with the integrated completed likelihood,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 7, pp. 719–725, 2000.
  24. M. Ramoni, P. Sebastiani, and P. Cohen, “Multivariate clustering by dynamics,” in Proceedings of the National Conference on Artificial Intelligence, pp. 633–638, 2000.
  25. M. Bicego, V. Murino, and M. Figueiredo, “Similarity-based clustering of sequences using hidden Markov models,” in Machine Learning and Data Mining in Pattern Recognition, vol. 2734, pp. 86–95, 2003.
  26. J. Hu, B. Ray, and L. Han, “An interweaved HMM/DTW approach to robust time series clustering,” in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 145–148, 2006.
  27. B. Andreopoulos, A. An, X. Wang, and M. Schroeder, “A roadmap of clustering algorithms: finding a match for a biomedical application,” Briefings in Bioinformatics, vol. 10, no. 3, pp. 297–314, 2009.
  28. C.-P. P. Lai, P.-C. C. Chung, and V. S. Tseng, “A novel two-level clustering method for time series data analysis,” Expert Systems with Applications, vol. 37, no. 9, pp. 6319–6326, 2010.
  29. E. Keogh, S. Lonardi, and C. A. Ratanamahatana, “Towards parameter-free data mining,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), vol. 22, pp. 206–215, 2004.
  30. A. Ben-Dor, R. Shamir, and Z. Yakhini, “Clustering gene expression patterns,” Journal of Computational Biology, vol. 6, no. 3-4, pp. 281–297, 1999.
  31. S. Chu, E. Keogh, D. Hart, and M. Pazzani, “Iterative deepening dynamic time warping for time series,” in Proceedings of the 2nd SIAM International Conference on Data Mining, pp. 195–212, 2002.
  32. X. Zhang, J. Liu, Y. Du, and T. Lv, “A novel clustering method on time series data,” Expert Systems with Applications, vol. 38, no. 9, pp. 11891–11900, 2011.
  33. B. Morris and M. Trivedi, “Learning trajectory patterns by clustering: experimental studies and comparative evaluation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 312–319, 2009.
  34. C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, “Fast subsequence matching in time-series databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '94), vol. 23, pp. 419–429, 1994.
  35. H. Sakoe and S. Chiba, “A dynamic programming approach to continuous speech recognition,” in Proceedings of the 7th International Congress on Acoustics, vol. 3, pp. 65–69, 1971.
  36. H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43–49, 1978.
  37. L. Chen and R. Ng, “On the marriage of lp-norms and edit distance,” in Proceedings of the 30th International Conference on Very Large Data Bases, vol. 30, pp. 792–803, 2004.
  38. J. Aßfalg, H. P. Kriegel, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz, “Similarity search on time series based on threshold queries,” in Advances in Database Technology - EDBT 2006, pp. 276–294, 2006.
  39. M. Vlachos, G. Kollios, and D. Gunopulos, “Discovering similar multidimensional trajectories,” in Proceedings of the 18th International Conference on Data Engineering, pp. 673–684, 2002.
  40. A. Banerjee and J. Ghosh, “Clickstream clustering using weighted longest common subsequences,” in Proceedings of the Web Mining Workshop at the 1st SIAM Conference on Data Mining, pp. 33–40, 2001.
  41. L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD '05), pp. 491–502, 2005.
  42. E. Keogh, S. Lonardi, C. A. Ratanamahatana, L. Wei, S.-H. Lee, and J. Handley, “Compression-based data mining of sequential data,” Data Mining and Knowledge Discovery, vol. 14, no. 1, pp. 99–129, 2007.
  43. E. Keogh, “Fast similarity search in the presence of longitudinal scaling in time series databases,” in Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence, pp. 578–584, 1997.
  44. F. K.-P. Chan, A. W.-C. Fu, and C. Yu, “Haar wavelets for efficient similarity search of time-series: with and without time warping,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 686–705, 2003.
  45. E. Keogh, M. Pazzani, K. Chakrabarti, and S. Mehrotra, “A simple dimensionality reduction technique for fast similarity search in large time series databases,” Knowledge and Information Systems, vol. 1805, no. 1, pp. 122–133, 2000.
  46. C. Ratanamahatana and E. Keogh, “Three myths about dynamic time warping data mining,” in Proceedings of the International Conference on Data Mining (SDM '05), pp. 506–510, 2005.
  47. E. Keogh and C. A. Ratanamahatana, “Exact indexing of dynamic time warping,” Knowledge and Information Systems, vol. 7, no. 3, pp. 358–386, 2005.
  48. X. Xi, E. Keogh, C. Shelton, L. Wei, and C. A. Ratanamahatana, “Fast time series classification using numerosity reduction,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 1033–1040, 2006.
  49. D. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in Proceedings of the AAAI94 Workshop on Knowledge Discovery in Databases, pp. 359–370, 1994.
  50. S. Aghabozorgi and Y. W. Teh, “Stock market co-movement assessment using a three-phase clustering method,” Expert Systems with Applications, vol. 41, no. 4, part 1, pp. 1301–1314, 2014.
  51. J. Aach and G. M. Church, “Aligning gene expression time series with time warping algorithms,” Bioinformatics, vol. 17, no. 6, pp. 495–508, 2001.
  52. B. K. Yi and C. Faloutsos, “Fast time sequence indexing for arbitrary Lp norms,” in Proceedings of the 26th International Conference on Very Large Data Bases, pp. 385–394, 2000.
  53. S. Salvador and P. Chan, “Toward accurate dynamic time warping in linear time and space,” Intelligent Data Analysis, vol. 11, no. 5, pp. 561–580, 2007.
  54. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, Calif, USA, 2011.
  55. A. Mueen, E. Keogh, and N. Young, “Logical-shapelets: an expressive primitive for time series classification,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), pp. 1154–1162, 2011.
  56. V. Niennattrakul and C. Ratanamahatana, “Inaccuracies of shape averaging method using dynamic time warping for time series data,” in Computational Science - ICCS 2007, pp. 513–520, 2007.
  57. S. Aghabozorgi, M. R. Saybani, and T. Y. Wah, “Incremental clustering of time-series by fuzzy clustering,” Journal of Information Science and Engineering, vol. 28, no. 4, pp. 671–688, 2012.
  58. S.-W. Kim, S. Park, and W. W. Chu, “An index-based approach for similarity search supporting time warping in large sequence databases,” in Proceedings of the 17th International Conference on Data Engineering, pp. 607–614, 2001.
  59. B.-K. Yi, H. V. Jagadish, and C. Faloutsos, “Efficient retrieval of similar time sequences under time warping,” in Proceedings of the 14th International Conference on Data Engineering, pp. 201–208, 1998.
  60. H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, “Querying and mining of time series data: experimental comparison of representations and distance measures,” Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1542–1552, 2008.
  61. X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. Keogh, “Experimental comparison of representation methods and distance measures for time series data,” Data Mining and Knowledge Discovery, vol. 26, no. 2, pp. 275–309, 2012.
  62. R. R. Sokal, “A statistical method for evaluating systematic relationships,” University of Kansas Scientific Bulletin, vol. 38, no. 1958, pp. 1409–1438, 1958.
  63. I. Gronau and S. Moran, “Optimal implementations of UPGMA and other common clustering algorithms,” Information Processing Letters, vol. 104, no. 6, pp. 205–210, 2007.
  64. E. Keogh, Q. Zhu, B. Hu et al., “The UCR time series data mining archive,” UCR Time Series Classification, 2011, http://www.cs.ucr.edu/~eamonn/time_series_data/.
  65. J. Lin, E. Keogh, L. Wei, and S. Lonardi, “Experiencing SAX: a novel symbolic representation of time series,” Data Mining and Knowledge Discovery, vol. 15, no. 2, pp. 107–144, 2007.
  66. H. Zhang, T. B. Ho, Y. Zhang, and M.-S. Lin, “Unsupervised feature extraction for time series clustering using orthogonal wavelet transform,” Informatica, vol. 30, no. 3, pp. 305–319, 2006.
  67. E. Amigó, J. Gonzalo, J. Artiles, and F. Verdejo, “A comparison of extrinsic clustering evaluation metrics based on formal constraints,” Information Retrieval, vol. 12, no. 4, pp. 461–486, 2009.
  68. C. Ratanamahatana, E. Keogh, A. J. Bagnall, and S. Lonardi, “A novel bit level time series representation with implications for similarity search and clustering,” in Proceedings of the 9th Pacific-Asian International Conference on Knowledge Discovery and Data Mining (PAKDD '05), pp. 771–777, 2005.
  69. J. Wu, H. Xiong, and J. Chen, “Adapting the right measures for K-means clustering,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 877–886, 2009.
  70. M. Chiş, S. Banerjee, and A. E. Hassanien, “Clustering time series data: an evolutionary approach,” Foundations of Computational Intelligence Volume 6, vol. 206, pp. 193–207, 2009.
  71. F. Rohlf, “Methods of comparing classifications,” Annual Review of Ecology and Systematics, vol. 5, pp. 101–113, 1974.
  72. M. Song and L. Zhang, “Comparison of cluster representations from partial second-to full fourth-order cross moments for data stream clustering,” in Proceedings of the 8th IEEE International Conference on Data Mining (ICDM '08), pp. 560–569, 2008.
  73. F. Gullo, G. Ponti, A. Tagarelli, G. Tradigo, and P. Veltri, “A time series approach for clustering mass spectrometry data,” Journal of Computational Science, vol. 3, no. 5, pp. 344–355, 2012.
  74. C. J. van Rijsbergen, “A non-classical logic for information retrieval,” The Computer Journal, vol. 29, no. 6, pp. 481–485, 1986.
  75. E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality reduction for fast similarity search in large time series databases,” Knowledge and Information Systems, vol. 3, no. 3, pp. 263–286, 2001.
  76. E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Locally adaptive dimensionality reduction for indexing large time series databases,” ACM SIGMOD Record, vol. 30, no. 2, pp. 151–162, 2001.
  77. C. J. van Rijsbergen, Information Retrieval, Butterworths, London, UK, 1979.