Applied Computational Intelligence and Soft Computing
Volume 2013 (2013), Article ID 863146, 12 pages
http://dx.doi.org/10.1155/2013/863146
Research Article

Subspace Clustering of High-Dimensional Data: An Evolutionary Approach

Singh Vijendra1 and Sahoo Laxman2

1Department of Computer Science and Engineering, Faculty of Engineering and Technology, Mody Institute of Technology and Science, Lakshmangarh, Rajasthan 332311, India
2School of Computer Engineering, KIIT University, Bhubaneswar 751024, India

Received 21 August 2013; Revised 20 October 2013; Accepted 11 November 2013

Academic Editor: Sebastian Ventura

Copyright © 2013 Singh Vijendra and Sahoo Laxman. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
