International Scholarly Research Notices
Volume 2014, Article ID 414013, 11 pages
http://dx.doi.org/10.1155/2014/414013
Research Article

Canonical PSO Based K-Means Clustering Approach for Real Datasets

1Heritage Institute of Technology, Kolkata, West Bengal 700 107, India
2Institute of Engineering & Management, Kolkata, West Bengal 700 091, India

Received 14 June 2014; Revised 19 September 2014; Accepted 2 October 2014; Published 13 November 2014

Academic Editor: Francesco Camastra

Copyright © 2014 Lopamudra Dey and Sanjay Chakraborty. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Clustering is applied across a wide range of fields. Because clustering is an unsupervised process in data mining, properly evaluating its results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measurement. Different indices suit different types of problems, and the choice of index depends on the kind of data available. This paper first proposes a Canonical PSO based K-means clustering algorithm and analyses some important clustering indices (intercluster, intracluster). It then evaluates those indices on a real-time air pollution database and on the wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. The paper also describes the nature of the clusters, compares the performance of these clustering algorithms according to the validity assessment, and identifies which algorithm is most desirable for producing compact clusters on these particular real-life datasets. Finally, it examines the behaviour of these clustering algorithms with respect to the validation indices and presents the evaluation results in mathematical and graphical form.
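
As a rough illustration of what intracluster and intercluster indices measure, the following Python sketch computes one common pair of definitions: the mean squared distance of points to their assigned centroid (compactness) and the smallest squared distance between any two centroids (separability). These definitions, the function names, and the toy data are assumptions made for illustration; they are not necessarily the exact indices or data used in the paper.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def intracluster_distance(points, labels, centroids):
    """Mean squared distance from each point to its assigned centroid.

    Smaller values indicate more compact clusters (assumed definition).
    """
    total = sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels))
    return total / len(points)


def intercluster_distance(centroids):
    """Smallest squared distance between any two centroids.

    Larger values indicate better separated clusters (assumed definition).
    """
    k = len(centroids)
    return min(
        squared_distance(centroids[i], centroids[j])
        for i in range(k)
        for j in range(i + 1, k)
    )


# Hypothetical toy data: two well separated clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]                      # cluster assignment per point
centroids = [(0.0, 1.0), (10.0, 1.0)]      # one centroid per cluster

intra = intracluster_distance(points, labels, centroids)
inter = intercluster_distance(centroids)
validity_ratio = intra / inter             # lower is better under these definitions
```

Under these assumed definitions, a clustering that lowers the ratio of intracluster to intercluster distance is both tighter and better separated, which is the kind of comparison a validity assessment across several algorithms relies on.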