Mathematical Problems in Engineering
Volume 2013, Article ID 641927, 10 pages
http://dx.doi.org/10.1155/2013/641927
Research Article

Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

1 School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
2 Gansu Computing Center, Lanzhou 730000, China

Received 27 August 2013; Revised 11 October 2013; Accepted 13 October 2013

Academic Editor: Gelan Yang

Copyright © 2013 Mingwei Leng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Most existing semisupervised clustering algorithms that rely on a small labeled dataset achieve low accuracy on multidensity and imbalanced datasets, and labeling data is expensive and time-consuming in many real-world applications. This paper addresses active data selection and semisupervised clustering on multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active data-selection mechanism to minimize the amount of labeled data required, and it uses multiple thresholds to expand the labeled dataset when the data are multidensity and imbalanced. Three standard datasets and one synthetic dataset are used to evaluate the proposed algorithm, and the experimental results show that it achieves higher accuracy and more stable performance than other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
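To make the idea of expanding a labeled set with more than one threshold concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each class has its own distance threshold, supplied by hand here, and that a label spreads from a labeled point to any unlabeled point within that class's threshold. The function name `expand_labels`, the `thresholds` dictionary, and the toy data are all hypothetical. The abstract does not say how the thresholds are derived or how the active selection step chooses which points to label, so neither is modeled.

```python
import math


def expand_labels(points, seed_labels, thresholds):
    """Spread seed labels to unlabeled points using one threshold per class.

    points:      list of coordinate tuples
    seed_labels: dict mapping point index -> class label (the labeled data)
    thresholds:  dict mapping class label -> distance threshold
                 (assumed to be given; hypothetical parameter)
    Returns a dict mapping point index -> class label for every reached point.
    """
    assigned = dict(seed_labels)
    frontier = list(seed_labels)
    while frontier:
        i = frontier.pop()
        limit = thresholds[assigned[i]]
        for j in range(len(points)):
            # A point keeps the first label that reaches it, so the result
            # can depend on visiting order when classes overlap.
            if j not in assigned and math.dist(points[i], points[j]) <= limit:
                assigned[j] = assigned[i]
                frontier.append(j)
    return assigned


# Toy data: a dense cluster (spacing 0.1) and a sparse cluster (spacing 1.0).
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (6.0, 0.0), (7.0, 0.0)]
seeds = {0: "dense", 3: "sparse"}

per_class = expand_labels(points, seeds, {"dense": 0.15, "sparse": 1.2})
single = expand_labels(points, seeds, {"dense": 0.15, "sparse": 0.15})
```

In this toy setup, a single small threshold labels the dense cluster but leaves the sparse points at indices 4 and 5 unlabeled, whereas the per-class thresholds reach all six points. This is one plausible reading of why a single threshold struggles on multidensity data.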