Dataset Papers in Biology
Volume 2013 (2013), Article ID 364725, 9 pages
Dataset Paper

First Y-Short Tandem Repeat Categorical Dataset for Clustering Applications

¹Center for Computer Sciences, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Selangor, Malaysia
²International University College of Arts and Science, L1.10 Cova Square, Jalan Teknologi, Kota Damansara PJU5, 47810 Petaling Jaya, Selangor Darul Ehsan, Malaysia

Received 9 October 2012; Accepted 8 November 2012

Copyright © 2013 Ali Seman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


The Y-chromosome short tandem repeat (Y-STR) data presented here were collected mainly for benchmarking the performance of clustering methods. There are six Y-STR datasets, divided into two categories: Y-STR surname data and Y-haplogroup data. Y-STR data are categorical, but they differ from other categorical data in that they are composed of many similar and almost similar objects. This characteristic has caused problems for existing clustering algorithms when they are applied to Y-STR data.
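
To make the notion of similar and almost similar objects concrete, the following minimal Python sketch compares categorical objects with a simple matching dissimilarity, a measure commonly used for categorical data. It is an illustration only: the function name, the choice of five markers, and the repeat counts are assumptions made for this example and are not taken from the datasets.

```python
from typing import Sequence


def simple_matching_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the markers at which two categorical objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of markers")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical repeat counts at five markers (illustrative values only,
# not taken from the datasets described in this paper).
obj_a = (13, 24, 14, 11, 12)
obj_b = (13, 24, 14, 11, 12)  # identical to obj_a: a "similar" object
obj_c = (13, 24, 15, 11, 12)  # differs at one marker: "almost similar"
obj_d = (14, 23, 15, 10, 13)  # differs at every marker

distances = {
    "a-b": simple_matching_distance(obj_a, obj_b),
    "a-c": simple_matching_distance(obj_a, obj_c),
    "a-d": simple_matching_distance(obj_a, obj_d),
}
print(distances)
```

When most pairs of objects fall at a distance of zero or one, as obj_a, obj_b, and obj_c do here, a distance-based method has little information with which to separate them, which is one plausible reading of the difficulty described above.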