Dataset Papers in Biology
Volume 2013 (2013), Article ID 364725, 9 pages
http://dx.doi.org/10.7167/2013/364725
Dataset Paper

First Y-Short Tandem Repeat Categorical Dataset for Clustering Applications

1 Center for Computer Sciences, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Selangor, Malaysia
2 International University College of Arts and Science, L1.10 Cova Square, Jalan Teknologi, Kota Damansara PJU5, 47810 Petaling Jaya, Selangor Darul Ehsan, Malaysia

Received 9 October 2012; Accepted 8 November 2012

Academic Editors: V. Grolmusz and L. Nanni

This dataset has been dedicated to the public domain using the CC0 waiver.

Dataset http://dx.doi.org/10.7167/2013/364725/dataset

Dataset

Dataset Item 1 (Table). This table consists of 751 Y-STR objects, labelled by haplogroup, from the Ireland Y-DNA Project (http://www.familytreedna.com/public/IrelandHeritage/). After filtration, the table contains only five haplogroups: E (24), G (20), L (200), J (32), and R (475); the values in parentheses give the number of objects in each group. The raw data comprise approximately 3419 records divided into 29 groups. The objects in this table have a low degree of similarity to one another, that is, they are relatively distant from each other. The first column is the Kit Number, followed by the 25 markers. The Kit Number is an extended identifier formed by prefixing the original Kit Number with the object's haplogroup name, separated by a dash.

  • Column 1: Kit Number
  • Column 2: DYS393
  • Column 3: DYS390
  • Column 4: DYS19 (394)
  • Column 5: DYS391
  • Column 6: DYS385a
  • Column 7: DYS385b
  • Column 8: DYS426
  • Column 9: DYS388
  • Column 10: DYS439
  • Column 11: DYS389I
  • Column 12: DYS392
  • Column 13: DYS389II
  • Column 14: DYS458
  • Column 15: DYS459a
  • Column 16: DYS459b
  • Column 17: DYS455
  • Column 18: DYS454
  • Column 19: DYS447
  • Column 20: DYS437
  • Column 21: DYS448
  • Column 22: DYS449
  • Column 23: DYS464a
  • Column 24: DYS464b
  • Column 25: DYS464c
  • Column 26: DYS464d
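As an illustration of this layout, the following minimal Python sketch reads one row, splitting the extended Kit Number into its class label and the original Kit Number. It makes two assumptions. First, the 25th marker column is DYS464d, the fourth copy of the multi-copy DYS464 locus. Second, the haplogroup or surname prefix contains no dash. The function names are hypothetical helpers and are not part of the dataset. Allele values are kept as strings because the dataset is intended for categorical clustering.

```python
# Marker columns 2-26 as listed above (assumption: column 26 is DYS464d).
MARKERS = [
    "DYS393", "DYS390", "DYS19", "DYS391", "DYS385a", "DYS385b",
    "DYS426", "DYS388", "DYS439", "DYS389I", "DYS392", "DYS389II",
    "DYS458", "DYS459a", "DYS459b", "DYS455", "DYS454", "DYS447",
    "DYS437", "DYS448", "DYS449", "DYS464a", "DYS464b", "DYS464c",
    "DYS464d",
]


def split_kit_number(kit: str) -> tuple[str, str]:
    """Split an extended Kit Number '<prefix>-<original>' at the first dash.

    Hypothetical helper; assumes the prefix itself contains no dash.
    """
    label, sep, original = kit.partition("-")
    if not sep or not label or not original:
        raise ValueError(f"not an extended Kit Number: {kit!r}")
    return label, original


def parse_row(fields: list[str]) -> tuple[str, dict[str, str]]:
    """Return (class label, {marker: allele}) for one 26-column row."""
    if len(fields) != 1 + len(MARKERS):
        raise ValueError(f"expected {1 + len(MARKERS)} columns, got {len(fields)}")
    label, _original = split_kit_number(fields[0])
    # Alleles stay as strings: they are treated as categories, not numbers.
    return label, dict(zip(MARKERS, fields[1:]))
```

For example, a row whose first field is a placeholder such as `R-000001` yields the label `R` together with a mapping from each of the 25 markers to its allele value.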

Dataset Item 2 (Table). This table consists of 267 Y-STR objects, labelled by haplogroup, from the Finland DNA Project (http://www.familytreedna.com/public/Finland). After filtration, the table contains only four haplogroups: L (92), J (6), N (141), and R (28); the values in parentheses give the number of objects in each group. The raw data comprise approximately 906 records divided into 7 groups. The objects in this table have a low degree of similarity to one another, that is, they are relatively distant from each other. The first column is the Kit Number, followed by the 25 markers. The Kit Number is an extended identifier formed by prefixing the original Kit Number with the object's haplogroup name, separated by a dash.

  • Column 1: Kit Number
  • Column 2: DYS393
  • Column 3: DYS390
  • Column 4: DYS19 (394)
  • Column 5: DYS391
  • Column 6: DYS385a
  • Column 7: DYS385b
  • Column 8: DYS426
  • Column 9: DYS388
  • Column 10: DYS439
  • Column 11: DYS389I
  • Column 12: DYS392
  • Column 13: DYS389II
  • Column 14: DYS458
  • Column 15: DYS459a
  • Column 16: DYS459b
  • Column 17: DYS455
  • Column 18: DYS454
  • Column 19: DYS447
  • Column 20: DYS437
  • Column 21: DYS448
  • Column 22: DYS449
  • Column 23: DYS464a
  • Column 24: DYS464b
  • Column 25: DYS464c
  • Column 26: DYS464d
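The group sizes stated for each item can serve as a sanity check after loading, because the class labels are recoverable from the Kit Number prefixes. The sketch below is a hypothetical check, not part of the published dataset. It counts objects per prefix, using the text before the first dash, and reports any class whose count differs from the figures given for Dataset Item 2, which sum to 267.

```python
from collections import Counter

# Group sizes stated in the text for Dataset Item 2 (92 + 6 + 141 + 28 = 267).
ITEM2_EXPECTED = {"L": 92, "J": 6, "N": 141, "R": 28}


def class_counts(kit_numbers):
    """Count objects per class, taking the prefix before the first dash."""
    return Counter(kit.split("-", 1)[0] for kit in kit_numbers)


def mismatches(kit_numbers, expected):
    """Return {class: (observed, expected)} for every class whose count differs."""
    observed = class_counts(kit_numbers)
    classes = set(observed) | set(expected)
    return {
        c: (observed.get(c, 0), expected.get(c, 0))
        for c in sorted(classes)
        if observed.get(c, 0) != expected.get(c, 0)
    }
```

An empty result means the loaded table matches the stated distribution. The same check applies to the other items if you substitute their group sizes.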

Dataset Item 3 (Table). This table consists of 263 Y-STR objects from the Y-haplogroup projects (http://www.worldfamilies.net/yhapprojects). After filtration, the final table contains only three haplogroups: Group G (37), Group N (68), and Group T (158); the values in parentheses give the number of objects in each group. The raw data comprise approximately 516 records taken from haplogroups G, N, and T. The objects in this table have a low degree of similarity to one another, that is, they are relatively distant from each other. The first column is the Kit Number, followed by the 25 markers. The Kit Number is an extended identifier formed by prefixing the original Kit Number with the object's haplogroup name, separated by a dash.

  • Column 1: Kit Number
  • Column 2: DYS393
  • Column 3: DYS390
  • Column 4: DYS19 (394)
  • Column 5: DYS391
  • Column 6: DYS385a
  • Column 7: DYS385b
  • Column 8: DYS426
  • Column 9: DYS388
  • Column 10: DYS439
  • Column 11: DYS389I
  • Column 12: DYS392
  • Column 13: DYS389II
  • Column 14: DYS458
  • Column 15: DYS459a
  • Column 16: DYS459b
  • Column 17: DYS455
  • Column 18: DYS454
  • Column 19: DYS447
  • Column 20: DYS437
  • Column 21: DYS448
  • Column 22: DYS449
  • Column 23: DYS464a
  • Column 24: DYS464b
  • Column 25: DYS464c
  • Column 26: DYS464d
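This section does not name the similarity measure behind the description of Items 1 to 3 as having low similarity among objects. For categorical data, a common choice is the simple matching dissimilarity used by k-modes clustering: count the attributes on which two objects differ. The sketch below assumes that measure. With 25 markers, the distance ranges from 0 for identical haplotypes to 25 for haplotypes that share no allele value.

```python
def simple_matching_distance(a, b):
    """Number of markers on which two records carry different allele values.

    Assumption: alleles are compared as categorical labels, so a difference
    of 13 vs 14 counts the same as 13 vs 20.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of markers")
    return sum(x != y for x, y in zip(a, b))


def normalized_distance(a, b):
    """Simple matching distance scaled to [0, 1] by the number of markers."""
    if not a:
        raise ValueError("records must contain at least one marker")
    return simple_matching_distance(a, b) / len(a)
```

Treating alleles as unordered categories discards the stepwise-mutation ordering of repeat counts. This is a deliberate simplification that matches the categorical framing of the dataset.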

Dataset Item 4 (Table). This table consists of 236 objects from four surname projects: Donald (112), Flannery (64), Mumma (42), and William (18); the values in parentheses give the number of objects in each group. The Donald surname data were obtained from Clan Donald’s DNA Projects (http://dna-project.clan-donald-usa.org/); the raw data comprise approximately 896 records. The Flannery surname data were obtained from the Flannery Clan Y-DNA project (http://www.flanneryclan.ie/); the raw data comprise approximately 896 records. The Mumma surname data were obtained from the Mumma-Moomaw Project (http://www.mumma.org/); the raw data comprise approximately 78 records. The William surname data were obtained from the Williams DNA Project (http://williams.genealogy.fm/); the raw data comprise approximately 626 records taken from 94 groups. The objects in this table have a high degree of similarity to one another, and some are nearly identical. The first column is the Kit Number, followed by the 25 markers. The Kit Number is an extended identifier formed by prefixing the original Kit Number with the object's surname, separated by a dash.

  • Column 1: Kit Number
  • Column 2: DYS393
  • Column 3: DYS390
  • Column 4: DYS19 (394)
  • Column 5: DYS391
  • Column 6: DYS385a
  • Column 7: DYS385b
  • Column 8: DYS426
  • Column 9: DYS388
  • Column 10: DYS439
  • Column 11: DYS389I
  • Column 12: DYS392
  • Column 13: DYS389II
  • Column 14: DYS458
  • Column 15: DYS459a
  • Column 16: DYS459b
  • Column 17: DYS455
  • Column 18: DYS454
  • Column 19: DYS447
  • Column 20: DYS437
  • Column 21: DYS448
  • Column 22: DYS449
  • Column 23: DYS464a
  • Column 24: DYS464b
  • Column 25: DYS464c
  • Column 26: DYS464d
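One way to quantify how tight a group is uses its modal haplotype, which is the per-marker mode. This is the cluster centre used in k-modes. The mean simple matching distance from each member to that mode then measures spread, and low values correspond to the high within-group similarity described for Items 4 to 6. The sketch below is an illustration under that assumption, not the authors' procedure. Ties in the mode go to the allele seen first, following the documented ordering of `Counter.most_common`.

```python
from collections import Counter


def modal_haplotype(records):
    """Per-marker mode of a group of records (k-modes style centre).

    Ties are broken by first occurrence, as Counter.most_common does.
    """
    if not records:
        raise ValueError("need at least one record")
    return [Counter(column).most_common(1)[0][0] for column in zip(*records)]


def mean_distance_to_mode(records):
    """Average number of markers on which a record differs from the group mode."""
    mode = modal_haplotype(records)
    total = sum(sum(x != y for x, y in zip(r, mode)) for r in records)
    return total / len(records)
```

A value near 0 indicates a tightly knit surname or family group. Larger values suggest that the group mixes more distantly related lineages.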

Dataset Item 5 (Table). This table consists of 112 objects from the Phillips DNA Project (http://www.phillipsdnaproject.com/). After filtration, the final data contain only 8 family groups: Group 2 (30), Group 4 (8), Group 5 (10), Group 8 (18), Group 10 (17), Group 16 (10), Group 17 (12), and Group 29 (7); the values in parentheses give the number of objects in each group. The raw data comprise approximately 341 records taken from 64 groups. The objects in this table have a high degree of similarity to one another, and some are nearly identical. The first column is the Kit Number, followed by the 25 markers. The Kit Number is an extended identifier formed by prefixing the original Kit Number with the object's surname, separated by a dash.

  • Column 1: Kit Number
  • Column 2: DYS393
  • Column 3: DYS390
  • Column 4: DYS19 (394)
  • Column 5: DYS391
  • Column 6: DYS385a
  • Column 7: DYS385b
  • Column 8: DYS426
  • Column 9: DYS388
  • Column 10: DYS439
  • Column 11: DYS389I
  • Column 12: DYS392
  • Column 13: DYS389II
  • Column 14: DYS458
  • Column 15: DYS459a
  • Column 16: DYS459b
  • Column 17: DYS455
  • Column 18: DYS454
  • Column 19: DYS447
  • Column 20: DYS437
  • Column 21: DYS448
  • Column 22: DYS449
  • Column 23: DYS464a
  • Column 24: DYS464b
  • Column 25: DYS464c
  • Column 26: DYS464d
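Because every object carries its group in the Kit Number prefix, these tables can serve as ground truth when evaluating a clustering algorithm. This section does not specify an evaluation measure. The sketch below uses purity, one common external measure, as an assumption: purity is the fraction of objects that belong to the majority true group of their assigned cluster. The `cluster_ids` argument stands for the output of whatever clustering method is being tested.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of objects that belong to the majority true class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have equal length")
    if not true_labels:
        raise ValueError("need at least one object")
    # For each cluster, count how many members come from each true group.
    members = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster][label] += 1
    # Sum the size of the dominant true group in every cluster.
    majority_total = sum(c.most_common(1)[0][1] for c in members.values())
    return majority_total / len(true_labels)
```

Purity reaches 1.0 trivially when every object forms its own cluster. In practice, it is therefore usually reported with the number of clusters fixed, for example to the 8 family groups of this item.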

Dataset Item 6 (Table). This table consists of 112 objects from the Brown Surname project (http://brownsociety.org/). After filtration, the data contain only 14 family groups: Group 2 (9), Group 10 (17), Group 15 (6), Group 18 (6), Group 20 (7), Group 23 (8), Group 26 (8), Group 28 (8), Group 34 (7), Group 44 (6), Group 35 (7), Group 46 (7), Group 49 (10), and Group 91 (6); the values in parentheses give the number of objects in each group. The raw data comprise approximately 543 records taken from 126 groups. The objects in this table have a high degree of similarity to one another, and some are nearly identical. The first column is the Kit Number, followed by the 25 markers. The Kit Number is an extended identifier formed by prefixing the original Kit Number with the object's surname, separated by a dash.

  • Column 1: Kit Number
  • Column 2: DYS393
  • Column 3: DYS390
  • Column 4: DYS19 (394)
  • Column 5: DYS391
  • Column 6: DYS385a
  • Column 7: DYS385b
  • Column 8: DYS426
  • Column 9: DYS388
  • Column 10: DYS439
  • Column 11: DYS389I
  • Column 12: DYS392
  • Column 13: DYS389II
  • Column 14: DYS458
  • Column 15: DYS459a
  • Column 16: DYS459b
  • Column 17: DYS455
  • Column 18: DYS454
  • Column 19: DYS447
  • Column 20: DYS437
  • Column 21: DYS448
  • Column 22: DYS449
  • Column 23: DYS464a
  • Column 24: DYS464b
  • Column 25: DYS464c
  • Column 26: DYS464d