Dataset Papers in Biology
Volume 2013, Article ID 854869, 12 pages
http://dx.doi.org/10.7167/2013/854869
Dataset Paper

Detection of Introns in Eukaryotic Small Subunit Ribosomal RNA Gene Sequences

1 CNRS UMR 7138, Systématique, Adaptation, Évolution, Université de Nice-Sophia Antipolis, Parc Valrose, BP 71, 06108 Nice Cedex 02, France
2 UMR 7138, Systématique, Adaptation, Évolution, Université de Nice-Sophia Antipolis, Parc Valrose, BP 71, 06108 Nice Cedex 02, France
3 CNRS UMR 7144, Laboratoire Adaptation et Diversité en Milieu Marin, Place Georges Teissier, 29680 Roscoff, France
4 Station Biologique de Roscoff, Université Pierre et Marie Curie-Paris 6, Place Georges Teissier, 29680 Roscoff, France

Received 27 April 2012; Accepted 20 May 2012

Academic Editors: A. G. de Brevern, D. R. Flower, and M. L. Raymer

Copyright © 2013 Dipankar Bachar et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The gene encoding the small subunit ribosomal RNA (SSU-rRNA) is the tool of choice for phylogenetic and environmental biodiversity analyses of Bacteria and Archaea, as well as of unicellular Eukaryota. In Eukaryota, these gene sequences are often interrupted by one or several introns, some of which are long. Searching GenBank release 188, we found descriptions of 3638 such introns. Using a database of 180 000 SSU-rRNA sequences well annotated for taxonomy and a C++ program written for that purpose, we detected 18 691 introns (including the 3638 described ones). After filtering on length and sequence quality, 3646 sequences were retained. These introns were clustered, and the clusters were analyzed for the presence of single or multiple clades at various taxonomic depths, allowing future analyses of horizontal transfers. Various analyses of the results are provided as tabulated files, together with FASTA files of the described and computed introns. Each intron sequence is annotated with its cellular location (nucleus, chloroplast, or mitochondrion), the positions at which it was found in the SSU-rRNA sequences, and its taxonomy as provided by GenBank.
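The abstract does not say how the C++ program locates introns. One common approach, assumed here and not confirmed by the source, is to align each SSU-rRNA sequence against an intron-free reference and report long insertions in the query as candidate introns. A simple length and quality filter is then applied. The minimal sketch below is hypothetical. The names `findInsertions`, `passesFilter` and `IntronCandidate`, the use of `'-'` as the gap character, and the "fraction of non-ACGTU characters" quality criterion are all illustrative choices, not the authors' code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one candidate intron: a run of query bases with no
// counterpart in the intron-free reference SSU-rRNA.
struct IntronCandidate {
    std::size_t queryStart;  // 0-based position of the first inserted base in the ungapped query
    std::size_t length;      // number of inserted query bases
};

// Scans a pairwise alignment ('-' marks a gap; assumed convention) and reports
// every run of query bases aligned to reference gaps that is >= minLength long.
std::vector<IntronCandidate> findInsertions(const std::string& alignedQuery,
                                            const std::string& alignedRef,
                                            std::size_t minLength) {
    std::vector<IntronCandidate> hits;
    std::size_t queryPos = 0;   // current position in the ungapped query
    std::size_t runStart = 0;   // query position where the open insertion began
    std::size_t runLength = 0;  // length of the open insertion (0 = none open)

    // Records the open insertion if it is long enough, then resets it.
    auto closeRun = [&]() {
        if (runLength > 0 && runLength >= minLength) {
            hits.push_back({runStart, runLength});
        }
        runLength = 0;
    };

    const std::size_t n = std::min(alignedQuery.size(), alignedRef.size());
    for (std::size_t i = 0; i < n; ++i) {
        const bool queryHasBase = alignedQuery[i] != '-';
        const bool refHasGap = alignedRef[i] == '-';
        if (queryHasBase && refHasGap) {
            if (runLength == 0) runStart = queryPos;  // a new insertion starts here
            ++runLength;
        } else if (!refHasGap) {
            closeRun();  // any reference base ends an open insertion
        }
        // Columns where both sequences have a gap neither extend nor close a run.
        if (queryHasBase) ++queryPos;
    }
    closeRun();  // an insertion may extend to the end of the alignment
    return hits;
}

// Hypothetical length/quality filter: keeps a sequence if it is at least
// minLength long and its fraction of non-ACGTU characters is <= maxAmbiguous.
bool passesFilter(const std::string& seq, std::size_t minLength, double maxAmbiguous) {
    if (seq.empty() || seq.size() < minLength) return false;
    std::size_t ambiguous = 0;
    for (char c : seq) {
        switch (c) {
            case 'A': case 'C': case 'G': case 'T': case 'U':
            case 'a': case 'c': case 'g': case 't': case 'u':
                break;  // unambiguous nucleotide
            default:
                ++ambiguous;  // IUPAC ambiguity code or other symbol
        }
    }
    return static_cast<double>(ambiguous) / static_cast<double>(seq.size()) <= maxAmbiguous;
}
```

For example, aligning the query `ACGTTTTTTGCA` against the reference `ACG------GCA` with `minLength = 5` yields a single candidate starting at query position 3 with length 6. The extracted candidate would then go through `passesFilter`. Any real thresholds would have to come from the study itself. The values used here are purely illustrative.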