BioMed Research International
Volume 2013 (2013), Article ID 502827, 11 pages
http://dx.doi.org/10.1155/2013/502827
Research Article

ASPic-GeneID: A Lightweight Pipeline for Gene Prediction and Alternative Isoforms Detection

1 Centre Nacional d’Anàlisi Genòmica (CNAG), Parc Científic de Barcelona, 08028 Barcelona, Spain
2 Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari, 70126 Bari, Italy
3 Istituto di Biomembrane e Bioenergetica del Consiglio Nazionale delle Ricerche (CNR), 70126 Bari, Italy
4 Centre de Regulació Genòmica (CRG), 08003 Barcelona, Spain
5 Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
6 Centro di Eccellenza in Genomica Comparata, Università degli Studi di Bari, 70126 Bari, Italy

Received 16 June 2013; Revised 1 August 2013; Accepted 4 August 2013

Academic Editor: Tao Huang

Copyright © 2013 Tyler Alioto et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

New genomes are being sequenced at an increasingly rapid rate, far outpacing the rate at which manual gene annotation can be performed. This growth in genome projects necessitates automated genome annotation; however, full-fledged annotation systems are usually home-grown and customized to a particular genome. There is thus a renewed need for accurate ab initio gene prediction methods. Yet fully ab initio methods fall short of the sensitivity and specificity required for a quality annotation. Evidence in the form of expressed sequences gives the single biggest improvement in accuracy when used to inform gene predictions. Here, we present a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The two main components are ASPic, a program that derives highly accurate, albeit not necessarily complete, transcript annotations from EST alignments, and GeneID, a standard gene prediction program, which we have modified to take intron annotations as evidence. The introns output by ASPic CDS predictions are given to GeneID to constrain the exon-chaining process and produce predictions consistent with the underlying EST alignments. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.