Review Article

Metagenomics: Retrospect and Prospects in High Throughput Age

Table 3

A brief description of bioinformatic tools commonly employed for postsequencing analysis of metagenomic sequence data.

| Postsequencing task | Bioinformatic tool | Brief description | URL | Reference |
|---|---|---|---|---|
| Metagenomic assembly tool | MetaVelvet | Decomposes a de Bruijn graph into individual subgraphs on the basis of coverage (abundance) differences and graph connectivity. Overcomes the tendency of single-genome assemblers to misidentify sequences from highly abundant species as repeats. Reported to give higher N50 scores than single-genome assemblers. | http://metavelvet.dna.bio.keio.ac.jp/ | [63] |
| | Meta-IDBA | Partitions the de Bruijn graph into isolated components corresponding to different species by grouping similar regions of similar subspecies and splitting the graph according to its topological structure. | http://i.cs.hku.hk/~alse/hkubrg/projects/metaidba/ | [64] |
| | Genovo | Uses a Bayesian approach with a generative probabilistic model of read generation, and works by discovering likely sequence reconstructions under that model. Relies on the iterated conditional modes (ICM) algorithm, which maximizes local conditional probabilities sequentially. | http://cs.stanford.edu/group/genovo/ | [90] |
| | Bambus 2 | Uses mate-pair information during assembly, which Meta-IDBA, MetaVelvet, and Genovo do not use. Builds a contig graph, then orients, positions, and simplifies it to produce scaffolds. | http://amos.sf.net | [91] |
| Short read alignment and mapping to reference genome | Bowtie | An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. Employs a Burrows-Wheeler index based on the full-text minute-space (FM) index, with a low memory footprint (1.3 GB only). Also supports gapped, local, and paired-end alignment modes. | http://bowtie-bio.sourceforge.net/index.shtml | [92] |
| | BWA | Maps low-divergence sequences against a large reference genome. Provides three algorithms for different read lengths: BWA-backtrack for Illumina reads up to 100 bp, and BWA-SW and BWA-MEM for longer sequences ranging from 70 bp to 1 Mbp. | http://bio-bwa.sourceforge.net/ | [93] |
| | SOAP 3 | Fast, accurate, and sensitive GPU-based short read aligner. Reported to take less than 30 seconds to align one million read pairs to the human reference genome, much faster than BWA and Bowtie. | http://www.cs.hku.hk/2bwt-tools/soap3-dp/ | [94] |
| | mrsFAST | A cache-oblivious mapper designed to map short reads to a reference genome with respect to a user-defined error threshold. | http://sfu-compbio.github.io/mrsfast/ | [95] |
| Microbial diversity analysis | MLST | Exploits the unambiguous nature and electronic portability of nucleotide sequence data for the characterization of microorganisms. | http://www.mlst.net/ | [96] |
| | Axiome | Streamlines and manages analysis of small subunit (SSU) rRNA marker data in QIIME and mothur. Has a companion graphical user interface (GUI) and is designed to be easily extended to support customized research workflows. | http://neufeld.github.com/axiometic | [97] |
| | PHACCS | Uses the contig spectrum from shotgun DNA sequence assemblies, based on a modified Lander-Waterman algorithm, to predict the structure of viral communities and estimate their diversity. | http://phaccs.sourceforge.net/ | [98] |
| Functional annotation | RAMMCAP | An ultrafast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. | http://weizhong-lab.ucsd.edu/rammcap/cgi-bin/rammcap.cgi | [99] |
| Gene annotation/gene calling | FragGeneScan | Combines sequencing error models and codon usage in a hidden Markov model to improve the prediction of protein-coding regions in short reads. | http://omics.informatics.indiana.edu/FragGeneScan/ | [100] |
| | MetaGeneMark | An ab initio gene prediction tool with updated heuristic models designed for metagenomic sequences. | http://exon.gatech.edu/meta_gmhmmp.cgi | [101] |
| | MetaGeneAnnotator | Precisely predicts all kinds of prokaryotic genes from a single anonymous genomic sequence or a set of them, across a variety of lengths. Integrates statistical models of prophage genes in addition to those of bacterial and archaeal genes, and also uses a self-training model built from the input sequences. | http://metagene.cb.k.u-tokyo.ac.jp/ | [102] |
| Binning | TETRA | Based on statistical analysis of tetranucleotide usage patterns in genomic fragments. Automates the task of comparative tetranucleotide frequency analysis and outperforms (G+C) content based analysis. | http://www.megx.net/tetra/index.html | [103] |
| | MetaCluster 5.0 | A two-round binning method that separates reads of high-abundance species from those of low-abundance species in two different rounds, aiming to identify both in the presence of a large amount of noise from many extremely low-abundance species. Uses a filtering strategy to remove that noise. | http://i.cs.hku.hk/~alse/MetaCluster/ | [104] |
| | Phymm | Uses interpolated Markov models (IMMs) to characterize variable-length oligonucleotides typical of a phylogenetic grouping. | http://www.cbcb.umd.edu/software/phymm/ | [105] |
| Automated platforms/servers for comparative and functional analysis of metagenomic sequence data | MG-RAST | MG-RAST (the Metagenomics RAST) server is an automated analysis platform that provides upload, quality control, automated annotation, and analysis for prokaryotic metagenomic shotgun samples. | http://metagenomics.anl.gov | [65] |
| | MetAMOS | An open source, modular metagenomic assembly and analysis pipeline that leverages over 20 existing tools and integrates some new ones. The entire pipeline is built around the unique features of the metagenomic scaffolder Bambus 2. | https://github.com/treangen/MetAMOS | [66] |
| | MEGAN 4 | Released in 2011 for taxonomic analysis, comparative analysis, and functional analysis, with methods based on the SEED and KEGG (Kyoto Encyclopedia of Genes and Genomes). | http://www-ab.informatik.uni-tuebingen.de/software/megan | [106] |
| | IMG/M | A data management and analysis system for microbial community genomes (metagenomes) hosted at the Department of Energy's (DOE) Joint Genome Institute (JGI). IMG/M consists of metagenome data integrated with isolate microbial genomes from the Integrated Microbial Genomes (IMG) system. | http://img.jgi.doe.gov/cgi-bin/m/main.cgi | [67] |
| | CAMERA | Provides access to raw environmental sequence data, with associated metadata, precomputed annotation, and analyses. Integrates tools for gene prediction and annotation, clustering, assembly, sequence quality control, functional and comparative genomics applications, and many other downstream analyses. | http://camera.calit2.net | [68] |
| | GALAXY | A publicly available web service and software system that supports analysis of genomic, comparative genomic, and functional genomic data through a framework that gives experimentalists simple interfaces to powerful tools while automatically managing the computational details. | http://galaxyproject.org | [69] |
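
The MetaVelvet entry in Table 3 compares assemblers by their N50 scores. As a minimal sketch of that metric, N50 can be taken as the length of the contig at which the running sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembled bases. The function name `n50` and the contig lengths below are hypothetical, chosen only for illustration; they are not taken from any of the cited tools or from real assemblies.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the total
    assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp) for two assemblies with the same total size.
fragmented = [500, 400, 300, 200, 100]
contiguous = [900, 400, 200]

print(n50(fragmented))  # 400
print(n50(contiguous))  # 900
```

A higher N50 indicates a more contiguous assembly for the same total size, but it says nothing about whether the contigs are correct, so it is usually read alongside other quality measures.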