Biotechnology Research International

Review Article

Metagenomics: Retrospect and Prospects in High Throughput Age

Table 3

A brief description of bioinformatic tools commonly employed for postsequencing analysis of metagenomic sequence data.


Postsequencing task	Bioinformatic tool	Brief description	URL	Reference

Metagenomic assembly tool	MetaVelvet	Decomposes a de Bruijn graph into individual subgraphs on the basis of coverage (abundance) differences and graph connectivity. Overcomes the tendency of single-genome assemblers to misidentify sequences from highly abundant species as repeats. Reported to yield higher N50 scores than single-genome assemblers.	http://metavelvet.dna.bio.keio.ac.jp/	[63]
	Meta-IDBA	Partitions the de Bruijn graph into isolated components representing different species, grouping similar regions of similar subspecies and separating components based on the topological structure of the graph.	http://i.cs.hku.hk/~alse/hkubrg/projects/metaidba/	[64]
	Genovo	Uses a Bayesian approach with a generative probabilistic model of read generation and works by discovering likely sequence reconstructions under the model. Uses the iterated conditional modes (ICM) algorithm, which maximizes local conditional probabilities sequentially.	http://cs.stanford.edu/group/genovo/	[90]
	Bambus 2	Uses mate-pair information during assembly, which Meta-IDBA, MetaVelvet, and Genovo do not. Builds a contig graph, then orients, positions, and simplifies it for proper scaffolding.	http://amos.sf.net	[91]

Short read alignment and mapping to reference genome	Bowtie	An ultrafast, memory-efficient tool for aligning sequencing reads to long reference sequences. Employs a Burrows-Wheeler index based on the full-text minute-space (FM) index, giving a low memory footprint (only 1.3 GB). Also supports gapped, local, and paired-end alignment modes.	http://bowtie-bio.sourceforge.net/index.shtml	[92]
	BWA	Maps low-divergence sequences against a large reference genome. Provides three algorithms for different read lengths: BWA-backtrack for Illumina reads up to 100 bp, and BWA-SW and BWA-MEM for longer sequences ranging from 70 bp to 1 Mbp.	http://bio-bwa.sourceforge.net/	[93]
	SOAP 3	An accurate GPU-based short read aligner that delivers high speed and sensitivity simultaneously. Reported to take less than 30 seconds to align one million read pairs onto the human reference genome, much faster than BWA and Bowtie.	http://www.cs.hku.hk/2bwt-tools/soap3-dp/	[94]
	mrsFAST	A cache-oblivious mapper designed to map short reads to a reference genome within a user-defined error threshold.	http://sfu-compbio.github.io/mrsfast/	[95]

Microbial diversity analysis	MLST	Exploits the unambiguous nature and electronic portability of nucleotide sequence data to characterize microorganisms.	http://www.mlst.net/	[96]
	Axiome	Streamlines and manages analysis of small subunit (SSU) rRNA marker data in QIIME and mothur. Has a companion graphical user interface (GUI) and is designed to be easily extended to facilitate customized research workflows.	http://neufeld.github.com/axiometic	[97]
	PHACCS	Uses the contig spectrum from shotgun DNA sequence assemblies, based on a modified Lander-Waterman algorithm, to predict the structure of viral communities and estimate their diversity.	http://phaccs.sourceforge.net/	[98]

Functional annotation	RAMMCAP	An ultrafast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours.	http://weizhong-lab.ucsd.edu/rammcap/cgi-bin/rammcap.cgi	[99]

Gene annotation/gene calling	FragGeneScan	Combines sequencing error models and codon usage in a hidden Markov model to improve the prediction of protein-coding regions in short reads.	http://omics.informatics.indiana.edu/FragGeneScan/	[100]
	MetaGeneMark	An ab initio gene prediction tool with updated heuristic models designed for metagenomic sequences.	http://exon.gatech.edu/meta_gmhmmp.cgi	[101]
	MetaGeneAnnotator	Precisely predicts all kinds of prokaryotic genes from a single anonymous genomic sequence or a set of such sequences of varying lengths. Integrates statistical models of prophage genes in addition to those of bacterial and archaeal genes, and also uses a self-training model built from the input sequences for predictions.	http://metagene.cb.k.u-tokyo.ac.jp/	[102]

Binning	TETRA	Based on statistical analysis of tetranucleotide usage patterns in genomic fragments. Automates comparative tetranucleotide frequency analysis and outperforms (G+C)-content-based analysis.	http://www.megx.net/tetra/index.html	[103]
	MetaCluster 5.0	A two-round binning method that separates reads of high-abundance species from those of low-abundance species, aiming to identify both in the presence of a large amount of noise from many extremely low-abundance species. Uses a filtering strategy to remove that noise.	http://i.cs.hku.hk/~alse/MetaCluster/	[104]
	Phymm	Uses interpolated Markov models (IMMs) to characterize variable-length oligonucleotides typical of a phylogenetic grouping.	http://www.cbcb.umd.edu/software/phymm/	[105]

Automated platforms/servers for comparative and functional analysis of metagenomic sequence data	MG-RAST	MG-RAST (the Metagenomics RAST) server is an automated analysis platform which provides upload, quality control, automated annotation, and analysis for prokaryotic metagenomic shotgun samples.	http://metagenomics.anl.gov	[65]
	MetAMOS	An open-source, modular metagenomic assembly and analysis pipeline that leverages over 20 existing tools and integrates some new ones. The entire pipeline is built around the unique features of the metagenomic scaffolder Bambus 2.	https://github.com/treangen/MetAMOS	[66]
	MEGAN 4	Released in 2011. Supports taxonomic and comparative analysis, as well as functional analysis based on the SEED and KEGG (Kyoto Encyclopedia of Genes and Genomes) classifications.	http://www-ab.informatik.uni-tuebingen.de/software/megan	[106]
	IMG/M	A data management and analysis system for microbial community genomes (metagenomes) hosted at the Department of Energy’s (DOE) Joint Genome Institute (JGI). IMG/M consists of metagenome data integrated with isolate microbial genomes from the Integrated Microbial Genomes (IMG) system.	http://img.jgi.doe.gov/cgi-bin/m/main.cgi	[67]
	CAMERA	Provides access to raw environmental sequence data, with associated metadata, precomputed annotation, and analyses. Integrates tools for gene prediction and annotation, clustering, assembly, sequence quality control, functional and comparative genomics applications, and many other downstream analysis tools.	http://camera.calit2.net	[68]
	GALAXY	A publicly available web service and software system that supports analysis of genomic, comparative genomic, and functional genomic data through a framework that gives experimentalists simple interfaces to powerful tools while automatically managing the computational details.	http://galaxyproject.org	[69]
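
Two quantities named in the table, the N50 score used to compare assemblers and the tetranucleotide frequencies that underlie composition-based binning, can be illustrated with a minimal Python sketch. The function names `n50` and `tetranucleotide_frequencies` are hypothetical and are not taken from any of the listed tools. The second function computes only raw relative frequencies, which is an assumption made for illustration; it does not reproduce the statistical analysis that TETRA performs on those patterns.

```python
from collections import Counter
from itertools import product


def n50(contig_lengths):
    """Return the N50 of an assembly: the contig length L such that
    contigs of length >= L together cover at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def tetranucleotide_frequencies(sequence):
    """Return the relative frequency of each of the 256 tetranucleotides.

    Overlapping windows of length 4 are counted; windows containing
    characters other than A, C, G, T (for example N) are ignored.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
    print(tetranucleotide_frequencies("ACGTACGT")["ACGT"])
```

For contig lengths 2, 3, 4, 5, and 6 (20 bases in total), the two longest contigs already cover 11 bases, so the N50 is 5. Frequency vectors such as the one returned by the second function could then be compared between fragments, for instance by correlation, to group fragments of similar composition.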