Research Article

Objective and Comprehensive Evaluation of Bisulfite Short Read Mapping Tools

Table 1

Detailed comparison of different bisulfite short read mapping tools.

| Programs | Year | Algorithmic technique used | Language | Aligner | Input | Output | Min./Max. read length | Mismatches | Indels | Gaps | Single/Paired-end | Multi-threaded | Nondirectional |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ERNE-bs5 | 2012 | Hash genome indexing uses a 5-letter (Cm, Cu) for storing methylation information and uses a weighted context-aware Hamming distance to identify a T coming from an unmethylated C. | C++ | None | gz/bz2/fastq/fasta | BAM/SAM | up to 600 bp | 1 every 15 bp (-errors arg) | Yes | Yes | both | Yes | No |
| BatMeth | 2012 | FM index integrates mismatch counting, list filtering and mismatch stage filtering and fast mapping onto two indexes. | Perl/C++ | None | fasta | NA | NA | up to 5 (- ) in a read | No | No | Yes | Yes | Yes |
| BiSS | 2012 | Reference genome hashing, local Smith-Waterman alignment | Perl | None | fasta/fastq/gz/SAM/BAM | SAM/BAM/Next GenMap | up to 4096 bp | (- from 0 to 1) in a read Default | Yes | Yes | Yes | Yes | No |
| Bismark | 2011 | FM-Index enumerates all possible T to C conversion | Perl | Bowtie/Bowtie2 | fasta/fastq | BAM/SAM | Bowtie: up to 1000 bp; Bowtie 2: unlimited | 0 or 1 in a seed (- ) | Yes | Yes | both | Yes | Yes |
| BS-Seeker2 | 2013 | FM-Index enumerates all possible T to C conversion | Python | Bowtie2/Bowtie/SOAP/RMAP | fasta, fastq, qseq, pure sequence | BAM/SAM/BS-Seeker | 50–500 bp | up to 4 per read (- ) | Yes | Yes | Single | No | Yes |
| BS-Seeker | 2010 | FM-Index, enumerates all possible T to C conversion, converts the genome to 3 letters, and uses Bowtie to align reads | Python | Bowtie | fasta, fastq, qseq, pure sequence | BAM/SAM/BS_Seeker | 50–250 bp | up to 3 per read (- ) | Yes | No | Single | No | Yes |
| BSMAP | 2009 | hashing of reference genome and bitwise masking tries all possible T to C combinations for reads | Python | SOAP | fasta/fastq/SAM | SAM/txt | up to 144 bp | up to 15 in a read (- ) | | up to 3 bp | both | Yes | Yes |
| RMAP | 2008 | Wildcard matching for mapping Ts, incorporates the use of quality scores directly into the mapping process | C++ | | fastq/fasta | BED | unlimited | up to 10 in a read (- ) | No | No | both | No | No |
| BRAT-BW | 2012 | Converts a TA reference and CG reference; two FM indices are built on the positive strand of the reference genome | C++ | | Text file with input file names in fastq, sequence only | txt | 32 bp-unlimited | unlimited | No | No | both | Yes | Yes |
| MAQ | 2008 | Builds multiple hash tables to index the reads, scans the reference genome against the hash tables to find hits | Perl/C/C++ | | fastq | maq | Up to 63 bp | up to 3 per read | Yes, - | No | both | No | No |
| PASH | 2010 | Implements k-mer level alignment using multipositional hash tables | C | | fastq | Txt/SAM | NA | Yes | Yes | No | Single | No | No |
| Novo-align | 2010 | Hashing genome | C/C++ | | fastq | SAM/BAM | up to 8 per read, 16 for paired end reads | Yes | Yes | up to 7 bp on single end reads | Both | No | Yes |
| Methyl-coder | 2011 | FM-Index, all Cs converted to Ts | C/C++/Python | GSNAP/bowtie | fastq/fasta | BAM/SAM | Bowtie: up to 1000 bp | Yes | No | Yes | both | No | No |
| GSNAP | 2005 | k-mer hashing of reference genome | C/Perl | | gzip/fastq, fasta/bzip2 | SAM/GSNAP | 14–250 bp | Yes | Yes | Yes | both | yes | No |
| BFAST | 2009 | Uses multiple indexing strategies: hashing and suffix array of the reference genome | C | | fastq/bz2/gzip | SAM | NA | Yes | Yes | Yes | both | Yes | Yes |
| Segemehl | 2008 | Enhanced suffix arrays to find exact and inexact matches. Align to read using Myers bitvector algorithm | C/C++ | | fasta | SAM | unlimited | Yes (- ) | Yes | | both | Yes | No |
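
As an illustration of the wildcard matching idea listed for RMAP in Table 1, where a T in a read may align to either a C or a T in the reference, the following minimal Python sketch counts mismatches in a bisulfite-aware way. It is not RMAP's implementation: the function names `bisulfite_mismatches` and `best_hits`, the naive scan over every offset, and the default mismatch budget are assumptions made for illustration only, and quality scores and the reverse strand are ignored.

```python
def bisulfite_mismatches(read: str, ref_window: str) -> int:
    """Count mismatches, treating a read T over a reference C as a match.

    Illustrative assumption: forward-strand reads only, no quality scores.
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    mismatches = 0
    for read_base, ref_base in zip(read.upper(), ref_window.upper()):
        if read_base == ref_base:
            continue
        if read_base == "T" and ref_base == "C":
            # Possibly an unmethylated C converted by bisulfite treatment.
            continue
        mismatches += 1
    return mismatches


def best_hits(read: str, reference: str, max_mismatches: int = 1):
    """Naively scan every offset; return (offset, mismatches) within budget."""
    hits = []
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        count = bisulfite_mismatches(read, window)
        if count <= max_mismatches:
            hits.append((offset, count))
    return hits
```

The asymmetry is the point of the sketch: a read T over a reference C is tolerated, whereas a read C over a reference T still counts as a mismatch. Production mappers replace the linear scan with the hashing or FM-index strategies listed in the table.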

BFAST does not have a direct option for bisulfite mapping; users have to convert Cs to Ts in both the reference genome and the reads and then align the converted reads to the converted reference genome.
*Parentheses in the Mismatches column indicate the program's parameter for mismatches.
*1A: minimum percentage of matches per read.
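
The manual conversion workflow described for BFAST in the note above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a BFAST pipeline: `c_to_t` and `map_converted` are hypothetical helper names, exact string search stands in for the aligner, and only forward-strand reads are handled (reads from the opposite strand would typically need a G to A conversion instead).

```python
def c_to_t(seq: str) -> str:
    """Return the in-silico bisulfite-converted sequence (every C becomes T)."""
    return seq.upper().replace("C", "T")


def map_converted(read: str, reference: str) -> list:
    """Align a converted read to a converted reference by exact search.

    Exact matching is a stand-in for a real aligner (assumption for
    illustration); returns all 0-based start positions of the read.
    """
    converted_read = c_to_t(read)
    converted_ref = c_to_t(reference)
    positions = []
    start = converted_ref.find(converted_read)
    while start != -1:
        positions.append(start)
        start = converted_ref.find(converted_read, start + 1)
    return positions


if __name__ == "__main__":
    reference = "ACGTCCGA"
    read = "ATGTC"  # first C read as T (unmethylated), second C retained
    print(map_converted(read, reference))
```

Because both sequences are reduced to a three-letter alphabet before alignment, methylated and unmethylated cytosines map equally well; the methylation state is recovered afterwards by comparing the original read with the original reference at the reported position.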