Advances in Bioinformatics

Research Article | Open Access

Firas Swidan, Ron Shamir, "Assessing the Quality of Whole Genome Alignments in Bacteria", Advances in Bioinformatics, vol. 2009, Article ID 749027, 8 pages, 2009.

Assessing the Quality of Whole Genome Alignments in Bacteria

Academic Editor: Bhaskar Dasgupta
Received 29 Apr 2009
Accepted 28 Aug 2009
Published 15 Nov 2009


Abstract

Comparing genomes is an essential preliminary step in solving many problems in biology. Matching long similar segments between two genomes is a precondition for their evolutionary, genetic, and genome rearrangement analyses. Though various comparison methods have been developed in recent years, a quantitative assessment of their performance is lacking. Here, we describe two families of assessment measures whose purpose is to evaluate bacteria-oriented comparison tools. The first measure is based on how well the genome segmentation fits the gene annotation of the studied organisms; the second uses the number of segments created by the segmentation and the percentage of the two genomes that is conserved. The effectiveness of the two measures is demonstrated by applying them to the results of genome comparison tools obtained on 41 pairs of bacterial species. Despite the difference in the nature of the two types of measurements, both show consistent results, providing insights into the subtle differences between the mapping tools.

1. Introduction

With the dramatic increase in the number of sequenced genomes, comparative genome analysis has become increasingly common. Evolutionary, genetic, and genome rearrangement studies require as a first step the comparison of two whole genomes, referred to as genome mapping or whole genome alignment, with subsequent analyses dependent on the quality of the mapping [1–3]. This mapping (Figure 1) usually consists of a segmentation of the two genomes into fragments, and the matching of fragments between the two genomes according to the type of evolutionary relation: for example, orthology, paralogy, segmental deletions and insertions (segmental indels), or rearrangement [4–6].

Though numerous mapping procedures have been proposed in recent years, few objective criteria have been suggested for quantitatively assessing them. Early methods relied mainly on gene annotation for building the mappings: for example, [7] was based on P-quasi grouping, [8–10] tried to identify contiguity and gene clusters, [11] used an alignment-like approach, and [12] relied on gene correspondence. Later methods utilized the increase in the available genomic sequences resulting from advances in genome assembly techniques [13]: examples are BLASTZ [14], which was applied to the mouse and human genomes [15], and the genome-rearrangement approaches in [16–19]. Methods that addressed bacterial genomes include Mauve [20], its predecessor GRIL [21], MAGIC [22] (see Section 2 for details), and [23]. Methods were recently developed to evaluate the accuracy of alignments on the whole genome level [2]. In contrast, we describe two methods that evaluate the quality of the mapping as a whole, including its induced segmentation of the genomes and the inferred relations between the resulting fragments (Figure 1).

We present two families of simple, biologically intuitive measures for quantitatively assessing the quality of bacterial genome mapping. The first family relies on the assumption that evolutionary changes are not likely to disrupt genes; hence, a better genome mapping should have fewer gene disruptions and its induced segmentation should show a better fit to known gene annotations. The second family uses two factors: segmentation size (the number of fragments in the mapping), and whole genome conserved percentage (the number of exact base matches in the global alignment of the mapping's induced fragments divided by the genome size). Other things being equal, a mapping with more fragments tends to have a higher conserved percentage, so the two factors must be weighed together.

The power of the measures is demonstrated by applying them to the results of the genome mapping tools MAGIC [22] and Mauve [20] on pairs of bacterial species. Both MAGIC and Mauve were designed specifically for bacterial genome mapping. (Tools designed for eukaryotic genome mapping, for example, CHAIN-NET [15], FISH [24], GRIMM-Synteny [19], SLAGAN [16], TBA [25], and the more recent approach of [18], are geared toward handling much larger genomes possessing extremely different characteristics.) As we will show, the measures are capable of discriminating between the results of the mapping tools and of providing quantitative estimates of their performance.

2. Materials and Methods

2.1. Mapping Tools

The following is an overview of MAGIC and Mauve. Detailed descriptions are given in [20, 22] and on the websites for MAGIC and for Mauve. MAGIC C/C++ implementation version and Mauve version were used (the most recent versions at the time the comparisons were performed).

A brief sketch of the two algorithms. Procedures for mapping between genomes can be conceptually divided into two phases: a pre-processing phase aimed at finding maximal local similarities between the genomes, and a mapping phase aimed at inferring one-to-one correspondences out of these similarities. In MAGIC, a linear pipeline of global and local alignments is used to compute a comprehensive set of maximal similar regions in the two genomes. This phase is initialized with a set of anchors between the two genomes. Normally, MAGIC uses annotated genes as anchors, but here anchors were provided from Mauve (see below) in order to avoid bias in the assessment scores. The result of the pre-processing phase is then iteratively clustered into reorder-free (RF) regions, while resolving conflicts between its different entries based on contextual hints. In Mauve, the pre-processing phase calculates maximal unique matches (MUMs) and filters them according to their lengths. Then, in the mapping phase, overlaps between the MUMs are resolved in a pairwise fashion, and locally collinear blocks (LCBs) are calculated based on iterative breakpoint analysis. Out of the final set of LCBs, Mauve calculates its backbones—one-to-one LCB correspondences containing no big gaps. These backbones were used here also as anchors for MAGIC.

MAGIC and Mauve use different approaches to filtering mobile DNA elements (the mobilome) [26, 27]. Mobile DNA content is high in some of the compared bacteria, reaching up to (e.g., in Streptococcus pyogenes [22]). To make the handling of mobile DNA as comparable as possible, MAGIC's mobile DNA filtering step was deactivated, and its length threshold for discarding entries at the beginning of the mapping phase was increased from 200 bp to 1000 bp (see [22] for details). This change is disadvantageous to MAGIC as it forces it to deviate from its native settings. Otherwise, both MAGIC and Mauve were run with their default parameters. On average, MAGIC's run takes seconds, while Mauve's takes seconds.

2.2. Gene Annotation and Gene Disruptions (GDs)

Gene annotations for the different prokaryotic organisms were obtained from KEGG (Kyoto Encyclopedia of Genes and Genomes) [28].

To reduce the sensitivity of the GD scores to gene end annotation errors, we counted a breakpoint induced from the mapping as disrupting a gene only if it was located inside the gene and at a considerable distance ( 10% of the gene’s length) from its end.

2.3. Conserved Percentages

The end result of MAGIC and Mauve is a mapping between rearrangement-free segments in one genome to their counterparts in the second genome. To calculate conserved percentages for our purposes, these segments were globally aligned in a post-processing step and orthologous segments were identified using the procedure for calculating backbones described in [20]. The conserved percentage was defined as the number of base matches in these fragments divided by the size of the genome (Section 3.2).

2.4. Statistical Tests

The differences observed in the measures defined in Section 3 are quantified by two statistical tests: the one-sided sign (binomial) test [29] and the one-sided Wilcoxon signed-rank test [30] (see also [31] for more details). The null hypothesis for these tests states that the results of Mauve are at least as good as those of MAGIC. The significance level is set to for both tests.
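As an illustration of the first of these tests, the following is a minimal sketch of a one-sided sign test on paired scores for which lower is better (such as GD-scores). The function name and the choice to drop tied pairs are assumptions made for this example; the text does not state how ties were handled, and the Wilcoxon signed-rank test is not covered here.

```python
from math import comb


def one_sided_sign_test(magic, mauve):
    """P-value for H0: Mauve is at least as good as MAGIC (lower is better).

    Tied pairs are dropped, which is a common convention and an
    assumption of this sketch.
    """
    wins = sum(1 for a, b in zip(magic, mauve) if a < b)    # MAGIC better
    losses = sum(1 for a, b in zip(magic, mauve) if a > b)  # Mauve better
    n = wins + losses
    if n == 0:
        return 1.0
    # P(X >= wins) for X ~ Binomial(n, 1/2)
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical scores for four pairs (not taken from the paper).
p = one_sided_sign_test([1, 2, 0, 3], [2, 5, 0, 4])
```

For a measure where higher is better, the two arguments would simply be swapped.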

3. Results

We present each measure and its results. More details about the mapping tools, gene annotations, and statistical tests can be found in Section 2.

3.1. Gene Annotation-Based Measure

Given a mapping between two genomes, we used the induced segmentation in each of the genomes to evaluate the quality of the mapping. Each of the two segmentations was assigned a Gene Disruption (GD) score denoting how many genes the segmentation disrupts. We say that a segmentation disrupts a gene if it has a segment end within the gene that is sufficiently far from the gene's ends (see Section 2.2). The GD-score of the mapping is defined as the sum of the GD-scores of both segmentations.
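The definition above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function names, the coordinate representation, and the inclusive comparison against the 10% margin from Section 2.2 are all choices made for this example.

```python
def gd_score(breakpoints, genes, margin=0.10):
    """Count the genes disrupted by one genome's segmentation.

    breakpoints: segment-end coordinates induced by the mapping
    genes: (start, end) coordinate pairs with start < end
    margin: fraction of the gene length a breakpoint must keep from both
            gene ends to count as a disruption (0.10 follows Section 2.2;
            treating the bound as inclusive is an assumption)
    """
    disrupted = 0
    for start, end in genes:
        pad = margin * (end - start)
        # A gene counts once, however many breakpoints fall inside it.
        if any(start + pad <= b <= end - pad for b in breakpoints):
            disrupted += 1
    return disrupted


def mapping_gd_score(bps_a, genes_a, bps_b, genes_b, margin=0.10):
    """GD-score of a mapping: the sum over both induced segmentations."""
    return gd_score(bps_a, genes_a, margin) + gd_score(bps_b, genes_b, margin)


# Hypothetical coordinates: the breakpoint at 105 is too close to a gene end.
genes = [(100, 200), (300, 400)]
score = gd_score([105, 150, 350, 500], genes)
```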

The GD-scores for the 41 pairs (Table 1) are summarized in Figure 2(a). In 37 pairs, either MAGIC or Mauve was assigned a non-zero GD-score. MAGIC did not disrupt any genes in five pairs, and Mauve in four. MAGIC's GD-scores ranged from to , while Mauve's GD-scores ranged from to . MAGIC's score was lower than or equal to Mauve's on all pairs: MAGIC had lower scores in 37 pairs, and in the four remaining pairs both methods were assigned a score of 0. The results were significantly in favor of MAGIC (p-value ; sign test). The mean GD-score values were and for MAGIC and Mauve, respectively; the difference is statistically significant (p-value ; Wilcoxon test).

Table 1: Segmentation sizes (Seg.) and conserved percentages (Ratio) obtained for MAGIC and Mauve on each pair.

| Pair | Organism | Size (bp) | MAGIC Seg. | MAGIC Ratio | Mauve Seg. | Mauve Ratio |
|---|---|---|---|---|---|---|
| 1 | Anaplasma marginale St. Maries | 1197687 | 74 | 0.35 | 94 | 0.37 |
| | Anaplasma phagocytophilum HZ | 1471282 | | | | |
| 2 | Bacillus cereus ATCC 14579 | 5411809 | 63 | 0.77 | 260 | 0.77 |
| | Bacillus cereus E33L (zebra killer) | 5300915 | | | | |
| 3 | Blochmannia floridanus | 705557 | 1 | 0.67 | 1 | 0.67 |
| | Blochmannia pennsylvanicus | 791654 | | | | |
| 4 | Bacteroides fragilis YCH46 | 5277274 | 58 | 0.87 | 146 | 0.88 |
| | Bacteroides fragilis NCTC 9343 | 5205140 | | | | |
| 5 | Bartonella henselae Houston-1 | 1931047 | 12 | 0.73 | 37 | 0.74 |
| | Bartonella quintana Toulouse | 1581384 | | | | |
| 6 | Bordetella pertussis Tohama I | 4086189 | 147 | 0.79 | 189 | 0.8 |
| | Bordetella bronchiseptica RB50 | 5339179 | | | | |
| 7 | Buchnera aphidicola APS | 640681 | 1 | 0.76 | 2 | 0.76 |
| | Buchnera aphidicola Sg | 641454 | | | | |
| 8 | Campylobacter jejuni subsp. jejuni NCTC 11168 | 1641481 | 3 | 0.9 | 8 | 0.9 |
| | Campylobacter jejuni RM1221 | 1777831 | | | | |
| 9 | Clostridium perfringens 13 | 3031430 | 9 | 0.82 | 94 | 0.83 |
| | Clostridium perfringens SM101 | 2897393 | | | | |
| 10 | Cyanobacteria bacterium Yellowstone A-Prime (Synechococcus sp. JA-3-3Ab) | 2932766 | 330 | 0.67 | 561 | 0.71 |
| | Cyanobacteria bacterium Yellowstone B-Prime (Synechococcus sp. JA-2-3 a(2-13)) | 3046682 | | | | |
| 11 | Dehalococcoides ethenogenes 195 | 1469720 | 20 | 0.7 | 47 | 0.71 |
| | Dehalococcoides sp. CBDB1 | 1395502 | | | | |
| 12 | Ehrlichia ruminantium Welgevonden (South Africa) | 1516355 | 2 | 0.96 | 13 | 0.96 |
| | Ehrlichia ruminantium Gardel | 1499920 | | | | |
| 13 | Francisella tularensis subsp. tularensis SCHU S4 | 1892819 | 54 | 0.96 | 67 | 0.96 |
| | Francisella tularensis subsp. holarctica OSU18 | 1895727 | | | | |
| 14 | Haemophilus influenzae Rd KW20 (serotype d) | 1830138 | 13 | 0.87 | 34 | 0.87 |
| | Haemophilus influenzae 86-028NP (nontypeable) | 1913428 | | | | |
| 15 | Helicobacter pylori 26695 | 1667867 | 25 | 0.87 | 67 | 0.87 |
| | Helicobacter pylori J99 | 1643831 | | | | |
| 16 | Listeria monocytogenes EGD-e (serotype 1/2a) | 2944528 | 14 | 0.79 | 47 | 0.79 |
| | Listeria innocua CLIP 11262 (serotype 6a) | 3011208 | | | | |
| 17 | Legionella pneumophila Lens | 3345687 | 14 | 0.86 | 64 | 0.86 |
| | Legionella pneumophila Paris | 3503610 | | | | |
| 18 | Mycoplasma genitalium G-37 | 580076 | 6 | 0.56 | 13 | 0.56 |
| | Mycoplasma pneumoniae M129 | 816394 | | | | |
| 19 | Mycoplasma hyopneumoniae 232 | 892758 | 16 | 0.94 | 37 | 0.94 |
| | Mycoplasma hyopneumoniae 7448 | 920079 | | | | |
| 20 | Mycobacterium tuberculosis H37Rv, laboratory strain | 4411532 | 3 | 0.99 | 10 | 0.99 |
| | Mycobacterium tuberculosis CDC1551, clinical strain | 4403837 | | | | |
| 21 | Neisseria meningitidis MC58 (serogroup B) | 2272351 | 14 | 0.83 | 321 | 0.87 |
| | Neisseria meningitidis Z2491 (serogroup A) | 2184406 | | | | |
| 22 | Nitrobacter winogradskyi Nb-255 | 3402093 | 109 | 0.52 | 301 | 0.53 |
| | Nitrobacter hamburgensis X14 | 4406967 | | | | |
| 23 | Psychrobacter arcticum 273-4 | 2650701 | 27 | 0.69 | 148 | 0.69 |
| | Psychrobacter cryohalolentis K5 | 3059876 | | | | |
| 24 | Pseudomonas fluorescens Pf-5 | 7074893 | 182 | 0.54 | 391 | 0.55 |
| | Pseudomonas fluorescens PfO-1 | 6438405 | | | | |
| 25 | Prochlorococcus marinus SS120 (subsp. marinus CCMP1375) | 1751080 | 32 | 0.46 | 84 | 0.48 |
| | Prochlorococcus marinus MIT 9312 | 1709204 | | | | |
| 26 | Rhodopseudomonas palustris CGA009 | 5459213 | 170 | 0.59 | 392 | 0.61 |
| | Rhodopseudomonas palustris HaA2 | 5331656 | | | | |
| 27 | Rickettsia prowazekii Madrid E | 1111523 | 16 | 0.69 | 27 | 0.7 |
| | Rickettsia felis URRWXCal2 | 1485148 | | | | |
| 28 | Streptococcus agalactiae 2603 (serotype V) | 2160267 | 12 | 0.9 | 20 | 0.9 |
| | Streptococcus agalactiae A909 (serotype Ia) | 2127839 | | | | |
| 29 | Staphylococcus aureus subsp. aureus N315, meticillin-resistant (MRSA) | 2814816 | 13 | 0.93 | 83 | 0.93 |
| | Staphylococcus aureus subsp. aureus MW2 | 2820462 | | | | |
| 30 | Shigella flexneri 301 (serotype 2a) | 4607203 | 22 | 0.98 | 38 | 0.98 |
| | Shigella flexneri 2457T (serotype 2a) | 4599354 | | | | |
| 31 | Staphylococcus haemolyticus JCSC1435 | 2685015 | 95 | 0.49 | 175 | 0.51 |
| | Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305 | 2516575 | | | | |
| 32 | Streptococcus pneumoniae TIGR4 | 2160842 | 8 | 0.92 | 110 | 0.92 |
| | Streptococcus pneumoniae R6 | 2038615 | | | | |
| 33 | Streptococcus pyogenes MGAS8232 (serotype M18) | 1895017 | 23 | 0.91 | 130 | 0.93 |
| | Streptococcus pyogenes SSI-1 (serotype M3) | 1894275 | | | | |
| 34 | Streptococcus thermophilus CNRZ1066 | 1796226 | 6 | 0.96 | 31 | 0.97 |
| | Streptococcus thermophilus LMG18311 | 1796846 | | | | |
| 35 | Salmonella enterica serovar Typhi CT18 | 4809037 | 4 | 0.99 | 8 | 0.99 |
| | Salmonella enterica serovar Typhi Ty2 | 4791961 | | | | |
| 36 | Synechococcus sp. WH 8102 | 2434428 | 188 | 0.11 | 323 | 0.11 |
| | Synechococcus elongatus PCC6301 | 2696255 | | | | |
| 37 | Thermus thermophilus HB27 | 1894877 | 7 | 0.92 | 32 | 0.93 |
| | Thermus thermophilus HB8 | 1849742 | | | | |
| 38 | Tropheryma whipplei Twist | 927303 | 2 | 0.98 | 37 | 0.98 |
| | Tropheryma whipplei TW08/27 | 925938 | | | | |
| 39 | Xanthomonas campestris pv. campestris ATCC 33913 | 5076188 | 57 | 0.68 | 254 | 0.68 |
| | Xanthomonas campestris pv. vesicatoria | 5178466 | | | | |
| 40 | Xylella fastidiosa 9a5c | 2679306 | 25 | 0.86 | 266 | 0.86 |
| | Xylella fastidiosa Temecula1 | 2519802 | | | | |
| 41 | Yersinia pestis CO92 | 4653728 | 23 | 0.87 | 48 | 0.99 |
| | Yersinia pestis KIM | 4600755 | | | | |

3.1.1. Dependency of Scores on Segmentation Size

Because a mapping with more segments can disrupt more genes, we analyzed the GD-score dependency on the segmentation size (the number of fragments in the mapping) by first normalizing the score according to the segmentation size, and then linearly fitting the score to the segmentation size.

Figure 2(b) presents the results for the normalization. Here, the scores were divided by the segmentation size. The normalized GD-scores of MAGIC and Mauve ranged from to . (The maximum possible value of occurs if two mapped segments disrupt two genes in each of the genomes.) MAGIC had lower normalized scores in pairs, while Mauve had lower ones in pairs. MAGIC's advantage was significant (p-value , sign test). The mean normalized GD-scores of MAGIC and Mauve were and , respectively; the difference is significant (p-value , Wilcoxon test).

Figure 3 shows a linear fit of the scores to the segmentation sizes. The fit is constrained to pass through the origin (0 segments imply a score of 0). The estimated slopes are and for MAGIC and Mauve, respectively ( and , resp.).
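For a least-squares line forced through the origin, the slope has a closed form: the sum of the products of sizes and scores divided by the sum of the squared sizes. The sketch below shows that calculation; the function name and the sample values are hypothetical, and the text does not say which fitting routine was actually used.

```python
def slope_through_origin(sizes, scores):
    """Least-squares slope of score = slope * size (no intercept term)."""
    sxx = sum(x * x for x in sizes)
    if sxx == 0:
        raise ValueError("at least one non-zero segmentation size is required")
    sxy = sum(x * y for x, y in zip(sizes, scores))
    return sxy / sxx


# Hypothetical segmentation sizes and GD-scores (not taken from the paper).
slope = slope_through_origin([1, 2, 3], [2, 4, 6])
```

The slope can be read as an estimated number of disrupted genes per segment, which is why a smaller slope is preferable.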

3.2. Segmentation-Matching Based Measure

This measure assesses the coverage of the mapping and its accuracy at the single base level. The conserved percentage of a genome is defined as the number of exact base matches in the segments' alignments—as dictated by the mapping—divided by the genome size. The conserved percentage of a pair of genomes is the mean of their conserved percentages. Increasing the number of segments in both genomes allows more freedom in the correspondence between them, which can improve their conserved percentage. Hence, a mapping with both a smaller segmentation size and a greater conserved percentage is deemed superior.
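The definitions in this paragraph translate directly into code. The following is a minimal sketch that assumes each mapped segment pair is available as two gapped, equal-length alignment strings with "-" as the gap character and consistent letter case; these representation choices and the function names are assumptions made for the example.

```python
def exact_matches(aligned_a, aligned_b, gap="-"):
    """Number of alignment columns holding the same base in both rows."""
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)


def conserved_percentage(alignments, genome_size):
    """Exact base matches over all mapped segments, divided by genome size."""
    return sum(exact_matches(a, b) for a, b in alignments) / genome_size


def pair_conserved_percentage(alignments, size_a, size_b):
    """Mean of the conserved percentages of the two genomes."""
    return (conserved_percentage(alignments, size_a)
            + conserved_percentage(alignments, size_b)) / 2


# Toy example with hypothetical genome sizes of 8 and 16 bases.
alignments = [("ACGT-A", "ACCTTA")]
value = pair_conserved_percentage(alignments, 8, 16)
```

Because the denominator is the full genome size, bases that lie outside any mapped segment lower the value even though they never appear in an alignment.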

Table 1 gives the segmentation sizes and conserved percentages obtained for MAGIC and Mauve. Figure 4 plots, for each pair, the difference between the values of Mauve and MAGIC for the two criteria. The average MAGIC and Mauve segmentation sizes were and respectively. There are pairs on which MAGIC dominates Mauve, and one pair on which both methods reported equal results. On the rest of the pairs, MAGIC reported a smaller segmentation size and a smaller conserved percentage. On these pairs, the difference in the segmentation size could be as high as in favor of MAGIC (average of ). The difference in conserved percentages, on the other hand, could be as high as in favor of Mauve (average of ).

Figure 5 gives the conserved percentages divided by the segmentation sizes. MAGIC's ratios were greater than Mauve's in pairs, a statistically significant result (p-value , sign test). Their means were and , respectively, a significant difference (p-value ; Wilcoxon test).
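The comparison logic of this section can be sketched as follows, using the first three rows of Table 1 as input. The function names are hypothetical, and treating dominance as "no worse on both criteria and better on at least one" is an assumption of this sketch. Since the table values are rounded to two decimals, a classification based on them may differ from one based on unrounded values.

```python
def compare(magic, mauve):
    """Classify one pair; each argument is (segmentation_size, conserved_pct)."""
    seg_a, cons_a = magic
    seg_b, cons_b = mauve
    if (seg_a, cons_a) == (seg_b, cons_b):
        return "equal"
    if seg_a <= seg_b and cons_a >= cons_b:
        return "MAGIC dominates"
    if seg_b <= seg_a and cons_b >= cons_a:
        return "Mauve dominates"
    return "inconclusive"


def ratio(result):
    """Conserved percentage divided by segmentation size."""
    seg, cons = result
    return cons / seg


# (MAGIC, Mauve) values for pairs 1-3 of Table 1.
table_rows = {
    1: ((74, 0.35), (94, 0.37)),
    2: ((63, 0.77), (260, 0.77)),
    3: ((1, 0.67), (1, 0.67)),
}
labels = {pair: compare(m, v) for pair, (m, v) in table_rows.items()}
```

For pair 1, where neither tool dominates, the ratio serves as the tie-breaker: MAGIC's 0.35/74 is larger than Mauve's 0.37/94.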

Finally, to further investigate cases where neither method dominated the other, we artificially constrained MAGIC to output a mapping as close in size as possible to Mauve's on the nonconclusive pairs. In this exercise, MAGIC dominated Mauve on pairs and Mauve dominated MAGIC on pairs (out of the nonconclusive pairs). In total, MAGIC dominated Mauve on pairs (along with the previous pairs), a result that is statistically significant (p-value , sign test).

4. Discussion

We presented two types of measures for assessing genome mapping results. Both are very simple, biologically intuitive, and easy to compute. Unlike other evaluation methods that try to estimate the accuracy of genome alignments, such as [2], the criteria presented here aim to provide global measures of the mapping quality. As the results show, the measures consistently discriminate between the two mapping methods that we tested, favoring MAGIC over Mauve.

The GD-score is a function of the segmentation size s and the imprecision ε of the mapping tool: GD = f(s, ε) (the higher the segmentation size or the imprecision, the higher the GD-score). The imprecision ε is a property of the mapping tool, and is independent of the compared pair. It reflects the tool's tendency to either miscalculate fragment ends or to report erroneous correspondences between segments. The segmentation size s, on the other hand, depends on both the tool and the compared pair. Though GD and s are directly measurable, ε is hidden. Estimating it is possible thanks to the reasonable linear fit between the GD-score and the segmentation size, which implies that f can be well approximated with a linear function. The estimation is then carried out by normalizing the GD-score and by linear regression. In general, the GD-score may also be affected by minor gene end annotation errors. This effect, however, was minimized by counting only gene disruptions that occur considerably inside the annotated gene region (Section 2), which also increases the tolerance of the measure to subtle mapping errors near the fragment ends.

Our results on the benchmark set of bacteria pairs suggest that the GD-score is capable of discriminating between the two tools: MAGIC reports lower regular and normalized GD-scores (both with statistical significance) and also has a smaller slope in the linear fit.

The GD-scores linear fitting and the normalization results indicate that MAGIC and Mauve disruption rates (genes disrupted per segment) are below , compared to a theoretical maximum of . Given the high density of genes in bacterial genomes (after the correction discussed in Section 2, genes encompass more than of the studied genomes), the disruption rate under a random model assumption is expected to be greater than . Thus, the GD-scores indicate that, as expected, the results of both MAGIC and Mauve are better than random.

The segmentation-matching measure provides an additional estimate of the quality of the mapping. Unlike the GD-score, it requires no external information other than the mapping. Yet, like the GD-score, it reflects the imprecision of the mapping tool. Here, inaccuracy in identifying the fragment ends results in a smaller conserved percentage, while an erroneous correspondence between segments increases the segmentation size. For pairs where the measure provides no clear preference for one of the two compared tools, we suggest dividing the conserved percentage by the segmentation size. This ratio reflects the imprecision of the mapping, as it decreases when additional inaccuracies are introduced at fragment ends or when erroneous correspondences are made.

The dominance criterion for this measure, that is, favoring mappings with smaller segmentation size and greater conserved percentage, is fulfilled by MAGIC on pairs. On one pair both methods report equal results, and on the remaining pairs the results are not conclusive: MAGIC has both smaller segmentation size (a difference of 78 on average) and smaller conserved percentage ( on average). When the conserved percentage is divided by the segmentation size, MAGIC fares better in out of the pairs (with statistical significance). When MAGIC is constrained to output a mapping of size as close as possible to Mauve's, analysis of the nonconclusive pairs leads to similar conclusions. This observation is notable, since the constraint is expected to be disadvantageous for MAGIC as it forces MAGIC to use an inferior configuration of parameters compared to its default settings.

Since the GD-scores rely explicitly on gene annotations, the evaluated methods should not depend (implicitly or explicitly) on gene annotation for building the genome mapping. For this reason, all the above analyses used the Mauve seeds in MAGIC rather than gene annotations of KEGG orthologs. In fact, MAGIC's performance improves according to all the above criteria if its default seeds are used (results not shown). Because Mauve backbones are fed as initial anchors to MAGIC, the results further demonstrate that MAGIC's calculated mapping has better quality than its input mapping, in agreement with observations made in [22].

Our main goal here was to define and test some basic measures for evaluating mapping quality. We tested and demonstrated these measures on two mapping tools, but they can be readily used to compare other algorithms. We hope that the availability of established quality measures will advance the important challenge of genome-wide mapping.


Acknowledgments

This work was supported by a postdoctoral fellowship of the Edmond J. Safra Bioinformatics Program to FS, and also by the Raymond and Beverly Sackler Chair in Bioinformatics and the Israel Science Foundation (Grant no. / ) to RS.


References

  1. G. Bourque, E. M. Zdobnov, P. Bork, P. A. Pevzner, and G. Tesler, “Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages,” Genome Research, vol. 15, no. 1, pp. 98–110, 2005.
  2. A. Prakash and M. Tompa, “Measuring the accuracy of genome-size multiple alignments,” Genome Biology, vol. 8, no. 6, article R124, 2007.
  3. D. Sankoff, “Rearrangements and chromosomal evolution,” Current Opinion in Genetics and Development, vol. 13, no. 6, pp. 583–587, 2003.
  4. W. M. Fitch, “Distinguishing homologous from analogous proteins,” Systematic Zoology, vol. 19, no. 2, pp. 99–113, 1970.
  5. W. M. Fitch, “Homology: a personal view on some of the problems,” Trends in Genetics, vol. 16, no. 5, pp. 227–231, 2000.
  6. E. V. Koonin, “Orthologs, paralogs, and evolutionary genomics,” Annual Review of Genetics, vol. 39, pp. 309–338, 2005.
  7. W. Fujibuchi, H. Ogata, H. Matsuda, and M. Kanehisa, “Automatic detection of conserved gene clusters in multiple genomes by graph comparison and P-quasi grouping,” Nucleic Acids Research, vol. 28, no. 20, pp. 4029–4036, 2000.
  8. R. Overbeek, M. Fonstein, M. D'Souza, G. D. Pusch, and N. Maltsev, “The use of gene clusters to infer functional coupling,” Proceedings of the National Academy of Sciences of the United States of America, vol. 96, no. 6, pp. 2896–2901, 1999.
  9. R. Overbeek, M. Fonstein, M. D'Souza, G. D. Pusch, and N. Maltsev, “Use of contiguity on the chromosome to predict functional coupling,” In Silico Biology, vol. 1, no. 2, pp. 93–108, 1999.
  10. I. B. Rogozin, K. S. Makarova, J. Murvai et al., “Connected gene neighborhoods in prokaryotic genomes,” Nucleic Acids Research, vol. 30, no. 10, pp. 2212–2223, 2002.
  11. Y. I. Wolf, I. B. Rogozin, A. S. Kondrashov, and E. V. Koonin, “Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context,” Genome Research, vol. 11, no. 3, pp. 356–372, 2001.
  12. M. Kellis, N. Patterson, B. Birren, B. Berger, and E. S. Lander, “Methods in comparative genomics: genome correspondence, gene identification and regulatory motif discovery,” Journal of Computational Biology, vol. 11, no. 2, pp. 319–355, 2004.
  13. N. Nagarajan, T. D. Read, and M. Pop, “Scaffolding and validation of bacterial genome assemblies using optical restriction maps,” Bioinformatics, vol. 24, no. 10, pp. 1229–1235, 2008.
  14. S. Schwartz, W. J. Kent, A. Smit et al., “Human-mouse alignments with BLASTZ,” Genome Research, vol. 13, no. 1, pp. 103–107, 2003.
  15. W. J. Kent, R. Baertsch, A. Hinrichs, W. Miller, and D. Haussler, “Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes,” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 20, pp. 11484–11489, 2003.
  16. M. Brudno, S. Malde, A. Poliakov et al., “Glocal alignment: finding rearrangements during alignment,” Bioinformatics, vol. 19, supplement 1, pp. 54i–62i, 2003.
  17. C. Chauve and E. Tannier, “A methodological framework for the reconstruction of contiguous regions of ancestral genomes and its application to mammalian genomes,” PLoS Computational Biology, vol. 4, no. 11, article e1000234, 2008.
  18. C. Lemaitre, E. Tannier, C. Gautier, and M.-F. Sagot, “Precise detection of rearrangement breakpoints in mammalian chromosomes,” BMC Bioinformatics, vol. 9, no. 1, article 286, 2008.
  19. P. A. Pevzner and G. Tesler, “Genome rearrangements in mammalian evolution: lessons from human and mouse genomes,” Genome Research, vol. 13, no. 1, pp. 37–45, 2003.
  20. A. E. Darling, B. Mau, F. R. Blattner, and N. T. Perna, “Mauve: multiple alignment of conserved genomic sequence with rearrangements,” Genome Research, vol. 14, no. 7, pp. 1394–1403, 2004.
  21. A. E. Darling, B. Mau, F. R. Blattner, and N. T. Perna, “GRIL: genome rearrangement and inversion locator,” Bioinformatics, vol. 20, no. 1, pp. 122–124, 2004.
  22. F. Swidan, E. P. C. Rocha, M. Shmoish, and R. Pinter, “An integrative method for accurate comparative genome mapping,” PLoS Computational Biology, vol. 2, no. 8, article e75, 2006.
  23. F. Lemoine, O. Lespinet, and B. Labedan, “Assessing the evolutionary rate of positional orthologous genes in prokaryotes using synteny data,” BMC Evolutionary Biology, vol. 7, no. 1, article 237, 2007.
  24. P. P. Calabrese, S. Chakravarty, and T. J. Vision, “Fast identification and statistical evaluation of segmental homologies in comparative maps,” Bioinformatics, vol. 19, no. 90001, supplement 1, pp. 74i–80i, 2003.
  25. M. Blanchette, W. J. Kent, C. Riemer et al., “Aligning multiple genomic sequences with the threaded blockset aligner,” Genome Research, vol. 14, pp. 708–715, 2004.
  26. L. S. Frost, R. Leplae, A. O. Summers, and A. Toussaint, “Mobile genetic elements: the agents of open source evolution,” Nature Reviews Microbiology, vol. 3, pp. 722–732, 2005.
  27. E. V. Koonin and Y. I. Wolf, “Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world,” Nucleic Acids Research, vol. 36, no. 21, pp. 6688–6719, 2008.
  28. M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno, and M. Hattori, “The KEGG resource for deciphering the genome,” Nucleic Acids Research, vol. 32, no. 90001, pp. D277–D280, 2004.
  29. C. J. Clopper and E. S. Pearson, “The use of confidence or fiducial limits illustrated in the case of the binomial,” Biometrika, vol. 26, pp. 404–413, 1934.
  30. F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics, vol. 1, pp. 80–83, 1945.
  31. M. Hollander and D. A. Wolfe, Nonparametric Statistical Methods, John Wiley & Sons, New York, NY, USA, 1999.

Copyright © 2009 Firas Swidan and Ron Shamir. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
