Abstract

The tempo and mode of evolutionary change during speciation have remained contentious until recently. While much of the evidence that speciation is an abrupt and rapid process comes from fossil data, recent molecular phylogenetic studies show that the background of gradual evolution is often broken by accelerated rates of molecular evolution during speciation. However, what kinds of genes affect or are affected by speciation remains unexplored. Our analysis of 4843 protein-coding genes in five species of the Drosophila melanogaster subgroup shows that while ~70% of genes follow clock-like evolution, 17–19.67% of loci show signatures of accelerated rates of evolution in recently formed species. These genes show 2-3-fold higher rates of substitution in recently diverged species compared to older species. This fraction of loci affects a diverse range of functions. Only a small proportion of reproductive genes experience speciation-related accelerated changes, but many sex- and reproduction-related genes show a pattern of persistent rapid evolution, suggesting that these genes are under constant selective pressure. The identification of loci associated with accelerated evolution allows us to address the mechanisms of rapid evolution and speciation, which in our study appear to be a combination of both selection and rapid demographic changes.

1. Introduction

The tempo and mode of evolutionary change during speciation have remained a contentious issue for more than five decades. Evidence for abrupt and rapid changes during speciation came from fossil evidence but lacked mechanistic explanations for such a process [1–7]. Since the mid-1970s, molecular phylogenetic studies began to associate increased genetic changes with speciation events, suggesting that rates of molecular evolution might be altered during speciation [8–11]. This trend has been recently confirmed by molecular phylogenies using large numbers of genes and a wide variety of taxa, explicitly showing that speciation is accompanied by accelerated rates of molecular evolution [12–15].

What remains unknown is the mode of change during speciation, that is, the kind of genetic changes associated with speciation. More specifically, the numbers and kinds of protein coding genes that change during speciation remain unexplored. Accelerated evolution is only observed in a fraction of genes analyzed by Pagel et al. [13] as well as in other genome-wide estimates of molecular divergences [16–18]. A more systematic search to identify protein-coding genes that experience accelerated evolution during speciation will allow us to directly address the mode of evolutionary change during speciation, that is, which kinds of genes diverge in ways that affect or are affected by speciation.

Taxa-wide evidence of sex- and reproduction-related genes evolving rapidly in sibling species [19, 20] has raised the possibility that sex-related genes might preferentially experience elevated rates of evolution during the speciation process, or may even drive speciation by causing reproductive isolation among diverging populations. This framework is supported by the fact that almost all candidate “speciation” genes identified so far are mainly sex related (with the exception of a few genes with other functions), evolve rapidly between closely related species, often show signatures of adaptive evolution, and have been invoked in the rapid evolution of hybrid sterility in different organisms [21–32]. After controlling for incomplete lineage sorting in the melanogaster subgroup, we put this framework to the test by analyzing 4843 protein coding genes in the Drosophila transcriptome for signatures of speciation-related accelerated changes.

Divergence trends between recently formed species and old species provide a proxy for detecting signatures of speciation-related accelerated changes. Molecular divergences are expected to be proportional to the duration of a species’ existence; newly formed species will have accumulated less molecular divergence compared to older species. Signatures of speciation-related accelerated evolution will be manifested as relatively higher rates of molecular evolution in newly formed species compared to species with longer postspeciation divergence times. Accordingly, we asked the following questions: (1) Are rates of molecular evolution in protein-coding genes affected by speciation events? That is, do genes show unexpectedly elevated rates of evolution in newly formed species relative to older species? (2) Do sex-related genes preferentially exhibit accelerated speciation-related changes relative to non-sex-related genes? (3) Do genes with accelerated rates of evolution show evidence of positive selection?

2. Methods

2.1. Rationale

If evolutionary rates of protein coding genes are not affected by speciation and evolve at a constant rate, we should be able to find a correlation between the length of time that species have diverged and the proportion of molecular divergence between these species. Species diverged for longer lengths of time would have accumulated proportionally higher amounts of genetic changes relative to species diverged for shorter periods of time. Conversely, if rates of molecular evolution were indeed affected by speciation, then this correlation will be broken and newer species might show relatively higher evolutionary rates compared to older species. Our analyses therefore primarily exploit the nature of molecular changes since divergence of species pairs (recent versus older) to infer speciation-related accelerated changes.

2.2. The Phyletic System

We used a phyletic system comprising three pairs of closely related species from the melanogaster subgroup: D. simulans and D. sechellia, which diverged about 0.3–0.6 Mya [33]; D. melanogaster and D. simulans, which diverged about 4.3–6.5 Mya; and D. yakuba and D. erecta, which diverged about 8.1–12.7 Mya [34]. Divergence times for D. simulans-D. melanogaster and D. yakuba-D. erecta were recently reestimated by using over 2977 nuclear genes and by implementing a novel genomic-mutation distance approach correcting for codon bias [34], whereas the divergence times for the D. simulans-D. sechellia split were estimated using a small number of genes [33]. All of these divergence dates broadly concur with other independent estimates using mitochondrial loci (reviewed in Powell 1997), and there is also some general concordance with what little paleontological evidence is available for Drosophila (see [34]). We also used D. pseudoobscura and D. persimilis from the D. obscura group, which diverged about 0.85 Mya [34]. The Obscura group diverged 55 Mya from the melanogaster group [34], which brings about problems of saturation in $d_S$ and gene expression differences [35]. Nevertheless, because this species pair has been diverged for roughly the same amount of time as D. simulans-D. sechellia, a comparison of rates of protein evolution between the two species pairs is useful to determine whether acceleration in rates of evolution is commonly found among newly derived species.

2.3. Estimating Differences in Rates of Divergence in Relation to Speciation

According to the divergence times recently reported by Tamura et al. [34], D. erecta and D. yakuba have diverged for an estimated length of time that is ~2-3 times greater than the divergence time between D. melanogaster and D. simulans. Given the neutrally expected linear relationship between genetic divergence and time, we should therefore expect ~2-3 times greater divergence in genes between D. yakuba and D. erecta relative to D. melanogaster and D. simulans. We calculated the expected rate of molecular divergence (rate of nonsynonymous mutations per nonsynonymous site, $d_N$, and rate of synonymous mutations per synonymous site, $d_S$) using the relationship given below. Since our interest ultimately was to determine differences in rates of protein evolution between the different species pairs, we focused more on $d_N$. The ratio of nonsynonymous divergence ($d_{N1}/d_{N2}$) between two species pairs, a recently diverged species pair 1 and an older species pair 2, must be proportional to the ratio of their divergence times ($T_1/T_2$), such that:

$$\frac{d_{N1}}{d_{N2}} = \frac{T_1}{T_2},$$

where, for a given comparison, $T_1$ = divergence time of the more recently diverged pair and $T_2$ = divergence time of the species pair with the longer post-speciation divergence time. In our data, for example, $T_1$ = 4.3–6.5 Mya (mel-sim) and $T_2$ = 8.1–12.7 Mya (yak-ere), or $T_1$ = 0.4–0.6 Mya (sim-sec) and $T_2$ = 8.1–12.7 Mya (yak-ere) (see Figure 1). We similarly calculated the ratio of synonymous divergence to species divergence time in all species pairs. Because of existing uncertainties in the divergence time estimates, particularly for the newer species [36, 37], we worked with the estimated range of divergence times for each of the species pairs by incorporating the published upper and lower limits of divergence time estimates from Tamura et al. [34]. For instance, in comparisons between D. melanogaster-D. simulans versus D. yakuba-D. erecta pairs, the upper divergence-time limit represents a scenario where D. melanogaster and D. simulans diverged 4.3 Mya and D. yakuba and D. erecta diverged 12.7 Mya. Similarly, the lower divergence-time limit represents a scenario where D. melanogaster and D. simulans diverged 6.5 Mya and D. yakuba and D. erecta diverged 8.1 Mya. When $\log_2(d_{N1})$ is plotted against $\log_2(d_{N2})$, genes evolving in a clock-like manner (i.e., $d_{N1}/d_{N2} = T_1/T_2$) would fall within these divergence-time ranges. Genes that fall above the divergence-time ranges indicate accelerated evolution in the newer species, and those falling below the lower limit are considered to evolve slowly in the newly formed species or to have much higher rates of evolution in the older species lineage.
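The classification described above can be illustrated with a minimal sketch. It assumes that a gene is assigned to a rate category simply by comparing its ratio of nonsynonymous divergence with the window of divergence-time ratios; the function names and the example divergence values are hypothetical, and the 95% confidence intervals applied later in this section are not modeled.

```python
# Minimal sketch: assign a gene to a rate category by comparing dN1/dN2
# with the window of T1/T2 implied by the divergence-time ranges.
# Function names and the example dN values are illustrative assumptions.

def ratio_bounds(t1_range, t2_range):
    """Return the (smallest, largest) T1/T2 given (min, max) times in Mya."""
    t1_min, t1_max = t1_range
    t2_min, t2_max = t2_range
    return t1_min / t2_max, t1_max / t2_min


def classify_gene(dn1, dn2, t1_range, t2_range):
    """Classify one gene from dN in the younger (dn1) and older (dn2) pair."""
    if dn2 <= 0:
        raise ValueError("dn2 must be positive to form the ratio dN1/dN2")
    low, high = ratio_bounds(t1_range, t2_range)
    ratio = dn1 / dn2
    if ratio > high:
        return "accelerated"   # more divergence than the time ratio allows
    if ratio < low:
        return "slow"          # less divergence than the time ratio allows
    return "clock-like"


# Divergence-time ranges quoted in the text (Mya).
MEL_SIM = (4.3, 6.5)
YAK_ERE = (8.1, 12.7)

low, high = ratio_bounds(MEL_SIM, YAK_ERE)
print(f"clock-like window for dN1/dN2: {low:.3f} to {high:.3f}")
print(classify_gene(0.05, 0.04, MEL_SIM, YAK_ERE))  # hypothetical gene
```

Working on the ratio directly is equivalent to reading positions off a log-log plot, since taking logarithms preserves the ordering of the ratio relative to the two bounds.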

Comparisons using the D. simulans-D. sechellia species pair grossly overestimated the number of genes in the accelerated rate category (94.89% of the genes fall under the accelerated category; Figure 3(a)). This is an unlikely scenario given that the range of estimates falls well above the divergence times plotted in this graph. We believe that this is most likely due to a gross underestimation of the divergence time between D. simulans and D. sechellia. The phylogenetic relationships in the simulans triad (D. simulans, D. mauritiana, and D. sechellia) have been contentious and unresolved [35], and the sole source of recent divergence time estimates using molecular data is the study by Kliman et al. [33], which used a small number of genes. We took a very generalized approach to reevaluate the divergence time for the D. simulans-D. sechellia split using $d_S$ estimated from our dataset (Figure 2) in order to replot Figure 3(a). Relative to nonsynonymous divergence, we expect the ratio of synonymous divergence to species divergence times for most loci to fall within the clock-like category for most species comparisons (as it did in the previous comparison). We therefore applied the boundaries conforming to the clock-like category from D. melanogaster-D. simulans versus D. yakuba-D. erecta comparisons (Figure 2) to the D. simulans-D. sechellia species pair. We were able to arrive at a generalized estimate of 1.16–3.05 Mya for the D. simulans-D. sechellia split, which appears more likely (Kumar 2007, pers comm.). We therefore employed a slightly more stringent approach by applying 95% confidence intervals to the existing divergence boundaries (using ratios), which consequently incorporated previous outliers (Table 1).
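The logic of re-deriving a divergence time from synonymous divergence can be sketched as follows. This is a simplification that assumes a strict synonymous clock, so that the ratio of synonymous divergences equals the ratio of divergence times; it is not the exact boundary-fitting procedure used with Figure 2, and the function name and the divergence values are hypothetical.

```python
# Simplified sketch, assuming a strict synonymous clock: dS1/dS2 = T1/T2,
# hence T1 = T2 * dS1 / dS2. The dS values below are hypothetical.

def implied_divergence_time(ds1, ds2, t2):
    """Divergence time (Mya) of the younger pair implied by the older pair."""
    if ds2 <= 0:
        raise ValueError("ds2 must be positive")
    return t2 * ds1 / ds2


# Older pair (D. yakuba-D. erecta) range from the text, in Mya.
T2_MIN, T2_MAX = 8.1, 12.7

ds_young, ds_old = 0.03, 0.25  # hypothetical genome-wide dS estimates
t1_low = implied_divergence_time(ds_young, ds_old, T2_MIN)
t1_high = implied_divergence_time(ds_young, ds_old, T2_MAX)
print(f"implied T1 range: {t1_low:.2f} to {t1_high:.2f} Mya")
```

Carrying both ends of the older pair's range through the calculation gives a range rather than a point estimate, mirroring how the text reports the revised split as an interval.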

2.4. Sequences and Rate Analyses

All sequences (D. melanogaster, D. simulans, D. sechellia, D. yakuba, D. erecta, D. pseudoobscura, and D. persimilis) were obtained from the recently sequenced genomes available on FlyBase (Drosophila 12 Genomes consortium 2007, www.flybase.org). Sequences were aligned according to the corresponding protein alignment using CLUSTALW ver. 1.8 [38]. In order to remove any potential bias due to incomplete lineage sorting effects [39, 40], for each gene we compared the likelihood of trees differing in the placement of D. yakuba and D. erecta using PAML and restricted our analysis to genes for which the best tree involved D. yakuba and D. erecta as sister species. Pairwise estimates of $d_N$ and $d_S$ were determined using the program codeml in PAML [41]. Estimates of $d_N$, $d_S$, and ω along each lineage using branch-site models and outputs of models M7 (neutral) versus M8 (positive selection) were also determined using PAML [42] and retrieved from a recent genome-wide analysis [43]. We were able to compute estimates of the divergence-to-time ratio (see above) for 4843 orthologs in the D. melanogaster-D. simulans versus D. yakuba-D. erecta comparison and 4327 orthologs in the D. simulans-D. sechellia versus D. yakuba-D. erecta comparison, as well as for 3988 genes from D. pseudoobscura-D. persimilis versus all melanogaster species pairs.
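The M7 versus M8 comparison mentioned above is conventionally evaluated with a likelihood ratio test. The sketch below assumes the common convention of comparing twice the log-likelihood difference with a chi-square distribution with 2 degrees of freedom; the text does not spell out this step, and the function name and log-likelihood values are hypothetical.

```python
import math

# Sketch of a likelihood ratio test for nested codon models (M7 vs M8).
# Assumption: the statistic 2 * (lnL_M8 - lnL_M7) is compared with a
# chi-square distribution with 2 degrees of freedom. For df = 2 the
# upper-tail probability has the closed form exp(-x / 2).

def lrt_m7_vs_m8(lnl_m7, lnl_m8):
    """Return (test statistic, p-value) for the M7 vs M8 comparison."""
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m7))  # guard against tiny negatives
    p_value = math.exp(-stat / 2.0)           # chi-square tail, df = 2
    return stat, p_value


# Hypothetical log-likelihoods for one gene, as codeml might report them.
stat, p_value = lrt_m7_vs_m8(lnl_m7=-2500.0, lnl_m8=-2495.0)
print(f"2*delta lnL = {stat:.2f}, p = {p_value:.4f}")
```

A gene would then be flagged as a candidate for positive selection only if the test is significant after correcting for the number of genes tested.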

2.5. Classification of Genes according to Site of Expression

We classified genes according to their tissues of expression (testis, ovary, and head) by using the NCBI EST database (NCBI, www.ncbi.nlm.nih.gov/UniGene/). Genes that could not be classified into any tissue category were referred to as unspecified. Based on EST data, we were able to classify tissue of expression for 3040 out of 4843 genes for comparisons involving D. melanogaster-D. simulans versus D. yakuba-D. erecta, and for 2686 genes for comparisons involving D. simulans-D. sechellia versus D. yakuba-D. erecta (Table S1).

3. Results

3.1. Accelerated Rates of Molecular Evolution in Newly Formed Species

Molecular divergence estimates, with respect to species divergence times, showed that protein coding genes fell into three distinct rate categories: (1) accelerated evolution in younger species: genes showing higher rates of molecular divergence in newly formed species relative to species diverged for longer lengths of time; (2) clock-like evolution: genes showing molecular divergence that corresponds to species divergence times; and (3) slow evolution in younger species: genes showing lower rates of molecular divergence in newly formed species in comparison to species with longer divergence times (Figure 2). Table 1 summarizes the fraction of genes that fall under each rate category for every species pair compared.

Most genes in our dataset (61–74%) fell into the clock-like rate category in all comparisons, indicating that evolutionary trajectories of most genes are unaffected by speciation events and their rates remained constant (Table 1). A small but discernible fraction of genes (17–19%) showed signatures of speciation-related accelerated evolution. Nonsynonymous divergences in these genes were 2-3-fold higher in newly formed species compared to older species (Table 1, Figures 2 and 3). A plot of $d_N$ and $d_S$ estimates between D. melanogaster-D. simulans versus D. yakuba-D. erecta shows that the distribution of $d_N$ estimates is quite distinct between the accelerated, clock-like, and slow-rate categories while the distribution of $d_S$ estimates is not (Figures 2(b) and 2(c)). This quite clearly indicates that elevated proportions of $d_N$ are not always accompanied by correspondingly elevated proportions of $d_S$ in genes with evidence of accelerated evolution, which is generally a sign of selection-driven changes. We also obtained similar results using protein divergence estimates instead of the ratios of nucleotide divergence data (see files in Supplementary Material available online at doi: 10.4061/2011/595121). About 8–19% of protein coding genes that fell into the slow rate category showed extremely low levels of nonsynonymous divergence in newly formed species compared to older species (e.g., Avg , Avg , Figures 2, 4, and 5). This may be indicative of genes that have remained conserved in the evolution of the recently diverged species but that have diverged substantially with time in the older lineages.

These results quite unambiguously indicate two important trends: firstly, despite variances associated with divergence estimates, the 2-3-fold higher nonsynonymous divergence in newly formed species compared to species with much longer divergence times clearly represents increases in rates of protein evolution either during or immediately after speciation. Secondly, accelerated rates of molecular evolution are most apparent in nonsynonymous divergence and not in synonymous divergence (see Table 1, Figures 2 and 4), a broadly accepted sign of selection driven changes.

3.2. Higher Representation of Sex-Related Genes in the Accelerated and Clock-Like Rate Categories

Identifying the range and, particularly, the kinds of loci in each rate category (accelerated, clock-like and slow) provides a starting point to broadly address the effects of demographic factors and selection during speciation. Demographic factors (drift, bottlenecks etc.) would affect a wide variety of loci whereas selection driven divergence would only affect specific functional classes of genes, such as sex-related genes which are expected to drive reproductive isolation in diverging populations [45, 46]. Due to lack of functional information for a large number of Drosophila genes, we used tissue of expression as a general and presumable indication of function (testis and ovary as reproductive tissues versus head, a presumably nonreproductive tissue). We tested the null hypothesis that genes expressed in each tissue-type are equally distributed within the accelerated, clock-like and slow rate categories. Because the expression data was determined using D. melanogaster data, only species from the D. melanogaster subgroup were analyzed. Given the long divergence time between the Obscura group and the melanogaster group (55 Mya), it is likely that patterns of gene expression may be radically different in D. pseudoobscura and D. persimilis.

In the D. melanogaster-D. simulans versus D. yakuba-D. erecta comparison, a significantly higher proportion of testis-specific genes occupy the accelerated and clock-like rate categories compared to the slow-rate category (χ²  testis specific = 8.08 and 6.78, df = 1, and and 0.0277, respectively, and a Bonferroni correction was applied, Table 2, [44]). Genes expressed in all three tissues—testis, ovary and head mostly fall in the accelerated category compared to both slow and clock-like rate categories (χ²  testis, ovary-and-head, = 5.96 df = 1, , χ² = 19.99, df = 1, , Bonferroni corrections were applied [44], Table 2). No significant differences were observed in the proportions of all other genes classes between rate categories.
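The relationship between a chi-square statistic with one degree of freedom and a Bonferroni-corrected P value can be checked with a short sketch. The number of comparisons (three, one per pair of rate categories) is an assumption, as it is not stated explicitly in the text; under that assumption, the statistic of 6.78 reported above gives a corrected value close to the reported 0.0277. The function names are illustrative.

```python
import math

# Sketch: upper-tail p-value of a chi-square statistic with df = 1,
# followed by a Bonferroni correction. For df = 1 the tail probability
# equals erfc(sqrt(x / 2)), so only the standard library is needed.
# Assumption: three pairwise comparisons between rate categories.

def chi2_pvalue_df1(stat):
    """Upper-tail probability of a chi-square variate with 1 df."""
    if stat < 0:
        raise ValueError("chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


p_raw = chi2_pvalue_df1(6.78)          # statistic reported in the text
p_adj = bonferroni(p_raw, n_tests=3)   # assumed number of comparisons
print(f"raw p = {p_raw:.4f}, Bonferroni-adjusted p = {p_adj:.4f}")
```

The same two functions can be applied to any of the other df = 1 statistics in this section, provided the number of comparisons in each family of tests is known.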

In the D. simulans-D. sechellia/D. yakuba-D. erecta comparison, a significantly small proportion of testis-specific genes fall into the accelerated category compared to clock-like and slow rate categories (χ² = 10.04 and 13.91, df = 1, and , a Bonferroni correction was applied, Table 2).

3.3. “Persistence” of Rapid Evolution in Testis-Specific Genes over Time

That most testis-specific genes fall under the clock-like rate category and only a small proportion fall in the accelerated rate category would appear to contradict earlier evidence that testis-specific genes are in general evolving rapidly [20, 47, 48]. We hypothesized that a “persistence” of rapid rates of evolution in testis-specific genes (rapid rates of evolution over time) would explain why testis-specific genes largely occupy the clock-like rate category. Pagel et al. [13] also found such persisting elevated rates of evolution in many lineages. Persistence of rapid rates of evolution can be verified if testis-specific genes in both younger and older lineages have high substitution rates in both the accelerated and clock-like rate categories compared to other genes. To test this, we first looked for a global difference in $d_N$ and $d_S$ between tissue categories as well as between rate categories (Figure 5). We also performed more detailed analyses of differences in $d_N$ and $d_S$ between each tissue category using a Tukey HSD test. Invariably, testis-specific genes show significantly higher $d_N$ and $d_S$ compared to genes in all other tissues in both younger and older species pairs ( for both, Figures 4(a), 4(b), and 4(c)) (additional file 3). This implies that testis genes in the clock-like category are evolving faster than other genes but at a constant rate, whereas those in the accelerated rate category have experienced elevated rates of evolution in response to stronger selection during or after speciation. This is an important trend as it reveals a tempo of molecular evolution in the different classes of genes.

3.4. Genes Evolving under Positive Selection in the Accelerated, Clock-Like, and Slow Rate Categories

We compared the proportion of genes showing evidence of positive selection (ω > 1.0) among the different rate categories by implementing site, branch-site, and branch models in PAML [42]. Applying models M7 versus M8, none of the rate categories showed overrepresentation of genes evolving positively (slow versus accelerated, χ² = 0.88, df = 1, , accelerated, slow versus clock-like χ² = 0.93 and 2.46, df = 1, and 0.35 after Bonferroni corrections, Table 3). Using the branch-site model, however, we observed a significant over-representation of genes showing positive selection in D. simulans in the accelerated rate category compared to the clock-like and slow rate categories (χ² = 20.82 and 28.6, df = 1, and , respectively after Bonferroni correction). Branch model tests also show a greater proportion of genes with foreground ω > background ω in the accelerated rate category in the D. simulans branch and the branch leading to the D. melanogaster clade (Table 3). A list of genes detected to be evolving under positive selection in each lineage can be found in additional file 4. Several genes involved in sensory stimuli, immune response, gametogenesis (spermatogenesis), transcription regulation, and hybrid incompatibilities (including Hmr [21]) are among the genes that show relatively large ω estimates. A population genomic study of six D. simulans strains compared to D. melanogaster [49] found significant evidence of directional selection in genes affecting reproduction or spermatogenesis. Among the 1270 genes that show evidence of adaptive evolution from Singh [46], 505 are found in our comparison of D. melanogaster-D. simulans/D. yakuba-D. erecta. Among the 505 genes, 360 fall in the clock-like category, 108 in the accelerated category, and 37 in the slow rate category. 
We detect a significant enrichment of genes under positive selection among the clock-like category χ² = 7.25, df = 1, ) while we found a significant paucity among the slow rate category (χ² = 40.05, df = 1, ). No significant effect is observed for the genes classified as rapidly evolving in the younger species (χ² = 2.72, df = 1, ).

3.5. Effect of Local Recombination Rates

Gene evolution may be influenced by their chromosomal location [50, 51] as well as by local recombination rates [43]. Using recombination rates in D. melanogaster computed by Hey and Kliman [52], we found significantly higher average recombination rates for genes in the accelerated rate category compared to the clock-like and slow categories in the D. melanogaster-D. simulans/D. yakuba-D. erecta comparison (Kruskal Wallis rank sum test, and , respectively, a Bonferroni correction was applied). There was no significant difference between the clock-like and slow rate categories (Kruskal Wallis rank sum test, , a Bonferroni correction was applied). The D. simulans-D. sechellia/D. yakuba-D. erecta comparison showed no such effect of recombination in all rate categories (Kruskal Wallis rank sum test, , 0.30 and for clock-like versus accelerated, accelerated versus slow, clock-like versus slow, respectively after Bonferroni correction).

4. Discussion

4.1. What Kinds of Genes Change during Speciation?

Despite recent evidence linking accelerated rates of molecular evolution to speciation events [12, 13], the kinds of protein coding genes that might experience accelerated rates of evolution during or immediately after speciation require investigation [15]. Are rates of evolution in all genes likely to be altered during or even after speciation? If not, what genes do show speciation associated changes? Answers to these questions will be crucial to understanding the mechanism(s) driving speciation and the molecular evolutionary consequences of speciation.

This study analyzes the ratio of molecular evolutionary rates relative to the ratio of species divergence times; clock-like evolution should therefore not be taken as a sign of gradual evolution, as it only implies a constant rate of evolution. This is exemplified by the large representation of testis-specific genes in the clock-like rate category, which are in fact evolving much faster than other genes in the same category. Furthermore, Drosophila is one of those groups that lack detailed fossil records, and incomplete taxon sampling is an obvious but unavoidable problem in this or any study of this nature. Apart from extinct taxa, undiscovered and undescribed taxa are an additional problem. We are also lacking the genome sequence of D. mauritiana, a member of the D. simulans-D. sechellia-D. mauritiana triad, as well as the genome of D. santomea, which recently split from D. yakuba [53]. These factors are bound to affect true assessments of speciation-related changes in rates of molecular evolution. However, within the currently accepted phylogenetic network of the melanogaster subgroup, our study provides an approach to identify genes that change during speciation, and our results report the fraction and identity of genes in the Drosophila transcriptome that show signatures of accelerated speciation-related changes that can be further investigated.

4.2. Evidence for Accelerated Evolution

In our study, evidence of speciation-related accelerated evolution is detected in a small but discernible fraction of protein coding genes. This corresponds to what Pagel et al. [13] found across taxa. But what is striking is that the acceleration is most apparent in nonsynonymous divergence. That $d_S$ in most genes falls under the clock-like category, as opposed to $d_N$, is not entirely surprising, as we expect $d_N$ changes to be more sensitive to selection and therefore to show higher variance. Nevertheless, the 2-3-fold higher estimates of $d_N$ in the recently diverged species compared to species with much longer divergence times is a strict deviation from clock-like evolution despite the variance that may be involved. More importantly, our survey reveals that the more recently diverged species (D. simulans-D. sechellia) have a slightly greater proportion of genes that show accelerated rates of evolution (Table 1), further strengthening the case for a causal link between acceleration in rates of molecular evolution and speciation.

4.3. Evidence for Persistent Rapid Evolution in Sex Genes

Sexually reproducing organisms are influenced by both natural and sexual selection. Widespread rapid evolution of sex-related genes in Drosophila genomes [18] implies that regardless of how speciation occurred, sex- and reproduction-related genes that are causally involved in establishing reproductive isolation would be constantly under strong selection [46]. Therefore we would not expect to find an over-representation of sex-related genes in the accelerated rate category alone. This is specifically illustrated by the persistence of rapid rates of evolution in testis-specific genes in our study. In addition, testis-specific genes and genes expressed in all three tissues are over-represented in the accelerated and clock-like rate categories in the D. melanogaster-D. simulans pair (Table 2). But in the more recently diverged D. simulans-D. sechellia pair, only testis-specific genes are overrepresented in the clock-like and slow rate categories (Table 2). These results support a scenario where sex-related genes are under constant but higher selective pressure.

4.4. Factors Driving Accelerated Evolution during Speciation

We find no evidence of widespread positive selection in the accelerated rate category. We cannot rule out that the higher proportion of nonsynonymous divergence we observe in the accelerated rate category is due to relaxed constraint or accumulation of slightly deleterious mutations [16, 54, 55]. The only tenable links to acceleration that we find appear to be influenced by local recombination rates (higher in the D. melanogaster-D. simulans comparison but lower in the D. simulans-D. sechellia comparison) and a relatively small proportion of genes with lineage-specific positive selection (Table 3). Variation in recombination rates between species can be an important force driving divergence, as new allele combinations can be produced at different rates within species [56]. Rate of recombination also appears to be positively correlated with levels of polymorphism, and high polymorphism would be expected to correlate with levels of divergence [57]. Nonetheless, sex-related genes involved in gametogenesis and hybrid incompatibilities, as well as genes involved in metabolism and sensory functions, do show evidence of positive selection. This implies a locus-specific role of sex genes and sexual selection in speciation along with some ecological specialization. Our data therefore indicate that acceleration in rates of evolution is not purely a consequence of strong selection but most likely a combination of factors, such as demographic processes as well as some form of selection (a few genes in the accelerated category are evolving positively, Table 3). These data are interesting particularly in D. sechellia, which is a relatively young species and one that is likely to have undergone founder speciation (see [33]).

According to the founder effect model of speciation [3], speciation occurs as a consequence of major demographic processes in which bottlenecks play an important role. In such a case, a large number of loci involved in a wide range of functions would be affected (but see [58] for a more recent critical analysis of bottlenecks and speciation). Our results, as well as those of Pagel et al. [13], do not support the notion of a genetic revolution as a consequence of speciation; the range of loci affected during speciation is rather limited. However, loci affecting a wide range of functions show accelerated rates of evolution (Figures 1, 2, and 3, Tables 1 and 2). This supports a scenario where, as founding populations adapt to new ecological niches, the evolution of modified or new behaviors affecting sex and reproduction as well as other ecological adaptations can occur rapidly [59–62]. D. sechellia has evolved an intricate ecological relationship with its plant host Morinda citrifolia, on which D. sechellia females oviposit [63]. Compounds in the pulp of the immature M. citrifolia fruit are toxic to other species of the melanogaster subgroup but not to D. sechellia larvae, which feed and grow on it [63]. A large proportion of head-expressed genes show accelerated evolution in the D. simulans-D. sechellia pair, which might reflect rapid behavioral and sensory modifications that occurred during the host-plant specialization between D. sechellia and M. citrifolia. Genes involved in sensory perception (Gr2a, Gr21a, Gr43a, Obp50a, Or67d, king-tubby, CG32683), sensory organ development (amos, Brd, mib1, Oseg1, Poxn), detoxification (kraken), metabolism, and oogenesis (cup, retn, kel, spir, Tm1) found to be involved in host specialization by Dworkin and Jones [64] are among those that show evidence of accelerated speciation-related rates of evolution in our study.

The D. sechellia data demonstrate the importance of founder effect and subsequent ecological divergence in driving speciation and, as a consequence, causing elevated rates of evolution of relevant protein coding genes associated with the speciation event. Such cases of speciation driven by founder effect and subsequent resource specialization are not uncommon in Drosophila [45, 59, 65]. D. sechellia, D. mauritiana, and D. santomea (a recently described sibling species to D. yakuba) are all insular species within the melanogaster subgroup. Among these, D. sechellia and D. santomea show resource specialization, having evolved special ecological relationships of utilizing specific host plants either for food or oviposition [53, 63–65], while their parent lineages do not. Such species-specific ecological adaptations are likely to influence accelerated evolution in the newly formed species but not necessarily in the parent lineages. Our study supports this reasoning: we find ~400 genes in the accelerated rate category in the D. simulans versus D. sechellia comparison, all of which fall in the clock-like rate category in the D. melanogaster versus D. simulans comparison (see additional file 5). Many of these genes, as mentioned above, have been found to be involved in host-plant specialization [64]. Further detailed study of such genes between D. sechellia and D. simulans, as well as the inclusion of the D. santomea genome (when completed) in a similar study, should shed some light on our claim. However, it should be noted that while D. sechellia is isolated from D. simulans, D. santomea and its sibling species D. yakuba are sympatric on the island of São Tomé, and there is substantial divergence of sexual traits between the two species [53, 66–69]. We might therefore expect to see higher proportions of sex-related genes as well as genes involved in ecological specializations showing signatures of accelerated evolution between D. santomea and D. yakuba, much like what we observed between the sympatric D. melanogaster and D. simulans in our study.

5. Conclusion

Signatures of speciation-related accelerated rates of evolution are detected in all newly evolved species analyzed in this study. These signatures are not widespread across the genome but are restricted to a fraction of loci that affect widely different functions. The range and kinds of loci in this rapidly evolving fraction complement the recently reported punctuational effect observed across a variety of taxa [13, 15] and sharpen the focus for studying the genetics of speciation. Population genetic studies of these ‘candidate’ genes can establish the nature of the selective forces driving their elevated rates of evolution. In our case, we find an effect driven by both demography and selection. Testis-specific genes and sex-related genes show persistently high rates of evolution, indicating that sexual selection is a constant pressure in both diverging and established populations. In the D. simulans-D. sechellia pair, our data reinforce previous evidence that founder effect and ecological specialization played important roles in their speciation process. Our results support the growing appreciation that speciation is often driven by a combination of demographic fluctuations, ecological adaptation, and sexual selection and can seldom be attributed to a single factor [45, 70].

Acknowledgments

The authors are grateful to the AAA group for making the genome data available to them. Richard Morton, Alberto Civetta, Sudhir Kumar, Harilaos Lessios, and two anonymous reviewers provided useful and critical comments that greatly improved this manuscript. The authors would also like to thank Ben Evans, Jonathan Stone, and Osamu Miura for very useful discussions, comments, and criticisms on previous versions of this manuscript. This study was supported by a National Science and Engineering Research Grant to R.S.S. and a Smithsonian Institute Molecular Evolution Postdoctoral Fellowship to S. Jagadeeshan. S. Jagadeeshan and W. Haerty contributed equally to this work.

Supplementary Materials

Additional file 1: Comparison between D. melanogaster – D. simulans / D. yakuba – D. erecta.

Additional file 2: Distribution of the protein divergence ratio log2(P1/P2) relative to the divergence time ratio log2(T1/T2), between D. melanogaster – D. simulans vs. D. yakuba – D. erecta (Supplementary Table 1). The upper limit of the clock-like category is given by the log2 ratio of divergence times (6.5/8.1) and the lower limit by the log2 ratio 4.3/12.7. Color codes: red circles, accelerated evolution; grey, clock-like; green, slow evolution.

Additional file 3: Dendrogram of average ω estimates of genes falling under slow, clock-like, and accelerated rate categories. Branch lengths represent the proportion of genes with ω values greater than background ω values. P-values for the different tests were corrected using a Bonferroni correction prior to the comparisons. mel: D. melanogaster, sim: D. simulans, sec: D. sechellia, simsecmel: branch leading to the D. melanogaster clade, yak: D. yakuba, ere: D. erecta, yakere: branch leading to the D. yakuba clade, ana: D. ananassae.

Additional file 4: Genes with evidence of positive selection using site models (M8 vs M7) from PAML.

Additional file 5: Comparison of gene classification between the D. simulans – D. sechellia / D. yakuba – D. erecta and D. simulans – D. melanogaster / D. yakuba – D. erecta analyses.
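
The category limits quoted in the caption of Additional file 2 can be illustrated with a minimal sketch. It assumes that a gene is called accelerated when its log2 protein divergence ratio exceeds the upper limit of the clock-like category and slow when it falls below the lower limit; the function name `classify_gene` and the constant names are hypothetical and not taken from the study.

```python
import math

# Limits of the clock-like category, from the divergence-time ratios
# quoted in the Additional file 2 caption.
UPPER_LIMIT = math.log2(6.5 / 8.1)    # about -0.32
LOWER_LIMIT = math.log2(4.3 / 12.7)   # about -1.56


def classify_gene(p1, p2):
    """Classify one gene from its protein divergence in two species pairs.

    p1: protein divergence in the younger pair (D. melanogaster - D. simulans)
    p2: protein divergence in the older pair (D. yakuba - D. erecta)
    The direction of the categories is an assumption (see text above).
    """
    if p1 <= 0 or p2 <= 0:
        raise ValueError("divergence values must be positive to take a log ratio")
    log_ratio = math.log2(p1 / p2)
    if log_ratio > UPPER_LIMIT:
        return "accelerated"
    if log_ratio < LOWER_LIMIT:
        return "slow"
    return "clock-like"


if __name__ == "__main__":
    # Illustrative divergence values only, not data from the study.
    for p1, p2 in [(0.05, 0.05), (0.025, 0.05), (0.005, 0.05)]:
        print(p1, p2, classify_gene(p1, p2))
```

Genes with zero divergence in either pair have no defined log ratio, so the sketch rejects them rather than guessing a category.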
