Abstract

MicroRNAs (miRNAs) are short noncoding RNAs that regulate gene expression through translational inhibition or mRNA degradation by binding to sequences on the target mRNA. miRNA regulation appears to be the most abundant mode of posttranscriptional regulation, affecting 50% of the transcriptome. miRNA genes are often clustered and/or located in introns, and each targets a variable and often large number of mRNAs. Here we discuss the genomic architecture of animal miRNA genes and their evolving interaction with their target mRNAs.

1. Introduction

MicroRNAs (miRNAs) are short noncoding RNAs that regulate gene expression by binding to sequences on the target mRNA (reviewed by [1–7]). Gene silencing initiates when the miRNA, located within an RNA-Induced Silencing Complex (RISC), directs binding to complementary sequences on the mRNA’s untranslated region (UTR). The miRNA-mRNA recognition binding sequences are short, usually 6–8 nt [8–11]. Inhibition of gene expression takes place via facilitated mRNA degradation, mRNA cleavage, or interference with translation.

2. Generation of miRNA Genes

2.1. miRNA Gene Origins

During animal evolution there were distinct, characterized phases of large-scale genome duplications [12–14]. The origin of miRNAs is likewise traced back to genomic episodes dominated by large duplication events, which coincide with the advent of bilaterians, vertebrates, and (placental) mammals [15]. The current wealth of miRNA genes results, additionally, from specific duplication events of miRNA clusters [16, 17] and from mechanisms such as the integration of repetitive genetic elements [18].

2.2. The Gatekeeper of the miRNA Biogenesis

The transcription of miRNA genes is controlled by enhancer-promoter elements comparable with those of protein-coding genes [19]. Additional regulation of miRNA expression is obtained through posttranscriptional processing [20], RNA A-to-I editing [21, 22], selective export into the cytoplasm [23, 24], and subcellular localization [25] (Figure 1; see [2, 26–43], and also the review by [44]).

While several mechanisms control miRNA expression along the biogenesis pathway, the rate-limiting step in acquiring a novel miRNA appears to be the recognition of the RNA secondary structure by Drosha. This inference stems from the fact that mammals express only several hundred miRNAs out of a myriad of expressed RNA secondary structures [16, 45–47]. Thus, processing of the miRNA precursor by the microprocessor is probably the gatekeeper of the miRNA biogenesis pathway, allowing only a portion of the transcribed RNA hairpins to be processed further down the pathway. Analysis of the recent evolution of miR-220 provides an intriguing example of a gene that apparently did not encode a miRNA but became competent for Drosha-dependent microprocessing. miR-220, which contains sequences of a tubulin gene, was probably originally processed from an antisense strand (see [48, 49]) of tubulin, which folds back into a proper stem-loop structure in human but not in other vertebrates [15]. Comparative studies of the tubulin antisense strand sequence may shed light on why human Drosha processes this transcript while in other species it is skipped. Though canonical miRNA processing is Drosha-dependent, a novel splicing-dependent mechanism was recently suggested to bypass the initial steps of microprocessing [33, 50, 51]. Despite the functional robustness of miRNA secondary structures in the face of accumulating mutations, it is still not clear what the precise requirements for passing the gatekeeper of miRNA biogenesis are.

3. The Genomic Architecture of miRNA Genes and Their Expression

Two characteristics of miRNA genes stand out relative to the genomic organization of protein-coding genes. First, miRNA genes are often found in clusters (30%–42%) [52–54]. Second, miRNA genes are often embedded within introns (25% or more) [27, 55–59].

3.1. Chromosomal Organization of miRNA Genes

In accordance with genomic duplication events that accompany evolution of species, we see a correlation between the number of miRNA genes and chromosome length. miRNA gene number per chromosome also correlates with the protein-coding gene density (Figure 2(a) and 2(b)). This indicates that integration and/or maintenance of miRNA genes roughly follows protein-coding genes.

However, Homo sapiens chromosomes 14, 19, and X are exceptionally enriched for miRNA genes. Chromosomes 14 and 19 each possess a single miRNA cluster, accounting for 93% and 80% of the total number of miRNA genes on each chromosome, respectively [16, 61]. The cluster on chromosome 14 is located in the human imprinted domain (14q32), where only maternally inherited miRNAs are expressed [62]. Chromosome 19 hosts the primate-specific “500” cluster [16], a recently emerged, placental-specific cluster [16]. The X chromosome, on the other hand, does not have one large cluster, but it exhibits rapid emergence of smaller miRNA clusters due to frequent tandem duplications and nucleotide substitutions [17]. We note that despite the parallel evolution of miRNAs in animals and plants, miRNA clusters are observed in both kingdoms [63].

3.2. Clusters of miRNA Genes

Plausibly, employing an already existing functional promoter is an efficient way to express new miRNAs, eliminating the need for de novo establishment of promoter-enhancer sequences upstream of the miRNA gene (such as in [64–66]). This may be the rationale underlying the aggregation of miRNAs into polycistronic clusters and their genomic preference for introns of transcribed genes (see Figure 2(c)). The consequence at the genomic level is that many miRNAs located within a DNA fragment of up to 50 kb tend to be coexpressed [54, 55]. Amplification of an ancestral miRNA inside a cluster [54, 56] could contribute to the effective dosage of a given expressed miRNA homolog. However, at lower copy numbers, gene dose does not seem to be a powerful predictor of expression levels (Figure 2(d)). The most likely interpretation is that the magnitude of promoter activity dominates the regulation of miRNA expression. The correlation between miRNA gene copy number and expression level that was noted in some cases [67] may nonetheless suggest that when miRNA copy number is high ( 3; Figure 2(d)), it may also affect the expression level.

3.3. Intronic miRNA Genes

At least 25% of miRNA genes are hosted in introns of both protein-coding and noncoding RNAs (Figure 2(c); also see [27, 56, 59]). This is a striking feature of noncoding RNAs (reviewed by [68]), plausibly implying that some noncoding RNAs have developed a functional relationship with their host genes [38, 69]. Use of the same promoter-enhancer system couples miRNA expression to that of its host gene, so it is not surprising that such coexpression is frequently seen [27, 55, 70]. When both are derived from the same primary transcript, pri-miRNA maturation by the microprocessor and pre-mRNA splicing by the spliceosome can either proceed independently or be interconnected. While some studies imply that these processes hardly interact [59], others have shown strong interactions initiating at transcription [71–73]. Overall, given the tight proximity of these cellular events in time and space, it is hard to imagine how these functional complexes could avoid each other. Further analysis is required to determine the extent of this interaction and whether it holds for all transcripts [74].

3.4. Functional Expression of miRNAs and Their Host Genes

While an mRNA and a miRNA derived from the same transcript may simply reflect an efficient use of a promoter-enhancer cassette [59], in a subset of cases the coordinated expression of a miRNA-protein pair from the same genomic locus may reflect a genetic interaction. For example, platelets contain two cAMP phosphodiesterases (PDEs), PDE2A and PDE3A, each regulating a specific intracellular pool of cAMP [75]. miR-139, which is hosted in an intron of PDE2A, targets PDE3A (TargetScanS, see [76]), implying that miRNA expression from the PDE2A locus regulates the balance between the two isoforms. Similarly, miR-208 is encoded by an intron of the cardiac-specific alpha myosin heavy chain (MHC) gene, which encodes a major cardiac contractile protein. Alpha MHC responds to stress and hypothyroidism [35, 77] partially by coexpressing miR-208. The miRNA targets and downregulates beta MHC expression [70]. Thus, the precision in regulating a miRNA and a gene product may be hardwired into the genomic organization, to ensure a proper balance between their opposing or collaborating functions.

4. Generation of miRNA Targets and Their Interaction with miRNAs

4.1. Reciprocal Evolutionary Interaction between miRNAs and Their Targets

Our current understanding of miRNA binding sites suggests that a 6-nucleotide “seed” region, matching between the 5′ end of the miRNA and the mRNA UTR, may suffice for regulation by miRNAs [9, 10, 76, 78]. Because changes in cis sequences often dominate the rewiring of genetic networks [79], it is likely that the UTRs of mRNA targets change their repertoire of seed matches faster than the highly conserved trans-acting miRNAs change. This can be explained intuitively: a mutation in any given miRNA gene affects a large number of targets, and this acts as a stabilizing constraint on the miRNA itself. So, given a virtually fixed population of miRNAs, targets gain and lose binding sites in a way that supports their miRNA-controlled expression. This can be viewed as an evolutionary reciprocal interaction between the miRNA and its accumulating targets. After miRNA emergence, once a critical number of targets are functionally regulated by the miRNA, its primary sequence becomes stabilized [80], while at the same time, stabilizing selection decreases variation in the target seed matches [28, 81, 82].
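The notion of a seed match can be made concrete with a minimal sketch. It assumes the simplest definition, a perfect Watson-Crick match to nt 2–7 of the miRNA, and ignores G:U wobble pairing, site context, and conservation, all of which real target-prediction tools take into account. The function names and the example sequences are illustrative assumptions, not part of any published tool.

```python
# Minimal sketch: locate perfect hexamer seed matches (miRNA nt 2-7) in a UTR.
# Assumption: Watson-Crick pairing only; no wobble, no context scoring.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_match_site(mirna, start=2, end=7):
    """Return the mRNA sequence (5'->3') that pairs with miRNA nt start..end."""
    rna = mirna.upper().replace("T", "U")
    seed = rna[start - 1:end]  # 1-based, inclusive positions
    # The target site is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, utr):
    """Return 0-based start positions of every seed match in the UTR."""
    site = seed_match_site(mirna)
    rna = utr.upper().replace("T", "U")
    positions = []
    i = rna.find(site)
    while i != -1:
        positions.append(i)
        i = rna.find(site, i + 1)  # allow overlapping matches
    return positions


# Arbitrary example sequences, chosen only for illustration.
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
example_utr = "AAAUACCUCGGGUACCUCA"
print(seed_match_site(example_mirna))                  # UACCUC
print(find_seed_matches(example_mirna, example_utr))   # [3, 12]
```

Because the site is only six nucleotides long, a single point mutation in a UTR is enough to create or destroy a match, which is consistent with targets gaining and losing sites faster than the miRNA itself changes.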

The target set size is also dramatically affected by the nucleotide composition of the new miRNA, and, as mentioned above, this characteristic affects the average selective pressure on the miRNA itself [78]. Given a set of 17 000 UTRs (“Known Genes” in the UCSC genome browser database), some 2000 UTRs would be expected by chance to contain a single binding site for a heptamer seed composed only of A/U residues. This number falls to only 200 for a G/C-only seed and lies in between (about 800) for a mixed nucleotide composition (equal numbers of A/U and G/C). Once a novel miRNA has emerged, the set of targets it affects is subject to selective pressure, which molds the transcriptome such that binding sites are either acquired or lost. In fact, selective loss of seed matches to a level below the randomly predicted baseline, in genes dubbed “anti-targets” [9, 83], provides strong support for the evolutionary forces shaping miRNA binding sites (also see [84]).
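The dependence of the random baseline on seed composition can be illustrated with a rough sketch. It assumes independent nucleotides, a Poisson approximation for the number of sites per UTR, a uniform UTR length of 1 kb, and A/U-rich base frequencies; all of these parameter values are hypothetical and are not the values used to obtain the figures quoted above, so the output should be read for its trend rather than its absolute numbers.

```python
import math

# Rough sketch of the expected number of UTRs carrying at least one chance
# match to a heptamer site. All parameter values below are assumptions.

ASSUMED_BASE_FREQ = {"A": 0.3, "U": 0.3, "G": 0.2, "C": 0.2}  # hypothetical
ASSUMED_UTR_LENGTH = 1000                                     # hypothetical, nt
N_UTRS = 17000                                                # set size from the text


def expected_utrs_with_site(site, base_freq=ASSUMED_BASE_FREQ,
                            utr_length=ASSUMED_UTR_LENGTH, n_utrs=N_UTRS):
    """Expected count of UTRs with >= 1 chance occurrence of `site`."""
    p_match = 1.0
    for base in site:
        p_match *= base_freq[base]  # independent-nucleotide assumption
    n_positions = utr_length - len(site) + 1
    mean_sites = p_match * n_positions  # Poisson mean per UTR
    return n_utrs * (1.0 - math.exp(-mean_sites))


# Illustrative heptamer sites of differing composition.
for label, site in [("A/U only", "AUUAUAU"),
                    ("mixed", "AUGCAUG"),
                    ("G/C only", "GCCGCGC")]:
    print(label, round(expected_utrs_with_site(site)))
```

Under these assumptions the ordering A/U-only > mixed > G/C-only follows directly from the base frequencies, mirroring the roughly tenfold spread described in the text.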

The reciprocal interaction between miRNAs and their targets gains an additional perspective when this relationship is examined in viral miRNAs. Several viruses express miRNAs to control specific cellular genes or pathways. Most cellular mRNA targets of viral miRNAs identified to date play a role in either the regulation of apoptosis or the host antiviral immune response. miRNAs are well suited to expression from a viral genome because they are short and compact. In addition, they can be generated against new target genes more readily than proteins and do not elicit an antigenic response. Their evolutionary flexibility is based on the high mutation rates of viruses. This leads to modifications in the miRNA genes themselves, and thus even the largest virus family containing miRNAs (herpesvirus) shows little conservation among its miRNAs. It also suggests that host miRNAs are unlikely to target viral mRNAs, as these would mutate away from disruptive regulation (also see [85, 86]).

4.2. The Large Variation in miRNA Target Sites

Conserved complementarity to a minimal hexamer region (matching nt 2–7 of the miRNA) [8] indicates that once a seed match emerges, it becomes functional. If the binding is beneficial, it might serve as a favorable and directional intermediate. Within tetrapods, the average number of predicted conserved sites per miRNA is in the range of 200 (Figure 3(a); TargetScanS, plotted for human miRNAs). However, the distribution of target numbers is skewed toward higher values: miRNAs in the upper and lower 10th percentiles regulate more than 450 or fewer than 50 genes, respectively (also see [87, 88]). Comparative genomics suggests that ancient miRNAs have on average twofold more targets than newly generated ones (compare 453 to 194, resp.). Some discrepancies result from misestimated miRNA antiquity or from overlapping miRNA functional sites. Specifically, the age of some miRNA genes might have been misestimated, as cross-species ortholog searches are not yet exhaustive. miR-761, for example, identified only in mouse [57], is in fact conserved in six other mammals (including human and opossum; see [89], also see miRviewer at http://people.csail.mit.edu/akiezun/miRviewer/). Alternatively, overlapping functional sites shared by miRNAs and other regulatory factors may bias the distribution of targets. For example, pre-existing “scaffolds” of other regulatory systems could serve as anchors for miRNA binding. In the case of miR-16, a component of the AU-rich mediated deregulation of mRNA stability [90], the miRNA is a late addition to a mechanism that was probably functional in the common ancestor of yeast [91], before the innovation of miRNAs. Along the same lines, some transcriptional termination or pause sites [92, 93] overlap with miRNA seed matches (miR-525 and miR-488). In human, Alu transposable elements exhibit complementarity in some of their regions to almost 30 human miRNAs [94].
In other instances, the need to avoid specific protein binding domains in the UTRs may exclude miRNA binding sites. For example, UTRs may avoid the miR-518a seed match (miR-518a has only 26 predicted conserved targets) because it perfectly matches the proline and acidic rich (PAR) protein binding sequence [95]. Other miRNA interference events may involve binding to promoters via antisense transcription, which is estimated to be as common as 15% in the human genome [96]. Such overlapping sequences might coincide with promiscuous promoter-associated functions of small RNAs [36] or with an increase in transcription [97]. Plausibly, a selective pressure to avoid the binding of the aryl hydrocarbon receptor (AhR) [98] onto miR-521 sites (AhR and miR-521 share the same sequence) may explain how miRNAs of similar antiquity and A/U content (compare to miR-520h) vary dramatically in their predicted numbers of conserved targets (compare 8 to 400, resp.; both miRNAs are part of the same primary transcript, BF773110). It is noteworthy that the low number of miR-521 targets cannot be explained by a conflict of expression in a broad set of tissues, since miR-521 is expressed only in placenta.

4.3. Unique Features of miRNAs with the Largest Number of Targets

To further explore the characteristics of miRNAs with extreme numbers of targets, we compared the group of miRNAs with the largest number of targets to that with the smallest number of targets (Figure 3(b), shaded red and green, resp.). We found some correlation between the conservation of a miRNA and its number of predicted targets. This correlation is emphasized in the conserved target sets, where human-to-mouse conserved miRNAs have on average 197 predicted conserved targets, human-to-dog conserved miRNAs have 245, and human-to-chicken conserved miRNAs have 453. miRNAs with the largest number of targets tend to be expressed mostly from one arm of the pre-miRNA hairpin (they do not exhibit both 5′ and 3′ arm expression) and are often expressed at higher levels and in a broader set of tissues than miRNAs with the smallest number of targets (also see [99]).

miRNAs with the largest number of targets are A/U-rich. The average A/U percentage within the seed of the top 20 miRNAs with the largest number of targets is 57%, compared to 41% for those with the least number of targets. This may be required for weaker secondary structures in the target mRNA and for ongoing accessibility [11]. Consistently, a general mutational trend (in the human genome) from G-to-A and C-to-T is more abundant than the reverse direction [100]. Analysis of human Single Nucleotide Polymorphisms (SNPs) on a representative chromosome (chromosome 1; 661 SNPs) confirms that the majority of polymorphisms generating new potential miRNA binding sites are G-to-A and C-to-T substitutions (occurring 1.7-fold more than the reverse direction). Interestingly, the two most pronounced examples of target polymorphic changes are G-to-A mutations [39, 101].
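The seed A/U percentage compared above is a simple quantity to compute. The sketch below shows one way to do so for individual seeds and for a group of seeds; the function names and the example sequences are hypothetical and do not correspond to the miRNA sets analyzed in the text.

```python
# Minimal sketch: A/U content of seed sequences (fraction between 0 and 1).
# Function names and example seeds are illustrative assumptions.

def au_fraction(seq):
    """Fraction of A and U residues in an RNA (or DNA) sequence."""
    rna = seq.upper().replace("T", "U")
    if not rna:
        raise ValueError("empty sequence")
    return sum(base in "AU" for base in rna) / len(rna)


def mean_au_fraction(seeds):
    """Average A/U fraction over a collection of seed sequences."""
    seeds = list(seeds)
    return sum(au_fraction(s) for s in seeds) / len(seeds)


# Hypothetical heptamer seeds, used only to exercise the functions.
example_seeds = ["GAGGUAG", "AAUACUG", "GCAGCAU"]
for seed in example_seeds:
    print(seed, round(100 * au_fraction(seed)), "% A/U")
print("group mean:", round(100 * mean_au_fraction(example_seeds)), "% A/U")
```

For a heptamer seed, each A/U residue contributes about 14 percentage points, so a group average of 57% corresponds to roughly four A/U residues per seed and 41% to roughly three.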

In summary, miRNA gene integration and maintenance roughly follow those of protein-coding genes. After emergence, the miRNA gene sequence is refined through an evolutionary reciprocal interaction with its accumulating targets, which in turn stabilize the miRNA once a large enough number of functional targets is reached. Finally, overlapping functional sites shared by miRNAs and other regulatory factors may facilitate or inhibit miRNA target formation and thus influence miRNA target set size.

5. A Timescale for miRNA Target-Site Evolution

It would take several million years for a specific 7-mer binding site to evolve from a complete null binding sequence [102]. However, miRNA binding sites evolve from existing sequences, and, based on these partial binding sequences (“almost-binding” or “pre-seed” sites), a corrected estimate of the time for a miRNA binding site to emerge is 0.2 million years (Durrett R., personal communication). For example, a specific 5 nt pre-seed site will appear on average once every 1024 nt ($4^5$), or even 20 times more often, since the position of the 5 nt within the 7 nt is not restricted and may also include inserts. Thus, a 1 kb UTR will contain several potential pre-seed sequences. A human-specific miRNA that is absent even from the chimp genome should be roughly 6 million years old (last estimated split between human and chimp). Given the 0.2 million years required for a 7-mer binding site to evolve, around 30 perfect 7-mer binding sites are expected (6/0.2). For a miRNA that is traced back to mouse (split more than 100 million years ago from human), about 500 conserved targets per miRNA are reasonable (100/0.2). This simplified calculation might indicate that, given a spontaneous mutation rate, there should be a direct correlation between the age of a miRNA and the number of targets it possesses, and also the number of duplicated occurrences of the same miRNA site on one transcript. Ultimately, it is not enough for the mutation to occur; it must also be maintained in the population under strong selective pressure towards a favorable regulation, which can only take place when a miRNA and its targets are spatially and temporally coexpressed [83, 103]. This calculation allows us to set the general timeline of events for miRNA formation. Nevertheless, there are many outstanding exceptions of small and large miRNA target repertoires (also see Figure 4).
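The back-of-the-envelope arithmetic in this section can be written out as a short sketch. It takes the 0.2 million-year waiting time and the divergence times from the text as given and assumes, as the simplified calculation does, that sites accumulate at a constant rate with no selection or loss; the function names are illustrative.

```python
# Sketch of the simplified timescale arithmetic, under the text's assumptions:
# one new 7-mer site per 0.2 million years, constant rate, no loss of sites.

MYR_PER_SITE = 0.2  # estimated waiting time for one site, from the text


def preseed_spacing(k=5, alphabet_size=4):
    """Average spacing (nt) between chance occurrences of a specific k-mer."""
    return alphabet_size ** k


def expected_sites(mirna_age_myr, myr_per_site=MYR_PER_SITE):
    """Expected number of sites accumulated over the age of the miRNA."""
    return mirna_age_myr / myr_per_site


print(preseed_spacing())             # 1024
print(round(expected_sites(6)))      # 30  (human-specific miRNA)
print(round(expected_sites(100)))    # 500 (miRNA traced back to mouse)
```

The linear dependence on age is what predicts a direct correlation between miRNA antiquity and target number; deviations from it point to selection or to the overlapping regulatory constraints discussed earlier.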

Websites Used

Ensembl: http://www.ensembl.org/

GenBank: http://www.ncbi.nlm.nih.gov/

miRBase: http://microrna.sanger.ac.uk/

miRNAminer: http://groups.csail.mit.edu/pag/mirnaminer/

miRviewer: http://people.csail.mit.edu/akiezun/miRviewer/

Patrocles: http://www.patrocles.org/

TargetRank: http://hollywood.mit.edu/targetrank/

TargetScanS: http://www.targetscan.org/

UCSC genome browser: http://genome.ucsc.edu/

Acknowledgments

The authors thank the following people for commenting on their manuscript: Brad Friedman, Alex Stark, Iftach Nachman, Robin Friedman, Eric Wang, Rickard Sandberg, Etgar Levy-Nissenbaum, and the Shomron lab members. They thank Rick Durrett and Deena Schmidt for assistance in statistical calculations. Work at the NS lab is supported by the Israeli Ministry of Health and the Kunz-Lion Foundation. EH is the incumbent of the Helen and Milton A. Kimmelman Career Development Chair. Work at the EH lab is supported by grants from the JDRF, ISF, ISf-Legacy, GIF, the Benoziyo Center for Neurological Disease, the Estate of Flourence Blau and the Wolfson Family Charitable trust for miRNA.