Infections with protozoan parasites are associated with high burdens of morbidity and mortality across the developing world. Despite extensive efforts to control the transmission of these parasites, the spread of drug-resistant populations and the lack of effective vaccines contribute to their persistence as major public health problems. Parasites must strictly control the expression of genes involved in their pathogenicity, differentiation, immune evasion, or drug resistance, and understanding the mechanisms involved in that control could help to develop novel therapeutic strategies. However, these mechanisms remain poorly understood in protozoa. Recent investigations into gene expression in protozoan parasites suggest that they possess many of the canonical machineries employed by higher eukaryotes for the control of gene expression at the transcriptional, posttranscriptional, and epigenetic levels, but that they also have mechanisms of their own. Here, we review the current understanding of the regulation of gene expression in Plasmodium sp., Trypanosomatids, Entamoeba histolytica, and Trichomonas vaginalis.

1. Introduction

For all cells, regulation of gene expression is a fundamental mechanism for development, homeostasis, and adaptation to the environment. In eukaryotes, every step in the process of gene expression is subject to dynamic regulation, including structural changes of chromatin, transcription of DNA into RNA, processing of the transcript, its transport to cytoplasm, and translation of messenger RNA (mRNA) into protein.

Activation of gene expression requires that cells alleviate nucleosome-mediated repression by means of activator proteins that modify chromatin structure. The activation process displaces or remodels chromatin and opens up regions of the DNA for the binding of regulatory proteins. Chromatin associated with transcription (active chromatin) is generally associated with a range of histone modifications including H3 acetylation of lysine 9 (H3K9) and H3 methylation of lysines 4, 36, and 79 (H3K4me, H3K36me, and H3K79me), whereas heterochromatin (inactive chromatin) appears to be marked by H3 methylation of lysines 9 and 27 (H3K9me and H3K27me) as well as H4 lysine 20 (H4K20me) [1].

Transcription is the process by which an RNA molecule is synthesized from a DNA template. This process can be divided into three discrete steps: initiation, RNA chain elongation, and termination. Although any step of this process may be controlled, transcription initiation is usually the most highly regulated stage. Transcription initiation requires that a complex of proteins called general transcription factors bind to DNA through elements named promoters. The promoters for protein-coding genes consist of core and proximal promoter elements located from 20 to 3000 base pairs upstream from the transcription start site, depending on the organism. In metazoans, the core promoter typically contains one or more sequence motifs, including the TATA box, Inr, TFIIB recognition element (BRE), and downstream core promoter element (DPE) [2]. Together, these DNA elements recruit components of the transcription preinitiation complex (PIC) that facilitate the positioning and assembly of RNA polymerase II (RNA pol II) on the promoter.

Thus, transcription initiation is mediated by the concerted action of transcription factors along with the RNA pol II transcriptional machinery, a diversity of coregulators that bridge the DNA-binding factors to the transcriptional machinery, different chromatin remodeling factors that change the nucleosome structure, and a group of enzymes that catalyze covalent modification of histones and other proteins [3].

The synthesis of mRNAs is carried out by the RNA polymerase, which associates transiently not only with the template, but also with many different proteins, including general transcription factors. Finally, the chain of mRNA that results from the direct transcription, called the primary transcript, undergoes modification, sometimes quite extensively, before it can be translated by ribosomes into protein.

Infections with protozoan parasites cause high mortality and morbidity in developing countries and pose an increasing threat to human health. The lack of vaccines against most of the major parasitic diseases has made chemotherapy the only option for treatment. However, resistance to a large number of the antiparasitic drugs currently in use causes major health problems. A better understanding of the molecular mechanisms that control the expression of parasite genes involved in transmission success, pathogenicity, immune evasion, and drug resistance may help to develop novel therapeutic strategies. Progress in DNA transfection techniques and the associated tools has boosted functional analyses in these microbial eukaryotes. In addition, the sequencing of their genomes and microarray assays offer the opportunity to undertake comparative genomics of the genes involved in the regulation of gene expression. Here, we review the general concepts that have emerged with regard to the regulation of gene expression in Plasmodium sp., Trypanosomatids, Entamoeba histolytica, and Trichomonas vaginalis.

2. Regulation of Gene Expression in Plasmodium sp.

Human malaria is caused by infection with four species of Plasmodium, P. ovale, P. vivax, P. malariae, and P. falciparum, which are transmitted by the bites of female Anopheles mosquitoes. Malaria affects upwards of 400 million people worldwide and causes over 2.5 million deaths annually [4].

The life cycle of Plasmodium consists of multiple rounds of asexual replication in the human and both asexual and sexual reproduction in mosquitoes. During the different stages, the parasite undergoes a series of morphological changes needed to infect and replicate in different organs and cells of both mosquitoes and humans. The morphological transformations that this parasite undergoes during its life cycle imply a high degree of regulation of gene expression. In addition, the immune evasion and drug-resistance mechanisms also depend upon finely tuned and accurate control of mRNA transcription.

2.1. Genomes

The Plasmodium genomes are estimated to contain 23–27 million bases, 14 chromosomes, and approximately 5,500 genes, including many members of multigene families likely to be associated with immune evasion and antigenic variation [5–8]. Plasmodium genomes have a high A/T content (P. falciparum 79.6%, P. vivax 67.7%), and possibly those genes that are exceptionally rich in A/T content may be more recombinogenic and more likely to be involved in immune evasion [9]. About 77% of the genes are conserved across the different species of Plasmodium. However, there are some differences among species; for example, in P. falciparum many of the multigene families involved in immune evasion are located near the ends of chromosomes and are often transcriptionally silent, whereas members of multigene families in P. knowlesi, primarily a monkey parasite, are spread across the chromosomes and are not strictly subtelomeric [8].
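As a rough illustration of how an A/T content figure such as those quoted above is obtained, the following Python sketch computes the fraction of A and T bases in a sequence. The function name `at_content` and the toy sequence are hypothetical and are not taken from the cited genome projects; ambiguous bases (for example N) are simply ignored, which is an assumption of this sketch.

```python
def at_content(sequence: str) -> float:
    """Return the fraction of A/T bases among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return at / total


# Toy sequence for illustration only; this is not real Plasmodium data.
toy = "ATATTTAAGCATTTAATTAA"
print(f"A/T content: {at_content(toy):.1%}")
```

Applied chromosome by chromosome or gene by gene, the same calculation would allow exceptionally A/T-rich genes to be flagged for comparison with the multigene families discussed here.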

2.2. Transcriptomes

RNA expression profiles throughout the life cycle of P. falciparum and P. berghei, a parasite of mice, have been analyzed [10–12]. These studies showed that most of the predicted ORFs are transcribed during the intraerythrocytic stage [10, 11]. Additionally, this strategy has shown that approximately 200 genes are specifically transcribed in gametocytes and 41 transcripts are specific to sporozoites, whereas 20% of the predicted ORFs were characterized as specific to the intraerythrocytic stage [11]. The analysis of transcription during the intraerythrocytic stage at one-hour time intervals showed a clustering of genes based on temporal transcript accumulation [10]; each cluster contains genes related by either function or cellular process. An equivalent analysis including sporozoite and gametocyte samples similarly described the coaccumulation of mRNA from functionally related genes [11]. Thus, the timing of expression for the majority of the clusters correlates with a known physiological demand for that process at that time, suggesting a “just-in-time” mode of control, whereby genes are only activated as their biological function becomes necessary to the parasite, after which the genes are downregulated.
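The idea of grouping genes by temporal transcript accumulation can be sketched in a few lines of Python. This is a deliberately simplified stand-in, assuming that each gene is assigned to the time point at which its transcript level peaks; it is not the clustering method used in [10] or [11], and the gene names and expression values are invented for illustration.

```python
from collections import defaultdict


def group_by_peak_time(profiles):
    """Map each sampling time index to the genes whose transcript level peaks there."""
    groups = defaultdict(list)
    for gene, levels in profiles.items():
        # Index of the maximum value; ties resolve to the earliest time point.
        peak_index = max(range(len(levels)), key=levels.__getitem__)
        groups[peak_index].append(gene)
    return dict(groups)


# Hypothetical time courses (arbitrary units), one value per sampling time.
toy_profiles = {
    "gene_a": [1.0, 5.0, 2.0, 0.5],
    "gene_b": [0.8, 4.2, 1.9, 0.4],
    "gene_c": [0.2, 0.3, 1.1, 6.0],
}
print(group_by_peak_time(toy_profiles))
```

Under a "just-in-time" mode of control, genes that fall into the same peak-time group would be expected to share a function or cellular process.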

Subsequent studies performed to find a correlation between mRNA and protein accumulation throughout the life cycle showed a significant delay between the maximum detection of transcript and protein abundance [12, 13]. These data, in concert with initial analyses of the parasite genome that showed a relative absence of transcription factors [14, 15], suggested a more predominant role for posttranscriptional events in the control of gene expression. However, recent data show a more complex mechanism of control of gene expression in this parasite.

2.3. General Transcription

In Plasmodium, the transcription of protein-coding genes is generally monocistronic, although in P. falciparum a bicistronic mRNA has been found for the maebl gene, encoding an erythrocyte-binding ligand, along with a putative mitochondrial ATP synthase (PF11_0485) gene [16]. Transcription of mRNA is carried out by an RNA pol II sensitive to low concentrations of α-amanitin, and searches of the genome found homologs of all 12 subunits of RNA pol II and many, but not all, of the general transcription factors (GTFs) [15, 17]. Interestingly, many of the GTFs that remain unidentified, such as TAF3, TAF4, TAF6, TAF8, TAF9, TAF11, and TAF13, contain histone fold domains (HFD). In other eukaryotes, these HFD-containing subunits form heterodimers with TFIID and mediate promoter recognition. The absence of the HFD-containing subunits suggests an unusual architecture of the TFIID complex in this parasite.

The TATA-binding protein (TBP) of P. falciparum (PfTBP) has low identity (42%) at the primary sequence level to the archetypal yeast homologue, but it contains most residues known to be involved in DNA binding [18]. The TATA box elements recognized by PfTBP are located further upstream from transcription start sites than those of other eukaryotes [19], suggesting a scanning mechanism for the accurate recognition of the transcription initiation site. However, the vast majority of P. falciparum genes contain multiple transcription start sites, and initiation occurs mainly at adenine nucleotides [20]. This flexibility in transcription initiation may be explained by the proportional increase in TATA-box-like sequences found in AT-rich upstream sequences.

Another difference in gene transcription in Plasmodium with respect to the classical model of higher eukaryotes is that during the intraerythrocytic phase of the parasite, the preinitiation complex containing PfTBP and PfTFIIE is preassembled on promoters of all intraerythrocytic-expressed genes, independent of their transcriptional activity [21].

2.4. Cis-Regulatory Elements and Transcription Factors

Several studies using upstream regions in transfection systems to drive expression of reporter genes have shown that Plasmodium promoters contain cis-elements several hundred bases upstream of the transcriptional start site that can either activate or repress transcription [22–27]. These studies have also identified DNA regions that may be involved in the temporal control of transcription activity. None of the identified cis-elements shares similarity with transcription factor binding sites in other eukaryotes, and most of them contain homopolymeric ( ) tracts that appear with a statistically significant bias over that expected by chance.

An alternative approach used to identify putative cis-regulatory elements in Plasmodium has been through in silico analysis. The aligning of upstream regions of 18 heat-shock genes of P. falciparum and the hsp86 sequences from six other Plasmodium species revealed the presence of a GC-rich sequence ([A/G]NGGGG[C/A]) called the G-box, which is present in multiple copies upstream of several heat-shock genes of different Plasmodium species [28].
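A consensus such as the G-box can be translated directly into a regular expression and used to scan upstream regions. The Python sketch below assumes that `[A/G]` and `[C/A]` denote alternative bases and that `N` stands for any of the four bases; the function name `find_g_boxes` and the test sequence are hypothetical, and only the strand given as input is scanned.

```python
import re

# [A/G]NGGGG[C/A] written as a regular expression; the lookahead lets
# overlapping occurrences be reported as separate sites.
G_BOX = re.compile(r"(?=([AG][ACGT]GGGG[CA]))")


def find_g_boxes(upstream: str):
    """Return (0-based start, matched sequence) for each G-box-like site."""
    return [(m.start(), m.group(1)) for m in G_BOX.finditer(upstream.upper())]


# Invented upstream fragment for illustration only.
toy_upstream = "TTTATGGGGCAAAAGAGGGGATT"
for start, site in find_g_boxes(toy_upstream):
    print(start, site)
```

To cover both strands, the same scan could be repeated on the reverse complement of each upstream region.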

Other studies based on the correlation of mRNA expression and on the conservation of cis-acting sequences among divergent species have been performed and putative regulatory elements have been found [29, 30]. A novel data-mining algorithm called OPI (ontology-based pattern identification) was utilized to generate highly associated clusters of potentially co-regulated genes [31]. Using this strategy, 34 putative regulatory motifs were found. These motifs are located upstream of co-transcribed genes encoding proteins involved in a wide variety of cellular functions including development, cell invasion and antigenic variation [32]. For example, an OPI cluster of 246 gametocyte-associated genes revealed a palindromic 10-nucleotide sequence enriched in sexual stage-associated promoters [31]. In addition, a sequence found upstream of the gene encoding the rhoptry-associated protein 3 (RAP3) that contains two PfM18.1 motifs (ATGCA[ ]GTGCA) showed specific binding to nuclear extracts in Electrophoretic Mobility Shift Assays (EMSA) [32].

Recently, by the use of three different motif-discovery programs, four motifs (G-rich, C-rich, CACA and TGTG) were identified [33]. These motifs are over-represented in the upstream regions from genes of 13 clusters expressed in the intraerythrocytic cycle of P. falciparum [33]. These motifs showed similar positions relative to the translational start site, suggesting that they have an important role in expression regulation [33]. However, the transcription factors that bind these motifs remain uncharacterized.

Initially, the use of hidden Markov models (HMM) built from 51 motifs commonly distributed in eukaryotic transcription factors to search for similar proteins in the P. falciparum genome revealed that these proteins represent only 1.3% of the genome, whereas in other eukaryotes they correspond to 5.7% [15]. However, the identification of an expanded family of apicomplexan-specific transcription factors (ApiAP2) containing at least one DNA-binding domain called AP2 indicates that regulation at the level of transcription initiation may also be critical for the control of gene expression in Plasmodium [34]. Each of the 26 proteins of the ApiAP2 family contains at least one copy of a small (approximately 60 amino acids) domain related to the DNA-binding domains found in the plant Apetala2/ethylene response transcription factors [34].

The screening of a protein binding array containing all variations of 10-mer DNA sequences identified putative palindromic cis-acting sequences for two transcription factors containing AP2 DNA-binding domains (PF14_0633 and PFF0200c) [35]. The analysis of the transcription profile during intraerythrocytic development showed that, after expression of PFF0200c, transcriptional activation occurs for the majority of the genes that contain the cis-acting sequence recognized by PFF0200c in their upstream regions [35], indicating an important role of this transcription factor in parasite development.

Recently, another transcription factor of P. berghei containing an AP2 DNA-binding domain (AP2-O) was characterized [27]. The AP2-O mRNA is synthesized by intraerythrocytic female gametocytes and translated during the ookinete development in the mosquito. The transcription factor specifically binds to a six-base sequence (TAGCTA) located within a short 100–400 bp region from the transcription start site of a set of genes implicated in the midgut invasion [27]. The amino acid sequence of the AP2 domain of AP2-O is highly conserved among several Plasmodium species, and the same cis-acting element was observed in the promoters of orthologous genes of P. falciparum and P. vivax implicated in midgut invasion [27].
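A positional constraint of this kind can be checked with a simple scan. The Python sketch below assumes that the upstream sequence is given 5′ to 3′ with its last base immediately preceding the transcription start site, and that "within 100–400 bp" means the first base of the motif lies 100 to 400 bp upstream; both are assumptions of this sketch rather than definitions from [27], and the function names are hypothetical. Because TAGCTA is its own reverse complement, scanning one strand is sufficient.

```python
MOTIF = "TAGCTA"  # six-base element reported for AP2-O [27]


def motif_distances(upstream: str, motif: str = MOTIF):
    """Distance (bp) from each motif start to the TSS; the last base is position -1."""
    seq = upstream.upper()
    distances = []
    start = seq.find(motif)
    while start != -1:
        distances.append(len(seq) - start)
        start = seq.find(motif, start + 1)
    return distances


def has_motif_in_window(upstream: str, near: int = 100, far: int = 400) -> bool:
    """True if at least one motif start lies between `near` and `far` bp upstream."""
    return any(near <= d <= far for d in motif_distances(upstream))


# Invented sequence: the motif starts 200 bp upstream of the assumed TSS.
toy_upstream = "A" * 50 + MOTIF + "A" * 194
print(motif_distances(toy_upstream), has_motif_in_window(toy_upstream))
```

Running the same check on orthologous upstream regions from other Plasmodium species is one way the conservation of the element could be examined.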

Other transcription factors identified in Plasmodium are: a homologue of Myb1 (PfMyb1) [36], and two non-sequence specific transcriptional activators which contain the high mobility group box (HMGB) motif [37, 38]. The knockdown of Pfmyb1 expression inhibits intraerythrocytic development and produces differential expression of several genes containing homologous sequences to Myb-regulatory elements in their flanking sequences [36]. The two transcription factors containing the HMG motif are expressed during asexual (Pfhmgb1) and sexual (Pfhmgb2) stages of parasite development. Knockout of the hmgb2 homologue in P. yoelii induced altered levels of a number of gametocyte-specific transcripts and a significant reduction in oocyst development, but gametogenesis and exflagellation did occur [38].

2.5. Chromatin Structure

The epigenetic control of gene expression plays an important role in P. falciparum. Much of the work in this parasite has focused on the mechanisms that control the mutually exclusive expression of the var gene family.

Plasmodium chromosomes, like those of nearly all eukaryotes, are tightly packed in nucleosomes [39]. P. falciparum contains each of the four core histones required for nucleosome assembly: H2A, H2B, H3 and H4 as well as the histone variants H2AZ, H2Bv, H3.3 and cenH3 [40]. No gene encoding the linker histone H1 has been identified in Plasmodium. Genes for histone modification enzymes, including histone deacetylases and a GCN5-like histone acetyl transferase, have been identified in P. falciparum [41, 42]. In accordance, multiple modifications were observed in the histones of this parasite [40], indicating that in Plasmodium, as in higher eukaryotes, different modifications in histone tails work in concert to generate a transcriptional state.

Studies combining chromatin immunoprecipitation (ChIP) with microarrays (ChIP-on-chip) in P. falciparum showed a positive correlation between greater acetylation of H3K9 on individual genes and the levels of gene expression [43]. In agreement, inhibition of the activity of PfGCN5, the histone acetyl transferase that directs acetylation of H3K9, results in decreased H3K9 acetylation, a decrease in the transcriptional activity of a subset of genes, and delayed progress through intraerythrocytic development [44, 45]. In contrast, clusters of silenced genes involved in virulence of P. falciparum are located in heterochromatin, and H3K9me3 is specifically associated with var gene families clustered in subtelomeric and some chromosome-internal regions localized to the nuclear periphery [46]. Furthermore, disruption of the histone deacetylase homologous to Sir2 (PfSir2) causes changes in H3K9me3 that are associated with disrupted monoallelic transcription [46–48], suggesting that PfSir2 is required for proper silencing. An orthologue of the HP1 protein, which participates in the formation of highly condensed chromatin by recognizing H3K9me2 and H3K9me3, was recently characterized in P. falciparum [49]. The recombinant chromodomain of PfHP1 binds to H3K9me2 and H3K9me3, but not to H3K4me or to unmodified H3, and forms stable homodimers in vitro. ChIP assays showed that PfHP1 is linked to the heterochromatin of subtelomeric non-coding repeat regions [49]. Interestingly, the presence of PfHP1 is directly linked to low expression of target genes. In particular, most of the genes down-regulated upon PfHP1 overexpression are members of variegated gene families [49]. All these results show the relevance of PfHP1 in the epigenetic regulation of exported virulence factors and phenotypic variation, and suggest that the elementary components of the histone code for the regulation of gene expression are conserved in P. falciparum.

In order to avoid the splenic clearance of infected erythrocytes, P. falciparum expresses a protein (PfEMP-1) on the surface of the infected erythrocytes that allows their adherence to host endothelium. The parasite genome contains approximately 60 polymorphic var genes, each encoding a different form of PfEMP-1. Immune evasion through antigenic variation depends on the ability of the parasite to express only a single var gene at a time and to periodically switch expression to alternative var gene variants, thus altering the antigenic properties of the infected cell and avoiding recognition by antibodies directed against previously expressed forms of PfEMP-1 [50]. The mutually exclusive expression of var genes and antigenic variation result from a combination of in situ activation and reversible gene silencing. Fluorescent in situ hybridization (FISH) experiments showed that the var genes present in the genome are grouped in 6–8 perinuclear clusters [51–53]. The subtelomeric heterochromatic regions of P. falciparum with silent var genes are associated with high levels of H3K9me3 [46]. In contrast, the active var gene, regardless of its location on the chromosome, is localized in a euchromatic subdomain within the normally heterochromatic nuclear periphery [51, 52, 54], and it is enriched with the activation mark H3K9ac [47, 55]. In addition, the H3K4me3 modification is a predominant mark in the active var gene, but during the later stages of the life cycle, when no var genes are expressed (poised state), the promoter of the most recently active var gene is enriched with dimethylated H3K4 [55].

The presence of high levels of an unregulated episomal var promoter results in a downregulation of the active var gene, and when episomes are removed the parasites displayed random var gene activation [56]. These results suggest that in P. falciparum an active transcription of the variant var gene expressed in a population is necessary for the maintenance of the cellular memory through numerous cell cycles.

Similar to var genes, subtelomeric gene families of P. falciparum such as rif, stevor or Pfmc-2tm code for proteins that showed clonal variation in their expression, and a comparable epigenetic program of control has been proposed for their expression [57].

2.6. Post-Transcriptional Regulation

Translational repression is another mechanism used by Plasmodium to regulate its gene expression. A comparison of the gametocyte transcriptome with the proteomes of gametocytes and ookinetes identified nine genes for which transcripts accumulate in gametocytes but are not translated until the ookinete stage [12]. A subsequent study revealed that these transcripts were highly abundant in the cytoplasm of female gametocytes and distributed in discrete punctate compartments, similar to eukaryotic P bodies. Analysis of the UTRs of these transcripts revealed the presence of a UUGUU motif, a known cis-acting sequence for Puf binding proteins involved in translational repression [58, 59]. In addition, two Puf proteins from P. falciparum are expressed in gametocytes and ookinetes and exhibit binding to UUGUU sequences [59].

The analysis of the proteome of P. berghei female gametocytes identified an abundantly expressed member of the DEAD-box RNA helicases termed DOZI (development of zygote inhibited), also implicated in translational repression via its RNA binding activity [60]. DOZI interacts with some Plasmodium mRNAs, and disruption of its encoding gene results in the downregulation of approximately 370 genes, including some predicted to be important in early ookinete motility, and in aborted development of fertilized female gametocytes [61]. Some other transcripts contain an iron-responsive element, a stem-loop structure formed at their 5′ and 3′ UTRs, that binds an iron regulatory protein (PfIRPa) to inhibit translation or to modulate mRNA stability [62].

A variant of PfEMP-1 (VAR2CSA) is only expressed in the presence of a placenta, suggesting that its expression is repressed in men, children, or nonpregnant women. It was recently shown that the gene encoding VAR2CSA contains a small upstream open reading frame that acts to repress translation of the resulting mRNA. The mechanism underlying this translational repression is reversible, allowing high levels of protein translation in the presence of a placenta [63].

3. Regulation of Gene Expression in Trypanosomatids

Members of the Trypanosomatidae family constitute a fascinating group of flagellated protozoa. Collectively, these pathogens cause millions of deaths in tropical and subtropical regions of the world. African trypanosomes, transmitted by tsetse flies, are responsible for sleeping sickness in humans and nagana disease in cattle. The human disease takes two forms, depending on the parasite involved. Trypanosoma brucei gambiense is found in west and central Africa and represents more than 90% of reported cases of sleeping sickness, causing a chronic infection, whereas Trypanosoma brucei rhodesiense is found in eastern and southern Africa, represents less than 10% of reported cases, and causes an acute infection [64]. It was estimated that 300,000 individuals were infected in 2000 [65]. However, access to diagnosis and treatment in many countries where African trypanosomiasis was endemic resulted in a reduction of 68% in the total number of new cases reported between 1995 and 2006 [64]. Trypanosoma cruzi, transmitted by triatomine bugs, is the causative organism of Chagas disease, which is endemic to several regions in Latin America. It is estimated that around 75 million people live in risk areas and 13 million people are currently infected in Central and South America. The global incidence of the disease is considered to be 300,000 new cases per year [66]. Leishmania is a protozoan parasite that alternates between an intracellular amastigote stage residing in vertebrate macrophages and an extracellular promastigote stage living in the digestive tract of sandflies. Leishmanial infections have diverse clinical manifestations, including cutaneous (CL), mucocutaneous (MCL), diffuse cutaneous (DCL), visceral (VL or kala-azar), post-kala-azar dermal leishmaniasis (PKDL), and recidivans (LR) [67]. Leishmaniasis is a public health problem in at least 88 countries, including some of the poorest in the world [68]. The estimated global prevalence of all forms of the disease is 12 million, with 1.5–2 million added cases annually of CL and 500,000 of VL [69].

3.1. Genomes

The genome of T. brucei is 26 megabases in size and contains 9,068 predicted genes, including approximately 900 pseudogenes and approximately 1,700 T. brucei-specific genes [70]. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system [70].

The T. cruzi genome size is 60.3 Mb organized in 41 chromosomes [71]. The diploid genome contains 22,570 protein-coding genes, of which 12,570 represent allelic pairs. The protein-coding genes are generally arranged in long clusters of tens to hundreds of genes on the same DNA strand. A putative function could be assigned to 50.8% of the predicted protein-coding genes [72]. At least 50% of the T. cruzi genome is repetitive sequence, consisting mostly of large gene families of surface proteins, retrotransposons, and subtelomeric repeats [72]. The largest gene families encode surface proteins such as mucin-associated surface proteins (MASPs), members of the trans-sialidase (TS) superfamily, the surface glycoprotein gp63 protease, and some hypothetical proteins [72].

The Leishmania genome size is 34 Mb and the chromosomes range in size from 0.3 to 2.5 Mb [73, 74]. The karyotype is conserved among Leishmania species (albeit with considerable size polymorphism) and the genes are syntenic (with conservation of gene order) [73–76], except that the Old World species have 36 chromosomes [73] and the New World species have 35 (L. braziliensis) or 34 (L. mexicana) [75].

A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major [70].

3.2. Transcriptomes

A microarray comprising 21,024 different PCR products of T. brucei was used to identify genes specifically expressed in the human bloodstream and insect procyclic stages [77]. Approximately 2% of the genomic fragments exhibited significant differences between the transcript levels in the bloodstream and procyclic forms. Of 33 clones that showed overexpression in bloodstream forms, 15 contained sequences similar to those of VSG expression sites and at least six others appeared to be non-protein-coding. Of 29 procyclic-specific clones, at least eight appeared not to be protein-coding [77]. Other studies of transcriptome changes in T. brucei utilized a targeted oligonucleotide microarray representing the strongly developmentally regulated membrane trafficking system and approximately 10% of the T. brucei genome [78]. Results showed that 6% of the gene cohort is developmentally regulated, including several small GTPases, SNAREs, vesicle coat factors, and protein kinases. Therefore, substantial differentiation-dependent remodeling of the trypanosome transcriptome is associated with membrane transport. Recently, Jensen et al. [79], using microarrays that contain multiple copies of multiple probes for each gene, showed that approximately one-fourth of genes display differences in mRNA levels, suggesting that, despite the lack of gene regulation at the level of transcription initiation, this parasite extensively regulates mRNA abundance in association with different growth stages. However, while trypanosomes regulate mRNA abundance to effect the major changes accompanying differentiation, a given differentiated state appears transcriptionally inflexible, because specific gene overexpression, knockdown, altered culture conditions, or chemical stress do not provoke detectable changes in steady-state mRNA levels [78].

A T. cruzi DNA microarray was used to compare the transcript profiles of six human isolates: three from asymptomatic and three from cardiac patients [80]. Seven signals were expressed differentially between the two classes of isolates, but the approximately 30-fold greater signal in cardiac strains for ND7 was the most pronounced of the group. The ND7 gene from asymptomatic isolates showed a deletion of 455 bp from nt 222 to nt 677 relative to ND7 of the CL Brener reference strain. The ND7 deletion produces a truncated product that could impair the function of mitochondrial complex I [80]. These results suggested that ND7 constitutes a valuable target for the differential diagnosis of the infective T. cruzi strain [80]. Recently, Minning et al. [81] observed that transcript abundance is also an important level of gene expression regulation in T. cruzi. The microarray analysis of gene expression during the T. cruzi life-cycle showed that relative transcript abundances for over 50% of the genes are significantly regulated during the T. cruzi life-cycle. Among the differentially regulated genes were members of paralog clusters, nearly 10% of which showed divergent expression patterns between cluster members [81].

Results from several microarray studies in Leishmania showed a surprisingly low proportion of differentially expressed genes, ranging from 0.2% to 9% of total genes, between the amastigote and promastigote life stages [82–87]. Thus, the Leishmania genome can be considered to be constitutively expressed, with a limited number of genes showing stage-specific expression. Quantitative proteomic analysis of relative protein expression in Leishmania showed only a weak correlation with gene expression [84]. Therefore, Leishmania protein expression is mainly regulated by post-transcriptional mechanisms.

3.3. General Transcription

The three classical RNA polymerases, identified on the basis of their resistance to α-amanitin, have been detected in T. brucei [88]. The major subunit of each of these enzymes has been cloned [89]. In most species of trypanosomes that undergo antigenic variation, with the exception of T. vivax, two slightly different genes for an RNA pol II subunit were found [90, 91].

In eukaryotes, RNA pol II is a protein complex of more than 500 kDa that contains 12 subunits, 5 of which (RPB5, RPB6, RPB8, RPB10 and RPB12) are shared with RNA pol I and III. RPB1 is the largest subunit of the T. brucei enzyme, and RPB1, RPB2, RPB3, and RPB11 are the functional and structural homologues of the eubacterial core subunits of RNA polymerase [92]. RPB4 to RPB10 and RPB12 contribute to the ability of the enzyme to respond to activators, bind tightly to promoter regions, properly initiate RNA transcripts, and ensure efficient and accurate RNA synthesis [92]. Interestingly, the RPB5 homolog (TbRPB5) associated with RNA pol II is different from the one previously found associated with RNA pol I. In addition, two genes coding for different isoforms of TbRPB6 were identified [92], suggesting the existence of polymerase-specific isoforms for both TbRPB5 and TbRPB6.

A functional TBP was identified in T. brucei. This protein, called TRF4, was essential for cell viability and was recruited to the SL RNA gene promoter [93–95]. However, the TBP lacks two of the four important phenylalanines that are responsible for bending the DNA on either side of the TATA box. The significant divergence in the trypanosome TBP may indicate functional variation in its role in transcription [93, 94].

Tandem affinity purification (TAP) assays were used to characterize the subunits that form RNA pol II and III in L. major [96]. Mass spectrometric analysis of the complex copurified with TAP-tagged LmRPB2 identified seven RNA pol II subunits: RPB1, RPB2, RPB5, RPB7, RPB10 and RPB11. With the exception of RPB10 and RPB11, and the addition of RPB8, these were also identified using a TAP-tagged construct of one of the two LmRPB6 orthologues [96]. The latter experiments also identified the RNA pol III subunits RPC1, RPC2, RPC3, RPC4, RPC5, RPC6, RPC9, RPAC1 and RPAC2 [96]. Significantly, the complex precipitated by TAP-tagged LmRPB6 did not contain any RNA pol I-specific subunits, suggesting that, unlike in other eukaryotes, LmRPB6 is not shared by all three polymerases but is restricted to RNA pol II and III. In addition to these RNA pol subunits, several other proteins that copurified with the RNA pol II and III complexes were identified. These include a potential transcription factor, several histones, the splicing factor PTSR-1, RNA binding proteins, and others, suggesting that they may be physically associated with the RNA pol II complex [96].

The Trypanosomatidae family possesses unusual mechanisms of gene expression such as polycistronic transcription [97, 98], trans-splicing of the pre-mRNA [99], RNA editing of the mitochondrial transcripts and transcription of protein-coding genes by RNA pol I [100]. In these organisms, the mature nuclear mRNAs are generated from primary transcripts by trans-splicing, a process that adds a capped 39-nucleotide miniexon or spliced leader (SL) to the 5′ termini of the mRNAs [101]. The steady-state levels of most of the mature mRNAs appear to be regulated posttranscriptionally by mechanisms that involve their untranslated region sequences [102]. In T. brucei, the genes encoding the variant surface glycoprotein (VSG) and procyclic acidic repetitive protein (PARP) are transcribed by an RNA polymerase that is resistant to α-amanitin, indicating that trypanosomes can transcribe protein-coding genes by RNA pol I [103].

Promoters from RNA pol I have been extensively characterized in trypanosomatids [104–106], as have been some pol III promoters [107]. However, little is known about the sequences that drive the expression of protein-coding genes by RNA pol II. The apparent lack of regulation of pol II transcription and the observation that episomal molecules are transcribed in both strands have led to the hypothesis that in trypanosomatids, RNA pol II has very low specificity, and that transcription can initiate indiscriminately at several sites along the polycistronic units [108, 109]. Now it is generally accepted that the trypanosomatid genomes are organized in long polycistronic transcription units. The genes are usually separated by only a few hundred base pairs and, with a few exceptions, they do not contain introns.

3.4. Cis-Regulatory Elements and Transcription Factors

Only a few promoters have been characterized in trypanosomes: those of the ribosomal (rRNA), the procyclic acidic repetitive protein (PARP) and variant surface glycoprotein (VSG) genes of T. brucei [110, 111] and those for some small RNA genes. In addition, a cis-acting regulatory element able to determine the strandedness of transcription has been found upstream of a multidrug resistance gene in L. enriettii [112]. There is no significant sequence homology among these promoters or with any known eukaryotic promoter.

In the case of the VSG promoter, the 70 bp region preceding the transcription start site is sufficient to ensure maximal activity in transcription [113]. In the case of the ribosomal and procyclin promoters, the −70 to +1 bp region constitutes a core element whose basal activity is stimulated by an upstream control element located around position −200 bp [114]. These promoters contain two stretches, centred approximately at −60 and −35 bp, that are essential for promoter activity and whose spacing appears to be critical. Interestingly, the binding of specific proteins requires that the target DNA be single-stranded [115]. These results suggest that these promoters should be partially denatured to be functional [116].

Many groups have compared and contrasted the promoter elements of the SL RNA gene in various Trypanosomatids [117–122]. In most cases, the SL RNA gene promoter consists of three closely spaced short elements that are located upstream of and proximal to the transcription start site. Mutational analysis showed that nucleotides near the start site (+1), at positions −10 to +10 bp, in L. amazonensis serve to direct correct transcription initiation of the SL RNA gene, and that an upstream element in the promoter (−80 to −60 bp) is essential for efficient SL RNA expression in vivo [123]. The SL RNA genes in L. tarentolae are organized into two separate, head-to-tail tandem arrays, MINA and MINB [124]. The MINA array contains 60 gene copies with a repeat length of 363 bp, of which 105 bp are transcribed, resulting in a 96 nt mature transcript. The MINB array has 40 copies, with a periodicity of 296 nt. The transcribed regions in both arrays are identical. The non-transcribed regions contain a bipartite promoter at −67 to −58 bp (the −60 element) and −40 to −31 bp (the −30 element) from the transcription start site in the MINA array [118, 122].

In Leptomonas seymouri, a factor termed promoter-binding protein 1 (PBP-1) that specifically binds to the SL RNA gene promoter in the region from −60 to −70 bp was identified [125]. PBP-1, the first sequence-specific, double-stranded DNA-binding protein isolated in Trypanosomes, is composed of 57, 46 and 36 kDa subunits [125]. The 46 kDa subunit is a previously uncharacterized protein and may be unique to Trypanosomes. Its predicted tertiary structure suggests that it binds DNA as part of a complex. The 57 kDa subunit is orthologous to the human Small Nuclear Activating Protein 50 (SNAP50), which is an essential subunit of the SNAP complex [125]. In human cells, the SNAP complex binds to the proximal sequence element in both RNA polymerase II- and III-dependent small nuclear RNA gene promoters.

A protein complex that specifically binds to the −60 element of the SL RNA gene promoter of L. tarentolae was identified by EMSA [126]. The complex has an estimated mass of 159 kDa and it contains a homologue of TBP (LtTBP). Both LtTBP and LtSNAP50 are found near the spliced leader RNA gene promoter and the promoters important for tRNA Ala and/or U2 snRNA gene transcription [126].

The use of ChIP-on-chip analysis to probe genome-wide transcription factor occupancy suggests that there are only 184 transcription-initiation sites for protein-coding genes in L. major [127]. This analysis also extends the understanding of the roles of TBP and SNAP50 in L. major transcription, because these proteins appear to bind to all RNA polymerase II and III promoters and appear to have identical binding patterns genome-wide, raising the interesting possibility that the SNAP complex may serve as a general transcription factor for protein-coding transcription in these organisms [126].

The transcriptional analysis of chromosome 1 from L. major Friedlin, the reference strain of the Leishmania Genome Project, revealed the presence of 79 putative genes, the first 29 of which are in a cluster on the “bottom” DNA strand, while the remaining 50 are in a cluster on the “top” strand [128]. The DNA segment between both clusters is called the “strand-switch region”. Importantly, nuclear run-on analysis of chromosome 1 showed that specific transcription, leading to the production of stable transcripts, initiates within the strand-switch region and proceeds bidirectionally toward the telomeres [98]. Stable-transfection studies support the presence of a bidirectional promoter in this region of chromosome 1. It also appears that nonspecific transcription takes place over the entire chromosome 1, but at a level 10-fold lower than the specific transcription initiating in the strand-switch region. 5′-rapid amplification of cDNA ends (RACE) studies mapped the initiation sites within this region [98]. Thus, while in most eukaryotes each gene possesses its own promoter, a single region seems to drive the expression of the entire chromosome 1 in L. major Friedlin. Although these results showed that pol II drives transcription of chromosome 1, no typical pol II promoter elements are present in the 73 bp region that separates the two transcription units. Moreover, no sequence conservation is discernible between the strand-switch region on chromosome 1 and the strand-switch regions on other chromosomes of L. major Friedlin [98].
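The gene layout described for chromosome 1 can be illustrated with a minimal sketch that locates strand-switch regions from an ordered list of gene orientations. The function name `strand_switches`, the '+'/'-' encoding of the “top” and “bottom” strands, and the assumption that bottom-strand genes are transcribed toward the left end are illustrative choices, not part of the cited studies.

```python
from typing import List, Tuple


def strand_switches(strands: List[str]) -> List[Tuple[int, str]]:
    """Return (index, kind) for every strand switch along a chromosome.

    `strands` lists gene orientations from the left end to the right end:
    '-' for the bottom strand, '+' for the top strand (assumed convention).
    `index` is the number of genes lying to the left of the switch.
    """
    switches = []
    for i in range(1, len(strands)):
        left, right = strands[i - 1], strands[i]
        if left != right:
            # '-' followed by '+': the clusters point away from each other.
            # '+' followed by '-': the clusters point toward each other.
            kind = "divergent" if (left, right) == ("-", "+") else "convergent"
            switches.append((i, kind))
    return switches


# Chromosome 1 as described in the text: 29 bottom-strand genes
# followed by 50 top-strand genes.
chr1 = ["-"] * 29 + ["+"] * 50
print(strand_switches(chr1))  # [(29, 'divergent')]
```

Under these assumptions the single switch on chromosome 1 is classified as divergent, which matches the reported bidirectional transcription from the strand-switch region toward both telomeres.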

The transcriptional analysis by nuclear run-on of chromosome 3 was also described [129]. This chromosome contains 97 putative protein-coding genes organized into two long convergent clusters, which are separated by a tRNA gene [129]. In addition, a single divergent gene is located at the “left” end of the chromosome. Data showed that pol II transcription on chromosome 3 initiates bidirectionally between the single subtelomeric gene and the adjacent 67-gene cluster and near the “right” telomere upstream of the 30-gene cluster. The tRNA gene is transcribed by pol III. Transcription on both strands terminates in the tRNA-gene region [129].

Promoters from RNA pol I have been characterized in different species of Leishmania. The analysis of the ribosomal RNA (rRNA) gene promoter of L. donovani showed that these genes are organized on chromosome 27 as tandem repeats of approximately 12.5 kb. Each repeat contains the subunit rRNAs and approximately 39 copies of a 64-bp species-specific sequence [106]. The transcription initiation site was mapped to 1020 bp upstream of the 18S rRNA gene. A 349-bp sequence located between the 64-bp repeats and the 18S rRNA gene appears to contain a promoter [106]. Three domains (−76 to −57, −46 to −27 and −6 to +4 bp, relative to the transcription initiation site) were found to mediate promoter activity, suggesting that the rRNA promoter is not dissimilar to that of other eukaryotes [106]. Similar findings were reported for the rRNA promoters of L. major Friedlin and L. amazonensis, where the transcription initiation sites of the rRNA units were localized to 1043 and 1048 bp upstream of the rRNA genes, respectively [130, 131]. The repetitive element (60 bp for L. major and 63 bp for L. amazonensis) was also identified in the intergenic spacers, and constructs containing the rRNA gene promoters were able to drive the expression of reporter genes [130, 131].

3.5. Chromatin Structure

Although gene expression in Trypanosomatids is predominantly regulated posttranscriptionally, several lines of evidence point to important roles for chromatin structure and modification in gene expression, cell cycle control and differentiation [132–134]. In Trypanosomes, all classes of histones are present and DNA is packed into nucleosomes [135].

Analysis of the genomic organization of the SL RNA genes of L. tarentolae showed that a single nucleosome is positioned on its intergenic region, leaving the promoter and the transcribed gene region free of nucleosomes [136]. The array periodicity of one nucleosome per 363 bp differed from the standard heterochromatin arrangement in this species of one nucleosome per 230 bp. The array is bent further by the interaction with transcription factors. Thus, nucleosome arrangement may be vital for efficient transcription initiation of the SL RNA gene. On the other hand, ChIP-on-chip assays in L. major suggested that H3 histones at the origins of polycistronic transcription of protein-coding genes are acetylated [127]. Thus, global regulation of transcription initiation may be achieved by modifying the acetylation state of H3 histone on these origins [127].

Histone acetylation, methylation and phosphorylation have been described in T. brucei and T. cruzi [137–139]. In addition, a detailed analysis of histone modifications in T. brucei showed that H2A, H2B and H4 lack the initial methionine residue and that the N-terminal alanine of these proteins could be monomethylated [140]. These studies also found that the histone H4 N-terminus is heavily modified, while, in contrast to other organisms, the histone H2A and H2B N-termini have relatively few modifications [140]. T. brucei expresses three distinct MYST-family members, all of which have homologues in T. cruzi and Leishmania, and nonredundant roles have been described for each of these histone acetyltransferases in bloodstream forms of T. brucei [141]. HAT1 modulates telomeric silencing and is required for growth and, possibly, for DNA replication; HAT2 is required for H4K10 acetylation and growth; and HAT3 is required for H4K4 acetylation and is dispensable for growth. The orthologues in Leishmania likely have similar roles. The nonredundant functions of T. brucei HAT1-3 appear to reflect unique substrates for each acetyltransferase and further support the idea of a simplified, nonredundant histone code in these divergent parasites [141].

3.6. Post-Transcriptional Regulation

The general organization of trypanosomatid genes in polycistronic units means that most of the genes within a large polycistronic cluster are transcribed at an equivalent rate; consequently, these organisms must rely on post-transcriptional mechanisms to control gene expression. Constitutive synthesis of the transcriptome and selection of the right messages only at the maturation step probably enables the parasite to switch gene expression rapidly in order to survive and adapt to a new environment. Although this system requires the permanent degradation of an important fraction of the transcriptome, trypanosomatids have avoided the burden of encoding networks of specific transcription factors and target sequences.

Post-transcriptional regulation could be exerted through sequence elements in intergenic regions. Evidence of the primary role of untranslated regions (UTRs) in determining the relative stage-specific mRNA abundance has accumulated [142, 143]. The 3′-terminal region of VSG and procyclin transcripts regulates the expression of a reporter gene in an inverse manner, depending on the developmental form of the parasite [143]. In the case of VSG mRNA, the 97 nt sequence upstream from the polyadenylation site is responsible for these effects. The regulation occurs through a variation of mRNA abundance which is not due to a change in primary transcription. In the bloodstream form this effect is manifested by an increase in RNA stability, whereas in the procyclic form it seems to be related to a reduction in the efficiency of mRNA maturation [143]. The 3′-end of VSG mRNA can obviate the 5- to 10-fold stimulation of transcription driven by the procyclin promoter during differentiation from the bloodstream to the procyclic form [143].

An analysis of synonymous codon bias in Trypanosomatids showed the enrichment of “favoured” codons in more highly expressed genes [144]. Consistent with translational selection, cognate tRNA genes for favoured codons are over-represented [144]. In addition, relative codon bias is conserved among orthologous genes from divergent Trypanosomatids (T. cruzi, T. brucei, L. major), even in genes thought to be expressed at a low level [144]. Taken together, the results suggest that control of the level of translation is an important mechanism underlying differential protein expression in Trypanosomatids.
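One common way to quantify synonymous codon bias is the relative synonymous codon usage (RSCU), in which a value above 1 marks a codon used more often than expected if all synonymous codons were used equally. The sketch below is only an illustration of that general measure; it is not claimed to be the statistic used in [144], and the function name `rscu`, the use of the standard genetic code, and the toy sequence are assumptions.

```python
from collections import Counter
from itertools import product
from typing import Dict

# Standard genetic code (assumed here), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> Dict[str, float]:
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU(codon) = observed count * family size / total count of the family.
    Amino acids that do not occur in the sequence are left out of the result.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    values: Dict[str, float] = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence (hypothetical): four alanine codons, three GCT and one GCC.
usage = rscu("GCTGCTGCTGCC")
print(usage["GCT"], usage["GCC"], usage["GCA"])  # 3.0 1.0 0.0
```

Comparing such values between highly and weakly expressed gene sets is one simple way to look for “favoured” codons.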

4. Regulation of Gene Expression in Entamoeba histolytica

Entamoeba histolytica, an enteric protozoan parasite, is the etiologic agent of human amoebiasis. It has been estimated that every year 50 million cases of invasive amoebiasis and approximately 110,000 deaths occur, but interestingly only 10% of the infected people present disease symptoms expressed as intestinal or extraintestinal amoebiasis [145]. The molecular mechanisms participating in parasite invasiveness are not completely understood. Diverse populations of E. histolytica, including clones derived from a particular strain, display different virulence phenotypes [146–148]. In addition, long-term cultured trophozoites, which show poor capacity to produce liver abscesses in experimental animals, recover their virulence after incubation with cholesterol or with certain types of bacteria, or after their passage through hamster livers [149–152]. This behavior could be due to changes in the expression of certain genes. In addition, the life cycle of this parasite, which involves the reversible conversion of the infective forms (cysts) to the invasive cells (trophozoites), is expected to depend on differential expression of E. histolytica genes.

4.1. Genome

The genome of this parasite consists of 23,751,783 bp distributed among 888 scaffolds and contains approximately 9,938 genes [153]. It has been difficult to determine the number of E. histolytica chromosomes, because they do not condense [154]. However, the presence of 14 independent linkage groups has been reported [155]. In addition, the genome of this parasite contains a number of circular plasmid-like molecules [154]. The intergenic regions are from 400 bp to 2.3 kb, suggesting a tight packing of genes [154]. Approximately 25% of the E. histolytica genes contain introns, with 6% of genes containing multiple introns [153]. Intron sequences are relatively short (46–115 bp), and they contain the dinucleotides GU and AG at the donor and acceptor splice sites, respectively. Around 10% of the genome consists of tRNA genes organized in tandem arrays that vary in unit length from 490 to 1775 bp and contain from 1 to 5 tRNA genes [156]. The rRNA genes are located exclusively on extrachromosomal circular DNA molecules with an approximate size of 26 kb [157]. The genomes of E. histolytica and other species of Entamoeba have a high A/T content (77.6%), with the exception of E. moshkovskii, which has approximately 10% less A/T content [154].

4.2. Transcriptome

E. histolytica undergoes a reversible switch between the infective cysts and the invasive trophozoites. Genes involved in this developmental pathway were examined by whole-genome transcriptional profiling [158]. Approximately 15% of annotated genes are potentially developmentally regulated. Genes enriched in cysts (672 in total) included cysteine proteinases and transmembrane protein kinases. Genes enriched in trophozoites (767 in total) included genes involved in tissue invasion and putative regulators of differentiation, including possible G-protein coupled receptors, signal transduction proteins and transcription factors [158]. A number of E. histolytica stage-specific genes were also developmentally regulated in the reptilian parasite E. invadens [158], indicating that they likely have conserved functions in Entamoeba development.

The majority of human infections with E. histolytica remain asymptomatic. In some infections trophozoites invade the intestinal mucosa producing dysentery, and in a small fraction of infections, trophozoites disseminate to the liver, where they induce abscess formation. It is assumed that the ability of E. histolytica trophozoites to survive within the host and to destroy host tissues is accomplished by the specific regulation of a number of amoeba proteins.

A genome-wide transcriptional analysis of E. histolytica performed on trophozoites isolated from the colon of six infected mice and from in vitro culture revealed 523 transcripts (5.2% of all E. histolytica genes), whose expression was significantly changed in trophozoites isolated from the intestine [159]. The genes that modify their expression in trophozoites obtained from the mice encode proteins implicated in metabolism, oxygen defense, cell signaling, virulence, and antibacterial activity [159]. Control of the observed changes in the transcriptome might potentially rest with four related proteins with DNA binding domains that were down-regulated in the intestinal environment [159].

Comparison of RNA abundance between E. histolytica trophozoites isolated from liver abscesses of experimental animals and those obtained from in vitro culture found that at least seven E. histolytica genes were specifically up regulated and five were down regulated in trophozoites isolated from the livers [160]. The genes specifically up regulated encode proteins associated with heat shock, some ribosomal proteins, cyclophilin, ferredoxin 2, and the small GTPase RAB7D, whereas two of the genes down regulated encode members of a family of proteins containing repetitive stretches of sequences that are rich in lysine and glutamic acid residues [160]. All these results support the idea that host invasion requires the regulation and concerted action of a variety of amoeba proteins.

The varied outcome of infection by E. histolytica could also be due to differences in the virulence of the E. histolytica isolates. Comparison of the transcriptomes of a highly virulent strain and the low-virulence Rahman strain showed 353 transcripts that exhibited at least a two-fold difference between the two strains; 152 transcripts were expressed at higher levels in the virulent strain and 201 transcripts were more highly expressed in the Rahman strain [161]. The differentially expressed genes included cysteine proteinases (CPs), AIG family members, and lectin light chains [161]. The genes of the cysteine proteases EhCP4, EhCP6, and EhCP7 were expressed approximately three-fold higher in the virulent strain than in the Rahman strain [161]. In contrast, the expression of the genes of the cysteine proteases EhCP8, EhCP112 and EhCP3 was higher in the Rahman strain than in the virulent strain [161]. The most striking difference was seen in the expression of EhCP3, which was approximately 100-fold higher in Rahman than in the virulent strain [161]. Interestingly, in the non-pathogenic amoeba E. dispar, EhCP3 is expressed at higher levels compared to E. histolytica [162]. All these results support the hypothesis that CPs have an important role in E. histolytica virulence.

In addition, E. histolytica trophozoites require different proteins to survive when they are exposed to different environmental conditions. The gene expression pattern of E. histolytica trophozoites exposed to heat shock stress showed a massive gene down regulation [163]. Of the 1,131 unique genes probed by the microarray, 471 (42%) were significantly repressed during the heat shock treatment. A small number of genes were up regulated by heat shock, including those that encode the heat shock proteins hsp90 and hsp70, some hypothetical proteins, regulatory factors such as BRF, and a putative reverse transcriptase [163]. Heat shock treatment also induced the transcription of some CP genes [163]. The EhCP6 gene was especially up regulated by heat shock, suggesting a particular activity of this protease during stress, possibly through a role in the degradation of damaged proteins. This study also found that some alleles of the genes encoding the heavy (Hgl) and light (Lgl) subunits of the Galactose/N-acetyl-D-galactosamine-inhibitable lectin (Gal/GalNac lectin) were involved in the heat shock response [163].

4.3. General Transcription

In E. histolytica, very few regulatory sequences and, consequently, very few transcription factors have been identified, isolated, and characterized at either the structural or the functional level.

The transcription of protein-coding genes is monocistronic and mRNA is synthesized by an unusual RNA polymerase resistant to α-amanitin [164]. The core promoter of several protein-coding genes contains three conserved motifs [165, 166], and according to the study of different E. histolytica gene promoters, the size of the functional promoter region is between 200 and 900 bp [167–173]. TBP is the only member of the basal transcription machinery of this parasite characterized so far. EhTBP has 234 amino acid residues and its functional domain showed 55% sequence identity to the TBP of Homo sapiens [174]. The recombinant EhTBP formed specific complexes with the consensus TATA-box sequence of E. histolytica and with other TATA-like motifs [175], indicating that this protein is more promiscuous than the TBPs of human and yeast. This behavior of EhTBP is probably due to the presence of modifications in some amino acid residues involved in DNA binding.

4.4. Cis-Regulatory Elements and Transcription Factors

At the structural level, the core promoters of E. histolytica contain three conserved elements: the TATA-box (−35 to −25 bp), enriched in T and A bases (TAT/GT/G/AT/G/AA/GAAC/G); the Initiator sequence AAAAATTCA (Inr), which overlies the transcription initiation site; and the GAAC element (AA/TGAACT) [165, 176]. The GAAC element controls the rate and site of transcription initiation, mediates the transcriptional activation by some upstream regulatory regions, and functions in a context-dependent manner [176].
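As an illustration of how such core promoter motifs can be located in a sequence, the following minimal sketch scans a DNA string for the Inr sequence and the GAAC element with regular expressions. It assumes that the GAAC consensus AA/TGAACT should be read as A, then A or T, then GAACT; the function name `scan_motifs` and the test sequence are hypothetical, and the TATA-box is left out because its degenerate consensus is less clear from the text.

```python
import re
from typing import Dict, List, Tuple

# Motif patterns taken from the consensus sequences quoted in the text.
# Reading "AA/TGAACT" as A[AT]GAACT is an assumption.
MOTIFS: Dict[str, str] = {
    "Inr": "AAAAATTCA",
    "GAAC": "A[AT]GAACT",
}


def scan_motifs(seq: str, motifs: Dict[str, str] = MOTIFS) -> List[Tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched text) sorted by position."""
    seq = seq.upper()
    hits = []
    for name, pattern in motifs.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up sequence for demonstration only, not a real E. histolytica promoter.
promoter = "ccgtATGAACTttgccAAAAATTCAgg"
for hit in scan_motifs(promoter):
    print(hit)
# ('GAAC', 4, 'ATGAACT')
# ('Inr', 16, 'AAAAATTCA')
```

Exact consensus matching of this kind only flags candidate sites; as noted above, the GAAC element functions in a context-dependent manner, so a match does not by itself indicate activity.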

The development of DNA-mediated transfection for E. histolytica [177, 178] enabled the characterization of cis-acting promoter elements required for gene expression. However, until now few promoters upstream of protein-coding genes have been analyzed at the structural and functional level; among them are the promoters of the genes encoding the heavy subunit of the Gal/GalNac lectin (hgl2 and hgl5), the multidrug resistance proteins EhPgp1 and EhPgp5, the EhCPADH complex, and the small GTPase EhRabB [167–173].

The promoter of the hgl2 gene includes two regulatory elements: a sequence located 100 bp upstream of the transcription start site, similar to the CCAAT-box motif found in gene promoters of some higher eukaryotes, and a sequence of 15 bp situated at −520 bp [167]. This study also showed that mutations in the putative TATA-box or in the ATTCA element reduced the promoter activity from 20 to 56% with respect to that displayed by the wild type promoter [167].

The full transcriptional activity of the hgl5 gene promoter was obtained with 272 bp upstream of the transcription initiation site [168]. Five upstream regulatory elements (UREs) were identified in this region; four of them act as positive regulatory elements: URE1 (−49 to −40 bp), URE2 (−69 to −60 bp), URE4 (−189 to −160 bp) and URE5 (−219 to −200 bp), whereas the URE3 motif (−89 to −80 bp) performs a negative regulatory activity [168]. However, URE3 functions as a positive regulatory element in the ferredoxin (fdx1) promoter region [179].

URE4 is formed by two direct repeats of nine base pairs and it functions as an enhancer in the hgl5 gene [180]. Two polypeptides of 28 and 18 kDa, named EhEBP1 and EhEBP2, recognize the URE4 sequence [181]. These proteins contain two (EhEBP1) and one (EhEBP2) sequences homologous to the RNA recognition motif (RRM). This domain has been found in a large number of RNA-binding proteins and in several sequence-specific DNA-binding proteins [182]. The overexpression of EhEBP1 in trophozoites decreased the expression of a reporter gene under the control of the hgl5 promoter [181], demonstrating the role of EhEBP1 in the transcriptional control of E. histolytica.

Using a yeast one-hybrid screen, a 22.6 kDa protein that specifically binds to URE3 (URE3-BP) was identified [183]. This protein contains two EF-hand motifs, the most common calcium-binding motif found in proteins, suggesting that the activity of URE3-BP may be regulated by calcium. Indeed, it was demonstrated that relatively high concentrations of calcium (100–500 mM) inhibited the DNA-binding activity of URE3-BP [183]. ChIP assays corroborated the calcium-dependent interaction of URE3-BP with both hgl5 and fdx1 promoters. Recently, it was demonstrated that several genes of E. histolytica are regulated by URE3-BP [184]. The URE3 motif was found in the 59 promoter regions of the genes modulated by URE3-BP. These genes encode proteins involved in fatty acid metabolism and potential membrane proteins, suggesting that URE3-BP could be engaged to remodel the surface of trophozoites in response to a calcium signal [184].

The analysis of the EhPgp1 gene promoter revealed that it does not contain a TATA-box motif, but it has a putative Inr sequence and several transcription initiation sites [169]. Moreover, this promoter contains sequences similar to some cis-regulatory elements of higher eukaryotes, such as C/EBP, GATA-1, OCT, and HOX motifs. Several of these elements were able to compete the DNA-binding activity of nuclear extracts in EMSA [169]. Mutational analyses of some of these elements demonstrated the functional relevance of three regions of the EhPgp1 core promoter; two of them correspond to C/EBP regulatory sites (−54 to −43 bp and −198 to −186 bp) [185]. Nuclear proteins from trophozoites specifically bind to these C/EBP sequences, and two polypeptides of 25 and 65 kDa were recognized by anti-C/EBP antibodies [185]. These results suggest the presence of C/EBP-like proteins in E. histolytica. The other functional sequence identified in the EhPgp1 gene promoter contains repeated sequences and GATA-1, Gal4, Nit-2, and C/EBP consensus sequences [186].

In the emetine-resistant clone C2, the EhPgp5 gene displays an inducible expression pattern when trophozoites are exposed to the drug. The structural analysis of its promoter showed the presence of a TATA box at −31 bp and an Inr consensus sequence located only three nucleotides upstream from the start codon [170]. By primer extension assays, a single product mapping at the Inr sequence was detected in mRNA from clone C2 grown in the presence of 225 μM emetine (C2225). However, this product was not detected in mRNA from trophozoites grown in the absence of the drug; instead, we found a minor primer extension product 16 bases downstream of the ATG, which has no ORF [170]. These results suggest that EhPgp5 gene expression could be associated with the accurate selection of the transcription initiation site. Consensus sequences for the binding of AP-1, HOX, C/EBP, OCT-1, PIT-1, OCT-6, CF-1 and MYC were detected in the EhPgp5 promoter [170]. Gel shift competition assays provided evidence that nuclear proteins similar to some of those transcription factors could specifically recognize these DNA binding sites. Functional promoter assays showed that the EhPgp5 gene promoter was active in transfected trophozoites of clone C2 in the absence of emetine, but its activity increased when trophozoites were cultured in 40 μM emetine, whereas the transcriptional activity was turned off in the drug-sensitive clone A [170]. These results suggest that emetine is an inducer of EhPgp5 overexpression in trophozoites of clone C2. Deletion analysis of the EhPgp5 promoter region delimited a fragment of 59 bp (−170 to −110 bp) where the emetine response element could be situated [187].

The EhRabB protein is a Rab GTPase located in small vesicles that, in wild-type trophozoites, are translocated to the plasma membrane and to phagocytic mouths during phagocytosis [188], whereas in trophozoites deficient in phagocytosis most of these vesicles remain in the cytoplasm [189]. The EhrabB gene is located close to the Ehcp112 and Ehadh112 genes, whose products form the EhCPADH complex, which is involved in the pathogenic mechanism of E. histolytica [190]. These three genes span a 4500 bp region named virulence locus (VI) [172], providing a good model to study the transcriptional regulation of virulence-related genes. The EhrabB gene is situated 332 bp upstream of the Ehcp112 gene, but on the complementary strand [188]. In silico analyses of the 5′-flanking sequence of the EhrabB gene showed that it does not contain Inr elements or TATA-box consensus sequences, but it has a sequence similar to the GAAC element [173]. These analyses also showed the presence of sequences similar to C/EBP, GATA-1, and heat shock elements (HSE) of higher eukaryotes, and a sequence related to the URE1 motif of the hgl5 gene promoter [173]. Functional assays of the EhrabB promoter showed that: (i) the C/EBP and GATA-1 sequences may not be relevant for EhrabB gene transcription, because their removal did not show a significant effect on CAT activity; (ii) a DNA region located between positions −428 and −683 bp negatively controls EhrabB transcription; and (iii) a DNA fragment located at −257 to −428 bp, where HSE and URE1 motifs were detected, activates EhrabB transcription [173]. Deletion of the URE1 sequence decreased the expression of the CAT reporter gene, indicating that URE1 is a cis-activating element of EhrabB transcription [173]. Finally, functional CAT assays in E. histolytica transfected with a construct that includes seven HSEs showed an approximately two-fold increase in CAT enzymatic activity in heat-shocked trophozoites with respect to the activity displayed by cells maintained at the standard culture temperature [173]. These results indicate that the HSE motifs present in the EhrabB gene promoter could be functional under heat shock stress. All these results show that transcription of EhrabB is coordinated by different cis-elements that are specifically recognized by proteins under certain environmental conditions.

The heat shock transcription factors (HSTFs) are proteins that, under stress conditions, are rapidly activated and bind to the heat shock element (HSE) present in hsp promoters. These factors then induce the expression of hsp genes, whose products ensure the survival of the cell during stressful conditions by providing defense against general protein damage [191]. Recently, Gomez-García et al. [192] described the in silico identification of three hstf genes (Ehhstf1, Ehhstf2, Ehhstf3) in E. histolytica. The proteins encoded by these genes possess a conserved DNA-binding domain with 24% identity and 37% similarity to the DNA-binding domain of different HSTFs from Homo sapiens, Gallus gallus, Mus musculus and Arabidopsis thaliana [192]. The phylogenetic tree constructed from the alignment of the EhHSTFs with HSTFs from other organisms showed a closer relationship between EhHSTF2 and EhHSTF3, while EhHSTF1 exhibits high similarity with HSTF1 from A. thaliana and appears to derive from the same root as HSTF1 from human, mouse, chicken, frog and fruit fly [192].

Another transcription factor that has been identified in E. histolytica is a protein similar to the tumor suppressor protein p53 [193]. The Ehp53 protein shows 30%–54% and 50%–57% homology with important domains of p53 from human and Drosophila melanogaster, respectively. This homology included the tetramerization domain, the nuclear export signal and a nuclear localization signal. Ehp53 also contains seven of the eight DNA-binding residues and two of the four Zn2+-binding sites described for p53 [193]. Heterologous monoclonal antibodies against p53 (Ab-1 and Ab-2) recognized a single 53 kDa spot in two-dimensional gels and they inhibited the formation of DNA-protein complexes produced by the interaction of nuclear extracts of E. histolytica with an oligonucleotide containing the consensus sequence for the binding of human p53. A recombinant Ehp53 polypeptide was recognized by Ab-2 antibodies and this protein also was detected in E. moshkovskii and E. invadens [193].

A member of the high-mobility group box (HMGB) family was identified in E. histolytica (EhHMGB1) [194]. Its amino acid sequence has significant homology with HMGB proteins from a diverse range of species, such as P. falciparum (53% and 58% with PfHMGB1 and PfHMGB2, resp.), Schistosoma mansoni (40%), and H. sapiens (50%). Two residues have been predicted to be crucial in determining the structural DNA specificity [37]: a serine at position 10 and a hydrophobic residue at position 32, according to the residue numbering of D. melanogaster HMG-D. The corresponding conserved residues in EhHMGB1 are threonine at position 34 and phenylalanine at position 56 [194]. EhHMGB1 also has the acidic C-terminal tail seen in other eukaryotes. Thus, all these computational analyses support the idea that EhHMGB1 is a bona fide HMGB protein. Moreover, recombinant EhHMGB1 shares the capacity of human HMGB1 to augment the binding of certain transcription factors to DNA, and it is localized in the nucleus [194]. Overexpression of EhHMGB1 in trophozoites led to modulation of 33 transcripts involved in a variety of cellular functions. Of these, 20 were also modulated in the mouse model of intestinal amoebiasis [159, 194]. Four genes known to be involved in virulence were modulated by the overexpression of EhHMGB1, including those coding for two of the five Gal/GalNAc lectin light subunits, the cysteine proteinase EhCP-A7, and a potential enterotoxic peptide [194]. These results suggest a role for EhHMGB1 in parasite adaptation to, and destruction of, the host intestine.

The signal transducers and activators of transcription (STAT) factors are cytoplasmic proteins that after tyrosine-phosphorylation form homo- or heterodimers that are translocated to the nucleus, where they bind to DNA within a well defined consensus sequence called SIE [195]. E. histolytica contains transcription factors of the STAT family, where they could potentially function downstream of receptor kinases in processes related to pathogenesis [196]. The interaction of trophozoites with collagen type I and calcium induces the expression and activation of proteins homologous to STAT1 and STAT3 [197]. Collagen induces a time dependent increase in tyrosine phosphorylation of both STAT1K and STAT1L. These proteins become tyrosine-phosphorylated as early as 15 minutes of stimulation with collagen, reaching a maximal stimulation after 120 minutes of collagen treatment [197]. When the phosphorylation status of STAT3 was explored, again, both isoforms (K and L) increase their phosphorylation content after exposure to collagen and calcium [197]. Then, there is an association between phospho-STAT1 and phospho-STAT3; these heterodimers are targeted to the nuclei and bind to SIE [197].

The Myb domain is a sequence-specific DNA-binding domain that was originally identified in vertebrates. Myb is one of the largest transcription factor families in plants [198]. Myb proteins contain DNA-binding domains composed of one, two or three repeated motifs of approximately 50 amino acids surrounded by three conserved tryptophan residues [199]. A gene (Ehmyb) encoding a 145-amino-acid protein containing a Myb domain was identified in E. histolytica [200]. The EhMyb protein belongs to the SHAQKY family, as shown by the presence of a single Myb DNA-binding domain (80 to 130 aa) as well as the THAKQF motif [200]. Overexpression of the EhMyb protein resulted in a transcriptome that overlapped significantly with the expression profile of amoebic cysts [158, 200]. The analysis of several promoters of genes regulated by EhMyb identified a CCCCCC motif to which nuclear proteins bind in a sequence-specific manner [200]. Together, these results strongly suggest that EhMyb is involved in E. histolytica development.

4.5. Chromatin Structure

The chromatin of E. histolytica is packaged in nucleosome-like structures not unlike metazoan chromatin; however, linker regions between adjacent nucleosomes appear to be irregular in length as compared to the average 40 bp observed in metazoans [201]. The E. histolytica genome encodes all four core histone proteins [202–204]. The amino-terminal domains of the histones, although divergent from metazoan sequences, are highly basic, with several lysine residues that are potential targets for acetylation by histone acetyltransferases (HATs) [205]. Additionally, E. histolytica has members of the GNAT and MYST families of HATs with significant similarity to GCN5, MYST, TafII250, Hat1 and Elp3 [205]. The EhGCN5 protein has an acetyltransferase domain (GNAT motif) and conserved residues involved in the interaction with the cofactor CoA as well as with histone H3 [205]. The EhMYST protein contains two domains: at the amino terminus, an Agenet domain of unknown function, and toward the carboxy terminus, a MOZ/SAS domain that is common among HATs [205]. One histone deacetylase (HDAC), called EhHDAC, was identified in this parasite [205]. This protein is a member of the Class I family of HDACs, which are subunits of co-repressors that function in association with known repressors in response to different events [205]. Although this parasite contains genes for acetylation and deacetylation of histones, the mechanisms of acetylation and deacetylation in E. histolytica are not yet known. However, we expect that these proteins play a relevant role in transcriptional control.

E. histolytica expresses a cytosine-5 DNA methyltransferase (Ehmeth), and 5-methylcytosine (m5C) was found predominantly in repetitive elements. The region of the gene encoding for the E. histolytica heat shock protein 100 (EHsp100) was isolated by affinity chromatography with 5-methylcytosine antibodies as ligand [206]. The expression of EHsp100 was induced by heat shock, 5-azacytidine (5-AzaC), an inhibitor of DNA methyltransferase and Trichostatin A (TSA), an inhibitor of histone deacetylase [206]. These data suggest that EHsp100 expression can be regulated, in addition to the initiation transcription level, by an epigenetic mechanism.

Lavi et al. [207] identified a 32 kDa nuclear protein (EhMLBP) that binds to the methylated form of a DNA segment encoding the reverse transcriptase of an autonomous non-long-terminal repeat retrotransposon (RT LINE). Deletion mapping analysis localized the DNA-binding region to the C-terminal part of the protein. This region is sufficient to ensure high-affinity binding to methylated RT LINE [207]. An affinity-based technique using the C-terminal part of EhMLBP as ligand isolated DNA sequences containing a 29-nucleotide consensus motif that includes a stretch of ten adenines [208]. Gel retardation analysis showed that EhMLBP binds to the consensus motif with a preference for its methylated form [208]. These results suggest that EhMLBP may serve as a sensor of methylated DNA.

Transcriptional silencing of the gene coding for the amoebapore a (Ehap-a) occurred following the transfection of trophozoites with a plasmid containing the promoter region of Ehap-a as well as a truncated segment of a neighboring, upstream SINE1 element that is transcribed from the opposite strand [209]. Small amounts of short (approximately 140 nt) ssRNA molecules with homology to SINE1 were detected in the silenced amoeba but no siRNA. ChIP assays using an antibody against methylated K4 of histone H3 showed a demethylation of K4 at the domain of the Ehap-a gene, indicating transcriptional inactivation [209]. Transfections of E. histolytica trophozoites which already had a silenced Ehap-a, with a plasmid containing a second gene ligated to the upstream region of Ehap-a, enabled the silencing, in-trans, of other genes of choice [210]. The nonvirulent phenotype of the gene-silenced amoeba was demonstrated in various assays and the results suggest that they may have a potential use for vaccination [211].

4.6. Post-Transcriptional Regulation

In E. histolytica, the 5′ untranslated regions (5′ UTRs) of the mRNAs are typically short (from 5 to 20 nucleotides) [165]. Only a few mRNAs with an extended 5′ UTR have been reported: Ehmcm3 (126 bp), Ehpak (265 bp), EhTBP (420 bp), and Ehcp112 (280 bp) [174, 190, 212].

Computational analysis of the 3′ UTR in a large collection of EST and genomic sequences from E. histolytica revealed the presence of conserved elements, such as an AU-rich domain corresponding to the consensus UA(A/U)UU polyadenylation signal, that could be involved in pre-mRNA polyadenylation [213]. Interestingly, the molecular organization of the 3′-UTR cis-regulatory elements of the pre-mRNA appears to be roughly conserved across the evolutionary scale, whereas the polyadenylation signal seems to be species-specific in protozoan parasites, and a novel A-rich element is unique to the primitive eukaryote E. histolytica [213].

Measurement of the mRNA half-life of EhPgp5 in trophozoites of clone C2 grown at different concentrations of emetine showed that the stability of this mRNA increases at high concentrations of the drug [214]. In trophozoites grown in the absence of the drug, the experimental half-life of the EhPgp5 transcript was estimated at 2.1 hours, whereas in trophozoites grown in 90 μM (C290) and 225 μM (C2225) emetine the half-life of the transcript was 3.1 hours and 7.8 hours, respectively, confirming significant variations in the decay rates of the EhPgp5 mRNA among the three conditions [214]. In addition, the EhPgp5 mRNA contains a longer poly(A) tail in clone C2225 [214], and it is well known that long poly(A) tails confer higher stability on mRNA and promote more efficient translation. These results indicate that the higher levels of EhPGP5 protein in multidrug-resistant trophozoites (clone C2) could be influenced, in addition to transcriptional activation, by increased mRNA stability.
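The reported half-lives can be translated into decay rates with a short calculation. The sketch below assumes simple first-order (exponential) mRNA decay, which the source does not state explicitly; the function names and the 4-hour time point are illustrative choices, and only the three half-life values are taken from the text.

```python
import math

# Half-lives (hours) reported in the text for the EhPgp5 transcript [214].
HALF_LIVES_H = {
    "no emetine": 2.1,
    "90 uM emetine": 3.1,
    "225 uM emetine": 7.8,
}


def decay_constant(half_life_h):
    """Decay constant k (per hour), assuming first-order decay N(t) = N0 * exp(-k * t)."""
    return math.log(2) / half_life_h


def fraction_remaining(half_life_h, t_h):
    """Fraction of the initial transcript pool left after t_h hours (same assumption)."""
    return 0.5 ** (t_h / half_life_h)


if __name__ == "__main__":
    # The 4-hour time point is arbitrary and only used for illustration.
    for condition, t_half in HALF_LIVES_H.items():
        k = decay_constant(t_half)
        left = fraction_remaining(t_half, 4.0)
        print(f"{condition}: k = {k:.3f} per h, fraction left after 4 h = {left:.2f}")
```

Under this assumption, the change in half-life from 2.1 to 7.8 hours corresponds to a roughly 3.7-fold lower decay constant at 225 μM emetine than in the absence of the drug.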

5. Expression Regulation in Trichomonas vaginalis

Trichomonosis is a common but overlooked sexually transmitted human infection caused by Trichomonas vaginalis, a flagellated protist that resides in the urogenital tract of both sexes and can cause vaginitis in women and urethritis and prostatitis in men. The impact of this parasite in women is not limited to vaginitis; it is also a major factor in promoting transmission of HIV, in causing low birth weight and premature birth, and in predisposing women to atypical pelvic inflammatory disease, cervical cancer and infertility [215]. T. vaginalis causes an estimated 174 million sexually transmitted infections annually worldwide [216]. Among women, the prevalence of trichomonosis is estimated to range from 3% to 48%. However, T. vaginalis infection is seldom diagnosed in men, primarily because of insensitive diagnostic tests [217].

5.1. Genome

The T. vaginalis genome is estimated to be 160 Mb, and about two thirds of the genome consists of repeats and transposable elements [218]. A core set of approximately 60,000 protein-coding genes was predicted, which are organized into six chromosomes [218, 219]. The analysis of the age distributions of gene families with five or fewer members indicates that the genome underwent a period of increased duplication, and possibly one or more large-scale genome duplication events [218].

Although T. vaginalis has an unusually large repertoire of genes, only 65 of them appear to have an intron [218, 220]. The positions of these introns are often conserved in orthologous genes, indicating that they were present in a common ancestor of trichomonads, yeast, and metazoans. The introns identified in T. vaginalis are uniformly short and are characterized by a conserved 12-nt sequence (5′-ACTAACACACAG-3′) at the 3′ splice site that includes the branch point [220]. All five T. vaginalis spliceosomal snRNAs (U1, U2, U4, U5 and U6) were recently identified [221]. Approximately 250 rDNA units were localized to one of the six T. vaginalis chromosomes [218].

5.2. Transcriptome

In order to identify the genes that are up-regulated during the interaction with target cells, a subtraction cDNA library enriched for differentially expressed genes from parasites that were in contact with vaginal epithelial cells was obtained [222]. This strategy showed that genes encoding the adhesins AP65 and AP33, α-actinin, enolase, a putative PDI gene, a phosphoglucomutase, and a conserved GTP-binding protein (GTP-BP) were up-regulated in parasites that were in contact with target cells [222]. Genes involved in transcription and protein translation, in addition to six genes with unknown functions, were also up-regulated [222].

Phylogenetic analyses based on the rRNA and class II fumarase gene sequences have shown that Trichomonas species form a closely related clade, including isolates of T. gallinae, T. tenax, and T. vaginalis [223]. To identify uniquely expressed genes of T. vaginalis that may represent determinants of urogenital virulence and pathogenesis, genes differentially expressed in T. vaginalis with respect to T. tenax, usually regarded as a harmless commensal of the human oral cavity, were identified by: (i) the screening of three independent subtraction cDNA libraries enriched for T. vaginalis genes; and (ii) the screening of a T. vaginalis cDNA expression library with patient sera that were first pre-adsorbed with an extract of T. tenax antigens [224]. Notably, clones identified by both procedures were found to be up-regulated in T. vaginalis upon contact with vaginal epithelial cells [222], suggesting a role for these gene products in host colonization. Semi-quantitative RT-PCR analysis of selected clones showed that the genes were not unique to T. vaginalis and that they were also present in T. tenax, albeit at very low levels of expression [224]. Of the transcripts whose relative abundance was found to vary significantly, AP65, GAPDH, and hypothetical protein 2 are secreted or released during growth of T. vaginalis [225]. These results suggest that T. vaginalis and T. tenax have remarkable genetic identity and that T. vaginalis has higher levels of gene expression than T. tenax. The data may suggest that T. tenax could be a variant of T. vaginalis [224].

5.3. General Transcription

Most of the studies regarding gene expression in T. vaginalis have focused on the promoter region of protein-coding genes. The protein-coding genes in T. vaginalis are transcribed by an RNA polymerase II that is resistant to α-amanitin [226]. A metazoan-like TATA element appears to be absent from trichomonad promoters [227]. However, an initiator (Inr) sequence has been identified as the only known core promoter element in this organism. This element is architecturally and functionally equivalent to its metazoan counterpart [227, 228]. In addition, the Inr promoter element was found in 75% of the 5′ untranslated region (UTR) sequences of the protein-coding genes, supporting its central role in gene expression [218]. The finding of the Inr sequence in this early-branching eukaryote strongly indicates that this promoter element arose early in eukaryotic evolution, and it is likely that the trichomonad transcription machinery is highly optimized for Inr function.

5.4. Cis-Regulatory Elements and Transcription Factors

The Inr element is located within 20 nucleotides upstream of the start codon and may be as close as 6 nucleotides. This motif, with the consensus sequence TCA+1Py(T/A), surrounds the transcription start site of all genes studied [228]. An Inr-binding protein, a novel 39 kDa polypeptide (IBP39), was isolated from T. vaginalis by DNA affinity chromatography [229]. IBP39 shows no sequence similarity to any known protein and consists of two domains connected by a proteolytically sensitive linker: the N-terminal domain, which is responsible for Inr binding (IBD), and the C-terminal domain, which binds the C-terminal domain (CTD) of the RNA pol II large subunit [230, 231]. A search for sequences similar to the IBD revealed a family of at least 100 proteins in T. vaginalis containing an IBD motif with an architecture comparable to that of IBP39, suggesting that the IBD defines a lineage-specific DNA-binding domain that is utilized by specific transcription factors in this organism [196]. Sequence divergence in the recognition helix as well as in the N-terminal positively charged loop across the IBD family suggests that different versions of the domain have potentially specialized to contact a range of target sites other than the Inr [196].

In T. vaginalis, iron is an essential nutrient for growth and metabolism and a determinant in modulating the expression of multiple virulence phenotypes, such as cytoadherence, phenotypic variation and resistance to complement lysis [232–235]. However, the iron concentration in the human vagina changes constantly throughout the menstrual cycle. Thus, T. vaginalis may respond to a varying iron supply by means of differential gene expression mechanisms in order to survive, grow and colonize the hostile vaginal environment. Studies on the role of iron in the expression of the ap65-1 gene, which encodes a 65 kDa malic enzyme that is involved in cytoadherence, demonstrated that this element plays a crucial role in transcription. Transcription of ap65-1 is critically regulated by the coordination of two similar but oppositely oriented DNA regulatory regions, MRE-1/MRE-2r and MRE-2f, both of which are binding sites for multiple Myb-like proteins [236]. The Myb1 protein exhibited variations in nuclear concentration with changes in the iron supply [237]. Overexpression of Myb1 in T. vaginalis resulted in repression or activation of ap65-1 transcription in iron-depleted cells at an early and a late stage of cell growth, respectively, while iron-inducible ap65-1 transcription was constitutively repressed. The Myb1 protein was found to constantly occupy the chromosomal ap65-1 promoter at a proximal site, but it also selected two more distal sites only at the late growth stage [237]. Lou et al. [238] recently defined the minimal DNA-binding domain of Myb1, which consists of the sequence from Lys35 to Ser141 (tvMyb 35-141). Another Myb protein (Myb2) binds preferentially to MRE-2f rather than to MRE-2r [239]. The presence of iron caused the repression of the myb2 gene, and the temporal activation/deactivation of Myb2 promoter entry, which was also activated by prolonged iron depletion [239]. Overexpression of Myb2 in T. vaginalis during iron-depleted conditions facilitated basal and growth-related ap65-1 transcription at a level similar to that observed in iron-replete cells, whereas iron-inducible ap65-1 transcription was abolished by knockdown of Myb2 [239]. In addition, another iron-inducible nuclear protein (Myb3) that binds only to the MRE-1 element was recently identified [240]. Changes in the iron supply resulted in temporal and alternate entries of Myb2 and Myb3 into the ap65-1 promoter [240]. Overexpression of Myb3 activates basal and iron-inducible ap65-1 transcription and, in agreement, inhibition of Myb3 expression results in a decrease in ap65-1 transcription [240]. In trophozoites overexpressing Myb3, increased promoter entry of this protein was detected, with a concomitant decrease in Myb2 promoter entry under specific conditions, while Myb3 promoter entry was inhibited under all test conditions in cells overexpressing Myb2. In contrast, concomitant promoter entries by Myb2 and Myb3 diminished in cells overexpressing Myb1, except that Myb3 promoter entry was only slightly affected under prolonged iron depletion [240]. All these results suggest that Myb2 and Myb3 may coactivate basal and iron-inducible ap65-1 transcription against Myb1 through conditional and competitive promoter entries.

The existence of a large number of core histone genes in the T. vaginalis genome was used to identify common nucleotide elements that may be involved in their transcription. A search for overrepresented nucleotide sequence elements in the upstream sequences of T. vaginalis core histone genes revealed three overrepresented motifs, each characterized by a string of conserved nucleotides: Motif I (TCAYWAKTT), Motif II (TTTTGGCGSS), and Motif III (TGHCAWWWWRRRYY) [241]. The four T. vaginalis core histone gene families (H2A, H2B, H3 and H4) have comparable motif architecture regardless of whether or not they are organized as gene pairs [241]. The 9 bp Motif I lies 3–7 bp upstream from the ATG initiation codon, and its 5 bp sub-motif (TCAYW) is highly similar to the Inr (TCA+1YW) [241]. Motif II is predominantly located 20 to 40 bp upstream and is about 15 bp upstream from Motif I. Motif III is located 40 to 80 bp upstream. Notably, the orientation of Motif III relative to the direction of gene transcription in H2A/H2B is opposite to that in H3/H4, indicating that Motif III may function in a direction-independent fashion [241]. These motifs are apparently enriched in the promoter regions of several T. vaginalis genes, and the positions of Motif II and Motif III relative to the translation start codon are similar to those in the promoter regions of the core histone genes, suggesting that the identified motifs are biologically meaningful transcriptional regulatory elements [241].
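The three motifs above are written in the standard IUPAC degenerate nucleotide code (for example, Y = C or T, W = A or T, S = C or G). As a minimal sketch of how such consensus strings can be located in an upstream sequence, the following Python example expands each code into a regular expression. The function names and any sequence passed to them are hypothetical; this is not the search procedure used in [241], which looked for statistically overrepresented elements rather than exact consensus matches.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Consensus motifs as given in the text [241].
MOTIFS = {
    "Motif I": "TCAYWAKTT",
    "Motif II": "TTTTGGCGSS",
    "Motif III": "TGHCAWWWWRRRYY",
}


def iupac_to_regex(motif):
    """Translate a degenerate IUPAC motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead group lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def scan_upstream(sequence):
    """Map each motif name to the positions where it occurs in the sequence."""
    return {name: find_motif(motif, sequence) for name, motif in MOTIFS.items()}
```

Only the given strand is scanned here; because Motif III occurs in both orientations, a fuller search would also scan the reverse complement.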

5.5. Chromatin Structure

The T. vaginalis genome contains a large number of histone genes, most of which are organized as gene pairs in a head-to-head manner [241, 242]. It has a total of 74 functional core histone genes, including 11 H2A/H2B gene pairs, 6 solitary H2A, 3 solitary H2B, 19 H3/H4 gene pairs, 2 solitary H3 and 3 solitary H4 [241]. Comparison of the amino acid sequences of the T. vaginalis H3 and H4 histones with sequences from other organisms revealed a significant divergence not only from the sequences in multicellular organisms but also from the sequences in other protists [242].
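The total of 74 follows from counting each gene pair as two genes. The short tally below makes this arithmetic explicit; the counts are those given in the text, while the variable and function names are illustrative.

```python
# Counts as stated in the text [241]; each head-to-head pair contributes two genes.
PAIRS = {"H2A/H2B": 11, "H3/H4": 19}
SOLITARY = {"H2A": 6, "H2B": 3, "H3": 2, "H4": 3}


def total_core_histone_genes(pairs, solitary):
    """Total gene count: two genes per pair plus all solitary genes."""
    return 2 * sum(pairs.values()) + sum(solitary.values())


def genes_per_family(pairs, solitary):
    """Number of genes in each histone family, counting paired and solitary copies."""
    counts = dict(solitary)
    for pair_name, n_pairs in pairs.items():
        for family in pair_name.split("/"):
            counts[family] = counts.get(family, 0) + n_pairs
    return counts


if __name__ == "__main__":
    print(total_core_histone_genes(PAIRS, SOLITARY))
    print(genes_per_family(PAIRS, SOLITARY))
```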

The T. vaginalis genome also shows an expansion of genes encoding both HATs and HDACs [196], but their role in gene expression in this parasite remains unknown.

5.6. Post-Transcriptional Regulation

The iron-responsive promoter elements that control the transcription initiation of ap65-1 have not yet been found in other genes coding for iron-regulated proteins, suggesting that iron-mediated regulatory mechanisms at the post-transcriptional level may exist in T. vaginalis.

In higher eukaryotes, the IRE/IRP system is a post-transcriptional mechanism of iron regulation based on the binding of cytoplasmic iron regulatory proteins (IRPs) to iron-responsive elements (IREs) situated in the untranslated regions of the mRNAs of some iron-regulated proteins. In low-iron conditions, IRPs bind to IREs, blocking the translation of mRNAs that contain IREs at their 5′ ends or increasing the stability of mRNAs that have IREs at their 3′ ends [243]. TvCP4 is a cysteine protease of T. vaginalis whose amount was found to be 3-fold higher in iron-rich than in iron-depleted parasites. However, the Tvcp4 transcript was expressed at a similar level in both conditions, suggesting that the iron-regulated expression of TvCP4 is carried out at a post-transcriptional stage [244]. A search for IRE structures in the Tvcp4 mRNA revealed that the first 23 nt downstream of the start codon form a stable IRE-like stem-loop structure. Recombinant human IRP-1 binds specifically to the IRE-like structure of the Tvcp4 mRNA, and cytoplasmic extracts from T. vaginalis form RNA-protein complexes with the IRE sequence of the human ferritin mRNA [244], suggesting that this parasite possesses an IRE/IRP system that controls the expression of some iron-regulated proteins.

6. Concluding Remarks

Development of an organism depends upon finely tuned and accurate control of gene expression. Recent studies have identified various biological processes involved in the regulation of gene expression in protozoan parasites. These processes require several components, such as cis-regulatory elements, transcription factors, transcription cofactors, chromatin modification proteins, and proteins involved in post-transcriptional regulation. The list of these components will continue to grow as future studies identify additional examples of direct communication between regulatory proteins and reveal how gene networks are coordinately regulated through these interactions. Further analyses of these regulatory mechanisms in protozoan parasites should continue to provide broadly applicable information about their conserved functions in vivo, and new insights into the specific biological processes in which they are involved. In addition, assembled genome sequences are now available for different parasites, as well as for some of the hosts and vectors that are part of their life cycles. These assemblies provide a powerful tool for the comparative analysis of gene regulation networks. As ever, new knowledge raises new questions, and these questions await their orderly resolution.


This work was supported by CONACyT (Mexico), ICyTDF (Mexico), and SIP-IPN 20090315 (Mexico).