Journal of Nucleic Acids
Volume 2010 (2010), Article ID 564946, 6 pages
http://dx.doi.org/10.4061/2010/564946
Review Article

A Toolbox for Predicting G-Quadruplex Formation and Stability

1Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, UK
2Max-Planck-Institutes Tübingen, Spemannstraße 38, 72076 Tübingen, Germany
3Thaze Ltd, 121 Mowbray Road, Cambridge CB1 7SP, UK

Received 1 February 2010; Accepted 24 March 2010

Academic Editor: Jean Louis Mergny

Copyright © 2010 Han Min Wong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

G-quadruplexes are four-stranded nucleic acid structures formed around a core of guanines, arranged in squares with mutual hydrogen bonding. Many of these structures are highly thermally stable, especially in the presence of monovalent cations, such as those found under physiological conditions. Understanding of their physiological roles is expanding rapidly, and they have been implicated in regulating gene transcription and translation among other functions. We have built a community-focused website to act as a repository for the information that is now being developed. At its core, this site has a detailed database (QuadDB) of predicted G-quadruplexes in the human and other genomes, together with the predictive algorithm used to identify them. We also provide a QuadPredict server, which predicts thermal stability and acts as a repository for experimental data from all researchers. There are also a number of other data sources with computational predictions. We anticipate that the wide availability of this information will be of use both to researchers already active in this exciting field and to those who wish to investigate a particular gene hypothesis.

1. Introduction

It was observed in 1910 [1] that a sufficiently high concentration of guanosine could form a gel, unlike the other nucleobases, and in 1962 [2] it was discovered that four guanosines can self-assemble to form a hydrogen-bonded square, held together by Hoogsteen hydrogen bonds between neighbouring bases. This structure is known as a G-tetrad or G-quartet. As with any nucleobase, there is also a strong propensity for these structures to stack on each other via π-π interactions, forming four-stranded helices called G-quadruplexes, with the phosphate backbone perpendicular to the plane of the G-quartets. The four strands may come from separate molecules, or from only two or one, with loops joining them together [3–7].

They form with great thermal stability [8] and have been found experimentally to form from genomic sequences in critical regions such as telomeres, gene promoters and UTRs [9, 10], with physiological effects in each of these regions. In telomeres, their formation reduces the activity of telomerase, the upregulation of which has been associated with 85% of cancers; this has led to much pharmaceutical interest [11]. G-quadruplexes in gene promoters, such as those of the oncogenes c-myc and c-kit [12, 13], have been shown to control transcriptional activity in vitro, although interestingly their formation can either increase or decrease activity in different systems. It has been shown that G-quadruplex formation in the 5′ UTR can decrease translational activity [14], and there have been suggestions of other physiological effects. A wide variety of proteins have been found to interact specifically with them [15], and they have been shown experimentally to form in vivo [16–18].

G-quadruplexes have also been employed as biosensors (e.g., for thrombin [19]) and in other nanotechnological applications (e.g., [20, 21]). Some of these uses are reviewed in [5].

Alongside this experimental work, computational techniques have been developed to predict which sequences will form G-quadruplexes [22–24]. A variety of algorithmic rules can be used for this purpose [25, 26], although some are more widely used and accepted than others. There is not sufficient evidence for any of them to be held as absolutely true, and only recently has any work been done to predict the relative stabilities of possible G-quadruplex structures, rather than just whether they could form or not.

Despite this limitation, computational methods have led to a number of discoveries, including the observation that G-quadruplexes are relatively rare in the human genome, but more prevalent than expected in gene promoters [27]. Some of the computational discoveries have been recently reviewed [25, 26].

The field as a whole has grown very significantly in recent times, with a roughly exponential rise in publications (see Figure 1), including over 350 in 2009. A dedicated book has been produced, [28] together with special issues of some journals focused on this topic, and some databases on particular aspects of G-quadruplexes. A few G-quadruplex based drugs have also entered clinical trials. A series of International Conferences has been initiated, the first two hosted in Louisville, KY [29, 30]. At the first of these, it was suggested that a central and coherent website to store and provide data related to G-quadruplexes should be produced, and we volunteered to provide such a repository [29], hosted at the URL http://www.quadruplex.org/.

Figure 1: Annual publication rate in the G-quadruplex field in the past decade. Data obtained from the Web of Knowledge, searching for the term “G-quadruplex” or “G-tetraplex.” The solid line is an exponential fit to the data.

Here, we describe the features available at that website, in particular the core databases of predicted G-quadruplexes and a new tool to estimate the thermal stability of these structures computationally. We also describe the other online sources of predictive data for G-quadruplexes, so that researchers may choose the most appropriate tool for their work.

2. QuadDB—A Database of G-Quadruplex Predictions

The core quadruplex database (QuadDB, http://www.quadruplex.org/?view=quadbase) provides both static and searchable data for researchers on computationally predicted G-quadruplexes (Putative Quadruplex Sequences, PQS). These have been generated as previously described [22], using our favoured predictive algorithm, which identifies sequences on either strand of the form (G3+N1–7G3+N1–7G3+N1–7G3+), that is, four runs of three or more guanines separated by three loops of one to seven nucleotides each. This has been shown experimentally to be a good predictor of in vitro G-quadruplex formation [31]. It aims to identify specific G-quadruplexes that may form, providing a hypothesis that can be tested in vitro using simple biophysical methods.
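To make the search rule concrete, the following is a minimal sketch of a motif search on both strands. It assumes the commonly used pattern of four runs of at least three guanines separated by loops of one to seven nucleotides, as described in [22]; it is not the quadparser source code, and the names `find_pqs`, `G_MOTIF` and `C_MOTIF` are introduced here purely for illustration. Regular-expression matching is greedy and non-overlapping, so the handling of overlapping candidates may differ from quadparser.

```python
import re
from typing import List, Tuple

# Assumed motif: four G-tracts (>= 3 G) separated by three loops of 1-7 nt.
G_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
# A G-rich motif on the reverse strand appears as C-tracts on the forward strand.
C_MOTIF = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pqs(seq: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, strand, G-rich sequence) for each candidate.

    Coordinates are 0-based, half-open, and always refer to the forward strand.
    """
    seq = seq.upper()
    hits = []
    for m in G_MOTIF.finditer(seq):
        hits.append((m.start(), m.end(), "+", m.group()))
    for m in C_MOTIF.finditer(seq):
        # Report the G-rich sequence as read 5' to 3' on the reverse strand.
        hits.append((m.start(), m.end(), "-", reverse_complement(m.group())))
    return sorted(hits)
```

Calling `find_pqs` on a sequence containing four human telomeric repeats (GGGTTA) would report one candidate on the "+" strand; the same repeats written as CCCTAA would be reported on the "-" strand.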

2.1. Quadparser

For any researcher interested in identifying PQS in specific sequences, we provide the quadparser program pre-compiled for MS Windows and Mac OS X with detailed instructions. The program is customisable, so that different patterns can be searched for. Different loop length constraints, G-tract lengths and so forth may all be set, so that the algorithm can be adjusted to fit with the particular context desired. Quadparser has a variety of output styles for different uses, and reads sequence data in FASTA format.
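The kind of customisation described above can be sketched as follows. This is an illustrative example only: the parameter names (`min_g`, `min_loop`, `max_loop`, `tracts`) and the functions `build_pattern`, `read_fasta` and `scan_fasta` are hypothetical and do not correspond to the actual quadparser options or output styles. Only the forward strand is searched, to keep the sketch short.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def build_pattern(min_g: int = 3, min_loop: int = 1,
                  max_loop: int = 7, tracts: int = 4) -> re.Pattern:
    """Build a G-tract/loop regular expression from adjustable constraints."""
    if tracts < 2 or min_g < 1 or not 0 <= min_loop <= max_loop:
        raise ValueError("inconsistent motif parameters")
    return re.compile(
        f"G{{{min_g},}}(?:[ACGT]{{{min_loop},{max_loop}}}G{{{min_g},}}){{{tracts - 1}}}"
    )


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, upper-case sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def scan_fasta(lines: Iterable[str],
               pattern: re.Pattern) -> List[Tuple[str, int, int, str]]:
    """Return (identifier, start, end, match) for every forward-strand hit."""
    hits = []
    for name, seq in read_fasta(lines):
        for m in pattern.finditer(seq):
            hits.append((name, m.start(), m.end(), m.group()))
    return hits
```

For example, `scan_fasta(open("my_sequences.fa"), build_pattern(min_g=2, max_loop=12))` would search a (hypothetical) FASTA file with shorter G-tracts and longer loops than the default.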

2.2. Data Search

The Data search section allows a researcher to identify any PQS in gene promoters (defined as the 1 kb upstream of the transcription start site, TSS) or UTRs for their gene of interest. The genes may be identified by Ensembl ID, HGNC code or description. The output provides full details of the gene, including genomic parameters, and the location and sequence of PQS in the appropriate regions of every transcript of the gene. Links are also provided to Ensembl so the PQS may be seen in context. Figure 2 displays the output when searching the human genome for PQS in the promoter or UTRs of c-kit (HGNC nomenclature KIT). Currently, searches may be performed against the human, chimpanzee and mouse genomes.
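The promoter definition used here depends on the strand of the gene, which is easy to get wrong. The sketch below shows one way to compute the 1 kb window upstream of a TSS and to test whether a PQS lies inside it. It assumes 1-based inclusive coordinates and requires the PQS to lie entirely within the window; both are assumptions made for this example, and the function names are hypothetical rather than part of the website.

```python
from typing import Tuple


def promoter_window(tss: int, strand: str, length: int = 1000) -> Tuple[int, int]:
    """Return the (start, end) of the region `length` bp upstream of the TSS.

    Coordinates are 1-based and inclusive. For a gene on the "+" strand,
    upstream means lower coordinates; for the "-" strand, higher coordinates.
    """
    if strand == "+":
        return max(1, tss - length), tss - 1
    if strand == "-":
        return tss + 1, tss + length
    raise ValueError("strand must be '+' or '-'")


def in_promoter(pqs_start: int, pqs_end: int, tss: int, strand: str,
                length: int = 1000) -> bool:
    """True if the PQS lies entirely within the upstream window."""
    start, end = promoter_window(tss, strand, length)
    return start <= pqs_start and pqs_end <= end
```

A looser criterion, such as any overlap with the window, would be an equally defensible choice and would change only the final comparison.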

Figure 2: Sample output from the data search component of Quadbase. The search query is shown at the bottom of the figure.

2.3. Data Download

As a convenient alternative to gene-by-gene searches or using the quadparser program, we also provide a downloadable listing of every PQS identified in various genomes. We currently offer these data for human (builds 34, 35 and 36 for backward compatibility), chimpanzee (2.1), mouse (37), rat (3.4), dog (2), chicken (2), zebrafish (7), fruitfly (5.4), roundworm (180) and yeast (1.01) genomes. In each case the data provide genomic coordinates for each PQS, together with the strand, sequence and a unique identifier. Data may be downloaded for the whole genome or by chromosome.

3. Quadpredict—Predicting G-Quadruplex Stability

The thermal stability of G-quadruplexes varies with the concentration of monovalent cations, specifically Na+ and K+. However, even at fixed concentrations, the exact details of the sequence, and hence of the structure formed, make a very large difference: G-quadruplexes range from those that are too unstable to form even at low temperatures to those that resist very high temperatures [31]. It is therefore necessary to predict not just which sequences can form G-quadruplexes at all, but also how stable the resulting structures will be. Thermal melting experiments are relatively easy to perform, and have led to a series of studies of different aspects of the relationship between sequence and stability [31–34]. However, such measurements do not enable prediction for unmeasured sequences, forcing researchers to make informed guesses as to the stability of novel sequences.

We recently developed [35] a Bayesian learning algorithm that is capable of making accurate predictions of thermal stability for new sequences, having been trained on a collection of measured sequences. Full details of the methodology and the parameters considered are available elsewhere [35]. We provide an interface to this system at http://www.quadruplex.org/?view=quadpredict, enabling researchers to make easy predictions of melting temperatures under various conditions for any desired sequence. Figure 3 gives an example of such predictions.

Figure 3: Sample predictions from QuadPredict, using Bayesian inference to calculate predicted melting temperatures together with predicted uncertainties.

One feature of the Bayesian inference we use is that in addition to predictions of the melting temperature, we also provide uncertainties in the values for each sequence. In general, the uncertainty increases for sequences that are highly unlike those in our training set. This therefore enables researchers to decide rationally how much faith to place in a particular prediction.
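The way a Bayesian predictor reports larger uncertainty away from its training data can be illustrated with a deliberately simple stand-in. The sketch below uses Bayesian linear regression on a single numeric feature with a Gaussian prior; it is not the model described in [35], and the prior precision `alpha`, noise precision `beta` and the single feature are assumptions made for this example.

```python
import math
from typing import Sequence, Tuple

Mean = Tuple[float, float]
Cov = Tuple[float, float, float]  # (s00, s01, s11) of a symmetric 2x2 matrix


def fit_posterior(xs: Sequence[float], ys: Sequence[float],
                  alpha: float = 1e-3, beta: float = 1.0) -> Tuple[Mean, Cov]:
    """Posterior over weights (w0, w1) for y = w0 + w1*x + Gaussian noise.

    Prior: w ~ N(0, I/alpha). Noise precision: beta.
    Posterior precision A = alpha*I + beta * Phi^T Phi, with phi(x) = (1, x).
    """
    n = len(xs)
    a00 = alpha + beta * n
    a01 = beta * sum(xs)
    a11 = alpha + beta * sum(x * x for x in xs)
    det = a00 * a11 - a01 * a01
    # Posterior covariance S is the inverse of the 2x2 matrix A.
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det
    b0 = beta * sum(ys)
    b1 = beta * sum(x * y for x, y in zip(xs, ys))
    mean = (s00 * b0 + s01 * b1, s01 * b0 + s11 * b1)
    return mean, (s00, s01, s11)


def predict(x: float, mean: Mean, cov: Cov,
            beta: float = 1.0) -> Tuple[float, float]:
    """Return the predictive mean and standard deviation at feature value x."""
    m0, m1 = mean
    s00, s01, s11 = cov
    mu = m0 + m1 * x
    # Predictive variance = noise variance + phi(x)^T S phi(x).
    var = 1.0 / beta + s00 + 2.0 * s01 * x + s11 * x * x
    return mu, math.sqrt(var)
```

If such a model were fitted to feature values between 3 and 6, a prediction at 4.5 would carry a smaller standard deviation than a prediction at 12, which is the qualitative behaviour described above: the further a query lies from the training data, the less the prediction should be trusted.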

We intend to develop the training data further, and have already employed a rational active learning protocol to collect more data and reduce the uncertainties below those originally presented. We will continue to do this, and also provide an opportunity for researchers to contribute their own data, so that the Bayesian inference can become increasingly accurate. We hope that depositing data publicly may become a standard requirement for publication of G-quadruplex thermal data.
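One common active learning heuristic is uncertainty sampling: measure next whichever candidate the current model is least sure about. The sketch below shows that heuristic only; the text does not state which selection criterion was actually used, and the function name and input layout are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def select_for_measurement(predictions: Dict[str, Tuple[float, float]],
                           batch_size: int = 1) -> List[str]:
    """Pick the candidate sequences with the largest predictive uncertainty.

    `predictions` maps each candidate sequence to a
    (predicted melting temperature, predicted standard deviation) pair.
    """
    ranked = sorted(predictions.items(), key=lambda item: item[1][1], reverse=True)
    return [sequence for sequence, _ in ranked[:batch_size]]
```

After the selected sequences have been measured and added to the training set, the model would be refitted and the selection repeated.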

We allow researchers to discover whether particular sequences they are interested in are already in our database of measurements, with information about exactly how such an experiment was performed. We hope that these facilities will prove useful to all those working in this field. As well as those interested in biological aspects of G-quadruplexes, we feel this facility may be particularly helpful for those working in nanotechnology or materials science, providing them with a method of rationally selecting G-quadruplex-forming oligonucleotides.

4. Other G-Quadruplex Computational Tools

There are a number of other tools that may be used to predict the existence of G-quadruplexes in DNA, and links to these are provided from http://www.quadruplex.org/. Bagga and coworkers provide QGRS mapper [36], which uses an algorithm similar to quadparser. It has different default parameters, in particular looking at sequences with fewer consecutive guanines and longer loops, but essentially looks for much the same sequences. Interestingly, it includes a scoring parameter for the different possible G-quadruplexes that can be formed. Although this is loosely based on empirical evidence, it is not clear how the “G-score” produced, which ranges up to a maximum of 105, relates to stability. To the best of our knowledge, no empirical tests of the validity of the G-score have been performed, even as a ranking list, but it is still a useful formulation of established rules of thumb. In addition to QGRS mapper, which can also search by gene, they provide the specialised databases GRSDB2 and GRS_UTRdb [37] for searching pre-mRNAs and UTR sequences.

Maiti and coworkers offer a site called Quadfinder [38], which implements essentially the same algorithm as quadparser. (At the time of writing it does not appear to be functioning.) At the same institute, Chowdhury and coworkers have a site called QuadBase [39], again using essentially the same algorithm. They focus on cross-species analysis, offering an ortholog analysis for finding conserved G-quadruplexes across either prokaryotes (ProQuad; see also [40]) or eukaryotes (EuQuad). It should be noted that conservation is assessed by presence only, and no sequence comparison is performed.
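The distinction between conservation by presence and conservation by sequence can be made explicit with a small sketch. Here an ortholog group counts as conserved if every member gene has at least one predicted G-quadruplex, with no comparison of the motifs themselves. The data layout and the function name are assumptions made for this example and do not describe the QuadBase implementation.

```python
from typing import Dict, List, Set


def conserved_by_presence(ortholog_groups: List[Dict[str, str]],
                          genes_with_pqs: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Return the ortholog groups in which every gene has at least one PQS.

    `ortholog_groups` is a list of {species: gene identifier} mappings.
    `genes_with_pqs` maps each species to the set of its genes with a PQS.
    No sequence comparison between the PQS is performed.
    """
    conserved = []
    for group in ortholog_groups:
        if group and all(gene in genes_with_pqs.get(species, set())
                         for species, gene in group.items()):
            conserved.append(group)
    return conserved
```

A sequence-aware variant would additionally align or compare the motifs found in each ortholog, which is a stricter test.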

Lastly in this category is the Greglist database of potential G-quadruplex regulated genes, which lists all human genes that have a G-quadruplex in the 1 kb region upstream of the transcription start site. The quadparser algorithm is used to predict these sequences [41].

A completely different approach to G-quadruplex prediction is taken by the Maizels lab [42, 43]. Other methods aim to predict specific G-quadruplex sequences, largely driven by the desire of structural biologists to have structures to study and of medicinal chemists to have a defined form to target. In contrast, the G4 calculator from Eddy and Maizels accepts that many of these structures are highly polymorphic in vivo. As a result, it does not aim to predict individual structures but looks at the density of sequences likely to lead to G-quadruplex structures. Given that this is an entirely orthogonal approach, it is striking that in many cases, particularly when examining the gene functions that are likely to be regulated by G-quadruplexes, very similar conclusions arise from this approach as from the quadparser model. We strongly recommend that for any large-scale genomic studies, both approaches are used to corroborate the results found.
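A density-based measure of this kind can be sketched with a sliding window. The example below reports the fraction of windows that contain at least a given number of G-runs, without trying to delimit any individual quadruplex. It is only loosely inspired by the approach of [42, 43]: the window size, step size, thresholds and function names are illustrative assumptions, not the parameters of the G4 calculator, and only one strand is examined.

```python
import re


def count_g_runs(window: str, min_run: int = 3) -> int:
    """Count non-overlapping runs of at least `min_run` consecutive guanines."""
    return len(re.findall("G{%d,}" % min_run, window))


def g4_density(seq: str, window: int = 100, step: int = 20,
               min_run: int = 3, min_runs: int = 4) -> float:
    """Fraction of sliding windows containing at least `min_runs` G-runs."""
    seq = seq.upper()
    if len(seq) < window:
        starts = [0]  # treat a short sequence as a single window
    else:
        starts = list(range(0, len(seq) - window + 1, step))
    hits = sum(
        1 for start in starts
        if count_g_runs(seq[start:start + window], min_run) >= min_runs
    )
    return hits / len(starts)
```

Because the score is a property of a region rather than of a single motif, it can be compared between genes or functional categories without committing to one folded structure.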

5. Conclusions

Computational methods have been of great use in understanding the role that G-quadruplexes may play in biology, unveiling their function in gene promoters [27, 42] and in regulating translation [44]. They have also revealed that stable G-quadruplexes are generally located in nucleosome-free regions [45]. Stability predictions have been used to develop experimental methods to directly visualise G-quadruplexes using AFM [46]. We anticipate that greater availability of ever more reliable tools will both improve the quality of informatic research in this area and make it increasingly easy for experimentalists to access computational results.

Acknowledgment

The fourth author is a Research Councils UK Academic Fellow. This work was supported by the Isaac Newton Trust.

References

  1. I. Bang, “Untersuchungen über die Guanylsäure,” Biochemische Zeitschrift, vol. 26, pp. 293–311, 1910.
  2. M. Gellert, M. N. Lipsett, and D. R. Davies, “Helix formation by guanylic acid,” Proceedings of the National Academy of Sciences of the United States of America, vol. 48, pp. 2013–2018, 1962.
  3. T. Simonsson, “G-quadruplex DNA structures – variations on a theme,” Biological Chemistry, vol. 382, no. 4, pp. 621–628, 2001.
  4. D. Sen and W. Gilbert, “Guanine quartet structures,” Methods in Enzymology, vol. 211, pp. 191–199, 1992.
  5. J. T. Davis, “G-quartets 40 years later: from 5′-GMP to molecular biology and supramolecular chemistry,” Angewandte Chemie International Edition, vol. 43, no. 6, pp. 668–698, 2004.
  6. J. L. Huppert, “Four-stranded nucleic acids: structure, function and targeting of G-quadruplexes,” Chemical Society Reviews, vol. 37, no. 7, pp. 1375–1384, 2008.
  7. S. Burge, G. N. Parkinson, P. Hazel, A. K. Todd, and S. Neidle, “Quadruplex DNA: sequence, topology and structure,” Nucleic Acids Research, vol. 34, no. 19, pp. 5402–5415, 2006.
  8. J.-L. Mergny, A.-T. Phan, and L. Lacroix, “Following G-quartet formation by UV-spectroscopy,” FEBS Letters, vol. 435, no. 1, pp. 74–78, 1998.
  9. D. J. Patel, A. T. Phan, and V. Kuryavyi, “Human telomere, oncogenic promoter and 5′-UTR G-quadruplexes: diverse higher order DNA and RNA targets for cancer therapeutics,” Nucleic Acids Research, vol. 35, no. 22, pp. 7429–7455, 2007.
  10. J. L. Huppert, “Four-stranded DNA: cancer, gene regulation and drug development,” Philosophical Transactions. Series A, vol. 365, no. 1861, pp. 2969–2984, 2007.
  11. L. Oganesian and T. M. Bryan, “Physiological relevance of telomeric G-quadruplex formation: a potential drug target,” BioEssays, vol. 29, no. 2, pp. 155–165, 2007.
  12. S. Rankin, A. P. Reszka, J. Huppert, et al., “Putative DNA quadruplex formation within the human c-kit oncogene,” Journal of the American Chemical Society, vol. 127, no. 30, pp. 10584–10589, 2005.
  13. H. Fernando, A. P. Reszka, J. Huppert, et al., “A conserved quadruplex motif located in a transcription activation site of the human c-kit oncogene,” Biochemistry, vol. 45, no. 25, pp. 7854–7860, 2006.
  14. S. Kumari, A. Bugaut, J. L. Huppert, and S. Balasubramanian, “An RNA G-quadruplex in the 5′ UTR of the NRAS proto-oncogene modulates translation,” Nature Chemical Biology, vol. 3, no. 4, pp. 218–221, 2007.
  15. M. Fry, “Tetraplex DNA and its interacting proteins,” Frontiers in Bioscience, vol. 12, pp. 4336–4351, 2007.
  16. C. Schaffitzel, I. Berger, J. Postberg, J. Hanes, H. J. Lipps, and A. Plückthun, “In vitro generated antibodies specific for telomeric guanine-quadruplex DNA react with Stylonychia lemnae macronuclei,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 15, pp. 8572–8577, 2001.
  17. K. Paeschke, T. Simonsson, J. Postberg, D. Rhodes, and H. J. Lipps, “Telomere end-binding proteins control the formation of G-quadruplex DNA structures in vivo,” Nature Structural and Molecular Biology, vol. 12, no. 10, pp. 847–854, 2005.
  18. M. L. Duquette, P. Handa, J. A. Vincent, A. F. Taylor, and N. Maizels, “Intracellular transcription of G-rich DNAs induces formation of G-loops, novel structures containing G4 DNA,” Genes and Development, vol. 18, no. 13, pp. 1618–1629, 2004.
  19. K. Y. Wang, S. McCurdy, R. G. Shea, S. Swaminathan, and P. H. Bolton, “A DNA aptamer which binds to and inhibits thrombin exhibits a new structural motif for DNA,” Biochemistry, vol. 32, no. 8, pp. 1899–1904, 1993.
  20. T. C. Marsh, J. Vesenka, and E. Henderson, “A new DNA nanostructure, the G-wire, imaged by scanning probe microscopy,” Nucleic Acids Research, vol. 23, no. 4, pp. 696–700, 1995.
  21. M. Inoue, D. Miyoshi, and N. Sugimoto, “Development of molecular logic gates using the structural switch of telomere DNAs,” Nucleic Acids Symposium Series, no. 50, pp. 315–316, 2006.
  22. J. L. Huppert and S. Balasubramanian, “Prevalence of quadruplexes in the human genome,” Nucleic Acids Research, vol. 33, no. 9, pp. 2908–2916, 2005.
  23. A. K. Todd, M. Johnston, and S. Neidle, “Highly prevalent putative quadruplex sequence motifs in human DNA,” Nucleic Acids Research, vol. 33, no. 9, pp. 2901–2907, 2005.
  24. L. D'Antonio and P. Bagga, “Computational methods for predicting intramolecular G-quadruplexes in nucleotide sequences,” in Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB '04), pp. 590–591, 2004.
  25. J. L. Huppert, “Hunting G-quadruplexes,” Biochimie, vol. 90, no. 8, pp. 1140–1148, 2008.
  26. A. K. Todd, “Bioinformatics approaches to quadruplex sequence location,” Methods, vol. 43, no. 4, pp. 246–251, 2007.
  27. J. L. Huppert and S. Balasubramanian, “G-quadruplexes in promoters throughout the human genome,” Nucleic Acids Research, vol. 35, no. 2, pp. 406–413, 2007.
  28. S. Neidle and S. Balasubramanian, Quadruplex Nucleic Acids, RSC Publishing, Cambridge, UK, 2006.
  29. P. Bates, J.-L. Mergny, and D. Yang, “Quartets in G-major. The first international meeting on quadruplex DNA,” EMBO Reports, vol. 8, no. 11, pp. 1003–1010, 2007.
  30. W. D. Wilson and H. Sugiyama, “First international meeting on quadruplex DNA,” ACS Chemical Biology, vol. 2, no. 9, pp. 589–594, 2007.
  31. P. Hazel, J. Huppert, S. Balasubramanian, and S. Neidle, “Loop-length-dependent folding of G-quadruplexes,” Journal of the American Chemical Society, vol. 126, no. 50, pp. 16405–16415, 2004.
  32. A. Risitano and K. R. Fox, “Influence of loop size on the stability of intramolecular DNA quadruplexes,” Nucleic Acids Research, vol. 32, no. 8, pp. 2598–2606, 2004.
  33. P. A. Rachwal, T. Brown, and K. R. Fox, “Sequence effects of single base loops in intramolecular quadruplex DNA,” FEBS Letters, vol. 581, no. 8, pp. 1657–1660, 2007.
  34. J. Gros, F. Rosu, S. Amrane, et al., “Guanines are a quartet's best friend: impact of base substitutions on the kinetics and stability of tetramolecular quadruplexes,” Nucleic Acids Research, vol. 35, no. 9, pp. 3064–3075, 2007.
  35. O. Stegle, L. Payet, J.-L. Mergny, D. J. C. MacKay, and J. L. Huppert, “Predicting and understanding the stability of G-quadruplexes,” Bioinformatics, vol. 25, no. 12, pp. i374–i382, 2009.
  36. O. Kikin, L. D'Antonio, and P. S. Bagga, “QGRS mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences,” Nucleic Acids Research, vol. 34, pp. W676–W682, 2006.
  37. O. Kikin, Z. Zappala, L. D'Antonio, and P. S. Bagga, “GRSDB2 and GRS_UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs,” Nucleic Acids Research, vol. 36, pp. D141–D148, 2008.
  38. V. Scaria, M. Hariharan, A. Arora, and S. Maiti, “Quadfinder: server for identification and analysis of quadruplex-forming motifs in nucleotide sequences,” Nucleic Acids Research, vol. 34, pp. W683–W685, 2006.
  39. V. K. Yadav, J. K. Abraham, P. Mani, R. Kulshrestha, and S. Chowdhury, “QuadBase: genome-wide database of G4 DNA – occurrence and conservation in human, chimpanzee, mouse and rat promoters and 146 microbes,” Nucleic Acids Research, vol. 36, pp. D381–D385, 2008.
  40. P. Rawal, V. B. R. Kummarasetti, J. Ravindran, et al., “Genome-wide prediction of G4 DNA as regulatory motifs: role in Escherichia coli global regulation,” Genome Research, vol. 16, no. 5, pp. 644–655, 2006.
  41. R. Zhang, Y. Lin, and C.-T. Zhang, “Greglist: a database listing potential G-quadruplex regulated genes,” Nucleic Acids Research, vol. 36, pp. D372–D376, 2008.
  42. J. Eddy and N. Maizels, “Gene function correlates with potential for G4 DNA formation in the human genome,” Nucleic Acids Research, vol. 34, no. 14, pp. 3887–3896, 2006.
  43. J. Eddy and N. Maizels, “Conserved elements with potential to form polymorphic G-quadruplex structures in the first intron of human genes,” Nucleic Acids Research, vol. 36, no. 4, pp. 1321–1333, 2008.
  44. J. L. Huppert, A. Bugaut, S. Kumari, and S. Balasubramanian, “G-quadruplexes: the beginning and end of UTRs,” Nucleic Acids Research, vol. 36, no. 19, pp. 6260–6268, 2008.
  45. H. M. Wong and J. L. Huppert, “Stable G-quadruplexes are found outside nucleosome-bound regions,” Molecular BioSystems, vol. 5, no. 12, pp. 1713–1719, 2009.
  46. K. J. Neaves, J. L. Huppert, R. M. Henderson, and J. M. Edwardson, “Direct visualization of G-quadruplexes in DNA using atomic force microscopy,” Nucleic Acids Research, vol. 37, no. 18, pp. 6269–6275, 2009.