ISRN Genomics
Volume 2013, Article ID 921418, 18 pages
Review Article

Critical Analysis of Strand-Biased Somatic Mutation Signatures in TP53 versus Ig Genes, in Genome-Wide Data and the Etiology of Cancer

1Melville Analytics Pty. Ltd., Unit 14, Nuggets Crossing, Jindabyne, NSW 2627, Australia
2School of Veterinary and Biomedical Sciences, Murdoch University, South Street, Murdoch, WA 6150, Australia

Received 4 September 2012; Accepted 23 October 2012

Academic Editors: S. N. Shchelkunov and A. Stubbs

Copyright © 2013 Robyn A. Lindley and Edward J. Steele. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Previous analyses of rearranged immunoglobulin (Ig) variable genes (VDJs) concluded that the mechanism of Ig somatic hypermutation (SHM) involves the Ig pre-mRNA acting as a copying template resulting in characteristic strand biased somatic mutation patterns at A:T and G:C base pairs. We have since analysed cancer genome data and found the same mutation strand-biases, in toto or in part, in nonlymphoid cancers. Here we have analysed somatic mutations in a single well-characterised gene TP53. Our goal is to understand the genesis of the strand-biased mutation patterns in TP53—and in genome-wide data—that may arise by “endogenous” mechanisms as opposed to adduct-generated DNA-targeted strand-biased mutations caused by well-characterised “external” carcinogenic influences in cigarette smoke, UV-light, and certain dietary components. The underlying strand-biased mutation signatures in TP53, for many non-lymphoid cancers, bear a striking resemblance to the Ig SHM pattern. A similar pattern can be found in genome-wide somatic mutations in cancer genomes that have also mutated TP53. The analysis implies a role for base-modified RNA template intermediates coupled to reverse transcription in the genesis of many cancers. Thus Ig SHM may be inappropriately activated in many non-lymphoid tissues via hormonal and/or inflammation-related processes leading to cancer.

1. Introduction

A major goal of this paper is to explain the origin of the main strand-biased mutation signatures observed in the TP53 tumor suppressor gene in the many tumors likely to arise by “endogenous” mutation processes: that is to say, those cancers not caused by well-known exogenous mechanisms such as exposure to carcinogens in tobacco smoke (benzo[a]pyrene, G-to-T), toxins in contaminated food (aflatoxin B1, G-to-T; aristolochic acid, A-to-T), or UV radiation in sun exposure causing DNA photoproducts such as cyclobutane pyrimidine dimers (C-to-T), reviewed in Soussi [1]. The TP53 mutation pattern in “All Breast Cancers” has been chosen as representative of the TP53 “endogenous pattern” because it appears to arise in a tissue “least accessible to carcinogens in tobacco smoke” and least directly exposed to such exogenous carcinogens; see Hainaut and Pfeifer [2]. There is also a large number of TP53 point mutations in this tissue category (>1000), similar to the number in “All Lung Cancers,” the major comparator in the analysis. Further, this basic “endogenous” pattern is evident in many tumors outside the lung, head, neck, and oesophagus, tissues that can all be considered “directly accessible to tobacco smoke carcinogens.” This choice is made despite the known complexity of breast cancer in both etiology and the diversity of histological subtypes [1], as a similar pattern is evident in mutated TP53 variants in “All Bladder Cancers,” a tissue likely to be more directly exposed to tobacco smoke-derived carcinogenic metabolites in urine.

Our analysis shows that the underlying “endogenous” strand-biased mutation signatures in TP53, for many different non-lymphoid cancers, bear a striking resemblance to the Ig SHM pattern. This allows inferences to be drawn about the mechanistic role of TP53-mediated DNA repair regulation and base-modified RNA template intermediates coupled to reverse transcription in the genesis of many cancers. It is also consistent with the view that the normally tightly regulated mutation processes targeting VDJ genes in B lymphocytes may, following further loss of DNA damage response regulation by TP53, be inappropriately turned on in non-lymphoid tissues, for example, by hormonal and/or inflammation-related processes, leading to cancer.

2. Background

2.1. Caveat and Writing Strategy

The writing strategy of this paper is to provide as clear an introduction as possible to what is currently known of the immune system’s somatic mutation mechanism, as we believe that this mechanism is relevant to understanding the role of somatic mutations in the pathology of cancer. This is an unexplored topic for most scientists, particularly in the field of cancer biology, although it is very topical now given renewed interest in the regulation of inflammatory responses initiating somatic mutations and thus cancer (see later). However, there is a caveat to this analysis that should be highlighted right at the start: strand-biased mutation spectra, although very informative and of strong inferential value with respect to molecular mechanisms, provide little information about the initial events that precede malignant transformation, as malignant cells grow rapidly and are exposed to strong selection. This means “first causes” cannot be precisely defined. Nevertheless, the clear possibility that immunoglobulin somatic hypermutation may be one of these “first causes” promoting somatic mutation, both across the genome and in key gene regulators such as TP53, is worth pursuing in its own right, as the implications have wide and interesting ramifications for the future directions of cancer research.

2.2. Somatic Hypermutation in Rearranged Ig Genes

The mechanism of SHM of Ig VDJ genes is now well understood and many molecular steps are known or can be plausibly inferred [3, 23, 24]. More recently this knowledge has been applied to the etiology of cancer. What we discovered in a preliminary analysis was that the characteristic strand-biased mutation signatures of Ig SHM were present, in toto or in part, in a number of somatic mutation datasets posted at the Wellcome Trust Sanger Institute’s website run by the institute’s Cancer Genome Project [4]. Indeed the possibility has often been discussed that dysregulation of SHM, driven by activation-induced cytidine deaminase (AID) conversion of cytosine to uracil (C-to-U) in DNA and normally confined to antigen-stimulated B lymphocytes in postantigenic Germinal Centers, could lead to somatic mutations and translocations in non-Ig genes, thus contributing to oncogenesis [21, 25–27].

The aim here is to use these insights from the Ig SHM field to help explain the strand-biased mutation signatures in TP53 in human nonlymphoid tumors arising by as yet unknown “endogenous” mechanisms. We have analysed in detail strand-biased TP53 mutation signatures in breast, bladder, and lung cancers. These are exemplars of oncogenesis in tissues exposed directly to carcinogens in tobacco smoke (lung) or indirectly exposed to such carcinogenic metabolites (bladder via urine) versus cancers arising in tissues such as breast generally considered “as least accessible to tobacco smoke” and other known exogenous agents [1, 2, 28].

We have also analysed and compared strand-biased mutation signatures and mutation patterns in other cancerous tissues. The analytical reviews by Soussi [1] and Pfeifer and Besaratinia [28] have proved valuable and we recommend these papers and related literature be read in association with the present analysis.

2.3. Strand-Biased Mutation Signatures in Ig Genes

Strand-biased mutation signatures in Ig VDJ gene loci, particularly at A:T base pairs, have been recognised for over 25 years [29]. More recently, published data (1984–2008) on mutated mouse VDJ regions and their 3′ JH4-flanks have been analysed [3]. In this study, somatic mutation data from thirty-two independent studies are summarised and are available in a meta-analysis in the Supplementary Data. Little significant new data has been published in the SHM field since then that changes the basic patterns shown in Table 1 or their interpretation.

Table 1: Typical patterns of somatic point mutations observed at rearranged immunoglobulin variable region loci in mice.

The essence of this analysis established that the Ig SHM mutation pattern free of “PCR recombinant artefacts” reveals several significant strand-biased mutation signatures at A:T and G:C base pairs (Table 1). The first is at A:T base pairs, where mutations of A exceed mutations of T by almost threefold, particularly the dominant strand bias of A-to-G exceeding T-to-C mutations. The second main strand bias is at G:C base pairs, where mutations of G exceed mutations of C by at least 1.7-fold. The dominant strand bias here concerns G-to-A exceeding C-to-T mutations.

The distortion in the DNA sequence data contaminated with PCR-recombinant artefacts (Table 1) has previously masked the clear strand bias at G:C base pairs. This distortion has also made it difficult for the field to develop its understanding of the mutator mechanisms operative on A:T and G:C base pairs during Ig SHM in vivo [3, 30].

The significant presence of PCR recombinants, also referred to as PCR hybrids or mosaic heteroduplexes, at the end of a PCR run is due to Taq or Pfu polymerase denaturation generating incomplete extension products that act as forward and reverse primers during amplification cycles. They arise from PCR runs where amplification is from multiple similar templates using the same set of primers. For this reason, the inevitable presence of such sequence artefacts causes blunting, if not complete ablation, of strand-biased mutation signatures following cloning in E. coli and then sequencing of PCR inserts. Such problems are avoided by reducing PCR cycle numbers, by sequencing target VDJ genes expressed in hybridoma clones, or by direct sequencing of amplification products from single VDJ loci expressed in FACS-sorted single B lymphocytes [3, 30–32].

As a consequence of these analyses, the reference strand-biased Ig SHM mutation pattern we will use as a comparator for the TP53 mutation data is shown in Table 1. The reference patterns in Table 1 are essentially free of the PCR hybrid artefacts that blunt strand bias.

For comparison, Table 1 also shows the previously cited pattern in the SHM field, where no strand bias is evident at G:C base pairs. Note that in that pattern the magnitude of the ratio of mutations of A versus mutations of T (hereafter indicated as the A≫T ratio) is reduced from 2.8x to 1.9x. Thus there is significant blunting of the established A≫T mutation ratio (as well as the dominant and diagnostic A-to-G versus T-to-C ratio, below) and complete ablation of the lower (1.7x) yet significant G≫C strand-biased ratio (compare the two patterns in Table 1).

In summary, we now know that the SHM reference pattern shown in Table 1 is characterised by significant strand-biases evident for all Watson-Crick complements: A-to-T versus T-to-A; A-to-C versus T-to-G; A-to-G versus T-to-C and G-to-A versus C-to-T; G-to-T versus C-to-A; G-to-C versus C-to-G.
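The six comparisons listed above follow mechanically from Watson-Crick pairing: each mutation of a purine read on one strand is the same event as a pyrimidine mutation read on the other. A minimal Python sketch of that bookkeeping is given here; the function names are illustrative and are not taken from the original analysis.

```python
# Watson-Crick partner of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_substitution(sub):
    """Return the same mutation as it reads on the opposite strand.

    `sub` is a (from_base, to_base) tuple, e.g. ("A", "G").
    """
    ref, alt = sub
    return (COMPLEMENT[ref], COMPLEMENT[alt])


def strand_bias_pairs():
    """Pair each mutation of A or G with its complementary mutation."""
    pairs = []
    for ref in "AG":              # mutations of the purines A and G
        for alt in "ACGT":
            if alt != ref:
                sub = (ref, alt)
                pairs.append((sub, complement_substitution(sub)))
    return pairs


if __name__ == "__main__":
    for sub, comp in strand_bias_pairs():
        print(f"{sub[0]}-to-{sub[1]} versus {comp[0]}-to-{comp[1]}")
```

In the absence of any strand bias, the two members of each pair would be expected to occur at similar frequencies when all mutations are read from one strand.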

We first addressed this issue [30] by making the point that “the synthesis of a mutated cDNA copy of the transcribed strand (TS) off the pre-mRNA template, and replacement of the original TS with the cDNA is inevitably strand-biased (see Figure 1).” This was underpinned by the finding that the error-prone Y family DNA polymerase-eta (η), an enzyme shown to be at least involved in translesion DNA repair, reviewed in Goodman [33], was an efficient reverse transcriptase at low enzyme-to-RNA template ratios in vitro [7]. It is now firmly established that Pol-η is the only DNA polymerase involved in physiological Ig SHM in vivo [13, 16].

Figure 1: Explanation for TP53 strand-biased mutations in the “Endogenous Patterns” outlined in the text, based on the reverse transcriptase model of Ig SHM, namely, transcription-coupled DNA and RNA deamination followed by reverse transcription; adapted and modified from Figure 5 in Steele, 2009 [3], following Steele and Lindley, 2010 [4]. Shown are some hypothesised DNA and RNA intermediates, highlighted for the generation of the main strand-biased mutation signatures involving A-to-G, G-to-A, G-to-T, and G-to-C. Black lines are DNA strands, red lines are mRNA, and blue lines are cDNA strands copied off mRNA by a cellular reverse transcriptase such as DNA polymerase η. Steps (a) through (d) show various mutated DNA and RNA intermediates and substrate complexes for both deamination reactions, for 8oxoG modifications in RNA (Wu and Li, 2008 [5]), and for cDNA synthesis (it is not known if 8oxoG sites are preferred in unpaired loops or dsRNA regions). In overview, mutations are first introduced at the DNA level by AID/APOBEC family-mediated C-to-U deaminations and then uracil DNA glycosylase (UNG)-generated abasic sites in the TS (which can further mature into single-strand nicks via the action of AP endonuclease (APE1)). These template sites are transcribed into mRNA by RNA Pol II, generating G-to-A and G-to-C modifications, respectively, in the mRNA (Kuraoka et al., 2003 [6]), which on reverse transcription, integration, and DNA replication result in G-to-A and G-to-C mutations in the NTS. Separately, adenosine-to-inosine (A-to-I) RNA editing events at WA targets in the nascent ds mRNA stem loops may be copied back into DNA by reverse transcription via Pol-η (Franklin et al., 2004 [7]). Also shown are 8oxoG modifications in mRNA which, on reverse transcription, integration, and DNA replication, would result in strand-biased G-to-T transversions on the NTS. The strand invasion (?) and integration of newly synthesised cDNA TS (?) are hypothesized necessary steps.
In more detail: (a) RNA Pol II introduces mutations in mRNA as it copies the AID/APOBEC lesions in TS DNA [6]; concurrently, A-to-I RNA edited sites appear in RNA stem(-loops) forming in nascent mRNA near the transcription bubble (Steele et al., 2006 [8]), or 8oxoG modifications arise via reactive oxygen species [5]. (b) Formation of the RT-priming substrate (DNA polymerase-η) by annealing of the nicked TS strand with an exposed 3′-OH end. This could arise due to excision at a previous AID-mediated abasic site or an excision introduced by endonuclease activity associated with the MSH2-MSH6 heterodimer engaging a U:G mispaired lesion. (c) Extension of the new TS by cDNA synthesis from the 3′-OH end, copying the already base-modified mRNA template (with I base pairing preferentially, like G, with C; and 8oxoG mispairing with A). (d) Then an unknown and indeterminate number of steps involving strand invasion (?), heteroduplex formation and/or resolution of heteroduplex (?), and full-length copying of the newly synthesized transcribed strand (?) cDNA.

The SHM field now accepts that Pol-η mutates A:T base pairs, particularly A-sites at certain WA hotspots where the target A is preceded 5′ by A or T (=W). With the analysis of most published experimental data 1984–2008 (see legend Table 1) a unifying explanation can be provided for the central role of base-modified RNA template intermediates and cellular reverse transcription in the generation of all the Watson-Crick strand biases displayed in Table 1.
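The WA motif definition lends itself to a simple scan of a sequence. The following Python sketch locates every A that is preceded 5′ by A or T; the function name and the example sequence are illustrative assumptions, not material from the cited studies.

```python
def find_wa_sites(seq):
    """Return 0-based positions of each A preceded 5' by A or T (W = A/T).

    `seq` is read 5'-to-3' on the strand of interest.
    """
    seq = seq.upper()
    return [
        i
        for i in range(1, len(seq))
        if seq[i] == "A" and seq[i - 1] in "AT"
    ]


if __name__ == "__main__":
    example = "GTATCAAG"  # made-up sequence for illustration only
    for pos in find_wa_sites(example):
        print(pos, example[pos - 1:pos + 1])
```

Positions returned by such a scan are only candidate hotspots; as noted elsewhere in the text, A-to-G mutations are enriched at some but not all WA sites.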

In the updated version of the reverse transcriptase model (RT model SHM, initially proposed in Steele and Pollard [34]) the two different sets of mutation strand biases at A:T and G:C base pairs can, we believe, be explained by a common core mechanism (Figure 1): emergence of an error-filled mRNA intermediate followed by reverse transcription via DNA polymerase-η [3]. In the case of the A≫T strand bias the model proposes a combination of adenosine-to-inosine (A-to-I) pre-mRNA editing by ADAR1 deaminase [8] and A-to-T and A-to-C biases via the RT activity of Pol-η during the cDNA synthesis step. In the case of G≫C strand bias it proposes that RNA mutations (G-to-A, G-to-C) generated by RNA Polymerase II (RNAPII), transcribing a DNA template (TS) carrying AID-lesions (uracils and abasic sites), are copied back to DNA by the RT activity of Pol-η. This can be considered a form of “transcriptional mutagenesis” as proposed in Figure in Hanawalt and Spivak [35] but coupled now to DNA fixation by reverse transcription.

According to the RT model, for the G≫C mutation signatures the G-to-A versus C-to-T strand bias results from rA being incorporated into RNA opposite unrepaired dU on the TS and the G-to-C versus C-to-G strand bias results from rC being incorporated into RNA opposite an abasic site on the TS [6]. For the G-to-T versus C-to-A strand bias this would be the alternative pyrimidine substitution (rU) if rC is not inserted opposite an abasic site (note: modified G residues in DNA due to reactive oxygen species such as 8oxoG are not known to play a role in physiological SHM in vivo [9] and below).
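The G:C part of the RT model, as described in the paragraph above, can be written as a small lookup from the lesion on the transcribed strand to the predicted mutation on the nontranscribed strand. This Python sketch only restates the text; the dictionary keys and function name are hypothetical labels chosen for illustration.

```python
# Ribonucleotide that RNA Pol II is proposed to insert opposite each
# lesion at a cytosine on the transcribed strand (TS). Key names are
# hypothetical labels, not terms from the original paper.
RNA_INSERTION = {
    "dU": "A",          # unrepaired uracil from C-to-U deamination
    "abasic": "C",      # abasic site, rC inserted
    "abasic_alt": "U",  # abasic site, alternative pyrimidine rU inserted
}

# DNA base that ends up on the NTS once the modified mRNA has been
# reverse transcribed and the change fixed by replication.
RNA_TO_DNA = {"A": "A", "C": "C", "G": "G", "U": "T"}


def predicted_nts_mutation(lesion):
    """Predicted NTS mutation at a G:C pair whose TS cytosine carries `lesion`."""
    rna_base = RNA_INSERTION[lesion]
    return f"G-to-{RNA_TO_DNA[rna_base]}"


if __name__ == "__main__":
    for lesion in RNA_INSERTION:
        print(lesion, "->", predicted_nts_mutation(lesion))
```

Because every lesion in this scheme sits on the TS and is read out through the mRNA, all predicted mutations appear as mutations of G on the NTS, which is the source of the G≫C bias in the model.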

Recently, we have applied this updated RT model for DNA diversification to both the etiology of strand-biased mutation patterns in many non-lymphoid cancers [4] and, amongst other hypotheses, to the origin of the established genome-wide strand bias for A-to-G over T-to-C in transcribed regions of the human genome [36]. The RNA modifications we are considering are nonbulky simple changes to base pairing, such as adenosine-to-inosine deamination in RNA [8] or putative 8oxoG modifications at G residues in RNA [4]. For Ig SHM and in our genome-wide diversification analysis, we have strongly argued against conventional explanations of strand biases (mainly the A-to-G versus T-to-C bias) for repair of non-bulky DNA lesions caused by differential DNA repair of transcribed (TS) as opposed to nontranscribed (NTS) strands during transcription-coupled repair (TCR). Indeed, critical evaluation of the TCR field has so far not provided evidence to support a TCR-mediated mechanism for strand biases arising from non-bulky DNA lesions such as C-to-U and abasic sites [36]. Further, in the case of the repair of 8oxoG lesions in DNA the Bohr group have convincingly shown that there is no transcriptional strand bias in their repair [37].

2.4. Origin of the A-to-G versus T-to-C Strand Bias

The A-to-G versus T-to-C strand bias is a common strand bias in many somatic and germline mutation data sets. This strand bias is found not only in all SHM data sets in mouse and human VDJ genes, in families of similar human germline IgV segments (in Matsuda et al. [38], EJS unpublished analysis) but also in almost all TP53 cancer data sets where A/T mutations have not been significantly suppressed or ablated (presumably by genetic deficiencies affecting the mismatch repair (MMR) machinery as in colorectal, stomach, oesophagus adenocarcinomas, skin, rectum, and colon cancers, below). And in all cases examined, A-to-G mutations are enriched at some but not all A-site hotspots where the A target is preceded by a 5′ A or T (=W).

Key evidence supporting an RNA template intermediate model for the prominent A-to-G versus T-to-C mutation strand bias derives from an IgV mRNA-stem loop computational analysis where the RNA substrate for ADAR1 mediated A-to-I deamination was modelled and tested on the somatic mutation data set of the rearranged light chain encoding VκOx1 passenger transgene [8]. Thus, in an RNA-based pathway for immunoglobulin SHM, A-to-I RNA editing causes A-to-G transitions since I, like G, pairs with C. The adenosine deaminases (ADARs) are known to preferentially edit A sites that are preceded by an A or U (W) in double-stranded RNA substrates [39]. We showed that a significant and specific Pearson correlation ( ) exists between the frequency of WA-to-WG mutations and the number of mRNA hairpins that could potentially form at the mutation site. Indeed the statistical significance of the correlation improved with increased stem length (or stability) of the dsRNA substrate (Figure 11, in [8]) and proximity of the nascent dsRNA to the transcription bubble. It is known that ADAR1 edits pre-mRNAs in the nucleus prior to splicing [40, 41]. Indeed ADAR1 seems to act on the WA-site closest to the transcription bubble and explains why the A-stem partner in the target A:U editing site must be previously synthesised [8]. This study strongly implies a role for both RNA editing and reverse transcription during SHM in vivo involving ADAR1 and Pol-η acting in its RT mode. For these reasons, we consider the elevated A-to-G versus T-to-C ratio as a diagnostic for mutational strand bias caused by modified RNA template intermediates and DNA fixation via reverse transcription.

However key direct experimental evidence supporting an RNA template intermediate model is still lacking. The ideal experiment in the context of Ig SHM would be a conditional genetics approach targeting ADAR1 expression in mature B lymphocytes in antigen-activated Germinal Centers. In a collaboration Cre-lox specific gene targeting techniques were used to inactivate ADAR1 during SHM in vivo. A positive result might involve a clear reduction or complete removal of the A-to-G component of the SHM mutation spectrum (i.e., those A-to-G changes which correlate strongly with WA sites in dsRNA stem loops). If however ADAR1 is a more central player in the SHM process it may also result in a total reduction in mutations at A:T base pairs (leaving intact mutations at G:C base pairs). In the recent collaboration ADAR1flox alleles on the C57BL6 mouse background (Wang et al. 2004 [42]) were crossed into C57BL6 mice with a “knocked-in” Ig antigen receptor (the SWHEL IgVH10 single-copy heavy chain transgene) which was assayed for somatic hypermutation in the adoptive transfer system described in Paus et al. 2006 [43]. An inducible Cre-recombinase gene when activated by tamoxifen should specifically target the ADAR1floxed alleles and delete them from B lymphocytes activated by antigen into the somatic hypermutation pathway. Unfortunately no mature donor B lymphocytes could be recovered in Germinal Centers suggesting that one or more ADAR1 sensitive developmental steps were necessary leading to Germinal Center B lymphocytes. Therefore with current Cre-lox technology approaches to implementing a successful experiment targeting ADAR1 alleles seem limited (R. Brink, K. Nishikura, G. F. Weiller, and E. J. Steele unpublished data 2007–2009).

With respect to carcinogenesis, unregulated ADAR-mediated A-to-I RNA editing is a well-described phenomenon [44, 45]. In a similar vein, unregulated APOBEC family C-to-U DNA deaminases such as AID, APOBEC3G, and APOBEC1 are comparable rogue mutator processes thought to be operative in many cancers [21, 25–27]. Thus for the present analysis we can reasonably assume that unregulated RNA and DNA deamination processes (at rA and dC residues) may well be associated with either the genesis or progression of many non-lymphoid cancers. Here we are concerned with the molecular implications of such processes, particularly in relation to understanding the strand-biased mutation signatures at A:T and G:C base pairs in the TP53 gene and wider genome.

3. The TP53 Mutation Database

The DNA sequence encoding the human tumor suppressor protein TP53, located on chromosome 17 at 17p13.1, has been cloned and sequenced as both full-length DNA and cDNA in many tumors over the past two decades. Mutated variants of the TP53 germline sequence carrying somatic mutations, mainly in the region encoding DNA binding, are found in a wide range of cancers [1, 28, 46]. Oncogenic TP53 mutations are a biased dataset in that they are partially selected for a competitive binding function focused on the DNA binding region. They are usually missense mutations in one allele spanning many sites in the TP53 coding DNA from about codon 130 to 300 [1]. Many investigators have deposited their sequence data in the database funded by the WHO at Lyon in France. The WHO-IARC public database has now curated around 30,000 somatic mutations in TP53. The data analysed here were extracted from this source [47] (release R15, November 2010).

3.1. Method of Data Extraction and Presentation

The somatic point mutation data presented in the Tables were extracted from the database as follows. On entering the website, “Database Search” is selected, which allows selection of “Search database for data related to SOMATIC MUTATIONS” and then “Search (Tumor types).” The tumor site can then be selected at “Select a tumor site,” giving entry to “Mutation pattern,” where selection can be made for the key data sets: “Strand distribution” and “Download data.” The “Strand distribution” tables can be downloaded; they display the numbers of all 12 possible base substitutions, including the numbers of C-to-T and G-to-A mutations at CpG islands. The spreadsheet from “Download data” allows construction and analysis of all types of mutations with 5′ and 3′ flanking sequence context in relation to the unmutated TP53 exon sequence (and in some cases intronic sequence). This allows development of frequency distributions of various types of mutation (e.g., A-to-G) versus nucleotide (and codon) position across regions of interest such as the DNA binding region (for example, Figure 2).
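A frequency distribution of one substitution type against codon position, of the kind described above, could be tallied along the following lines. This is a sketch only: the record layout (keys `codon`, `ref`, `alt`) is a hypothetical simplification and does not reproduce the column names of the downloaded spreadsheet, and the example rows are invented placeholders rather than database values.

```python
from collections import Counter


def substitution_spectrum(records, ref, alt, first_codon, last_codon):
    """Count `ref`-to-`alt` mutations per codon within an inclusive codon range.

    Each record is a dict with hypothetical keys "codon", "ref" and "alt".
    """
    counts = Counter()
    for rec in records:
        in_range = first_codon <= rec["codon"] <= last_codon
        if in_range and rec["ref"] == ref and rec["alt"] == alt:
            counts[rec["codon"]] += 1
    return counts


if __name__ == "__main__":
    # Invented rows for illustration; these are not database values.
    rows = [
        {"codon": 220, "ref": "A", "alt": "G"},
        {"codon": 220, "ref": "A", "alt": "G"},
        {"codon": 163, "ref": "A", "alt": "G"},
        {"codon": 248, "ref": "G", "alt": "A"},
        {"codon": 50, "ref": "A", "alt": "G"},
    ]
    spectrum = substitution_spectrum(rows, "A", "G", 100, 300)
    for codon in sorted(spectrum):
        print(codon, spectrum[codon])
```

Two such spectra built over the same set of sites, one per tumor category, are what a correlation such as that in Figure 2 would compare.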

Figure 2: A-to-G spectrum in TP53 for All Lung Cancers versus All Breast Cancers, covering A sites in TP53 from codon 100 to 300 inclusive. The Pearson correlation coefficient (r) is 0.93, which for 129 degrees of freedom gives a .

3.2. Statistics

Displayed in each base substitution table are Chi-squared statistical comparisons (1 df) of several types of base substitutions. Thus for A:T base pairs the main comparisons are all mutations of A versus all mutations of T; when the data are strand biased for excessive mutations of A this is symbolized as “A≫T.” The common and dominant A-to-G strand bias is represented as “A>G versus T>C.” Other types of biases are presented and tested similarly. Thus when mutations of G exceed mutations of C this is symbolized as “G≫C”; when G-to-A mutations exceed C-to-T mutations this is symbolized as “G>A versus C>T.” The Appendix and Figure 3 explain the rationale for detecting strand biases in mutation datasets.
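A one-degree-of-freedom Chi-squared comparison of two substitution counts can be sketched as follows. The sketch assumes a null expectation of equal counts for the two complementary substitution classes; that null, the function name, and the example counts are assumptions made here for illustration, and the Appendix should be consulted for the rationale actually used. The P value uses the exact identity for 1 df, P = erfc(sqrt(chi2 / 2)).

```python
import math


def strand_bias_chi2(n_first, n_second):
    """Chi-squared goodness-of-fit test (1 df) against a 50:50 expectation.

    Returns (chi2, p_value). The 50:50 null is an assumption of this sketch.
    """
    total = n_first + n_second
    if total <= 0:
        raise ValueError("at least one mutation is required")
    expected = total / 2.0
    chi2 = ((n_first - expected) ** 2 + (n_second - expected) ** 2) / expected
    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts chosen to give a 2.8-fold excess; not real data.
    chi2, p = strand_bias_chi2(280, 100)
    print(f"chi2 = {chi2:.2f}, P = {p:.3g}")
```

If the two classes have unequal numbers of target sites, the expected counts would need to be weighted accordingly rather than split evenly.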

Figure 3

4. TP53 Strand-Biased Mutation Data

The strand-biased mutation pattern typical of normal Ig SHM (Table 1) was compared with the somatic point mutation patterns observed in the TP53 coding region for a range of key cancers (Table 2). Chi-squared tests were applied to test the levels of statistical significance of the various strand biases. Attention was focused on mutations of A and G, respectively, particularly A-to-G versus T-to-C, G-to-A versus C-to-T, and G-to-T versus C-to-A, as the analysis of these strand biases has implications for the molecular mechanisms involved.

Table 2: Somatic point mutation patterns in the TP53 coding region in “All Breast Cancers,” “All Bladder Cancers,” and “All Lung Cancers.”

The main strand-biased mutation patterns observed in TP53 are best represented by the patterns in All Breast, All Bladder, and All Lung cancer categories where each has many (>1000) somatic mutations to analyse (Table 2). We have ranked these tissues in their likely exposure to carcinogens in tobacco smoke as smoking is the leading cause of lung and many other cancers in the world today. The origin of TP53 mutations in breast may be considered the least likely to be caused by direct exposure to tobacco smoke carcinogens and metabolic by-products [2].

In the case of TP53 mutations in bladder cancer we assume the tissues are at least exposed to relatively high levels of carcinogenic metabolic by-products of tobacco smoke in urine.

In the case of TP53 mutations in lung cancer there is direct tissue exposure to tobacco smoke polycyclic aromatic hydrocarbons (PAHs) and carcinogenic metabolic derivatives [48]. Here the molecular, biochemical, and cellular evidence is overwhelming. Exposure to such smoke-derived carcinogens causes lung cancer. In particular, bulky DNA adducts at certain G sites targeted by carcinogens such as benzo[a]pyrene (B[a]P) are the direct cause of the dominant G-to-T transversion in these cancers in the TP53 gene [49–51] (reviewed in [1, 28]) and throughout the wider lung cancer genome [52].

Some striking similarities are observed for the cancers shown in Table 2.

(a) The first is the similarity between the Ig mutation pattern (Table 1) and the patterns in TP53 for “All Breast” and “All Bladder” cancers. The main difference between the Ig pattern and the TP53 pattern in breast and bladder (and in many other TP53 cancer data sets for that matter) is the ~50 : 50 balance of mutations at A/T and G/C for Ig versus the G/C excess over A/T in TP53. The majority (70–75%) of TP53 mutations occur at G/C sites. This may be partly explained by the significant excess of G/C (~60%) versus A/T (~40%) in the base composition of the target region in TP53. But the pattern similarities nevertheless exist within A:T base pairs and G:C base pairs.

(b) The second systematic pattern observed in all data sets shown in Table 2 is the A≫T and A-to-G versus T-to-C strand bias. They stand out as common strand-biased patterns. To our knowledge of the mainstream TP53 literature this strand bias is rarely highlighted in published discussions. However, this pattern is not observed in colorectal, stomach, skin, and some other cancers as A/T mutations here have been ablated or significantly reduced (below). This pattern is stable across breast, through bladder to lung (and all ovary cancers, Table 3) suggesting a common causal mechanism that may not be associated with exposure to carcinogens in tobacco smoke. For A-to-G hotspots in both breast and lung cancers, the majority are defined by being part of a WA-site, particularly at codons 132, 163, 205, 220, 234, and 239; the TAT site in codon 220 is a super hotspot (data not shown).

Table 3: Somatic point mutation patterns in the TP53 coding region in “All Ovary” Cancers.

If the A-to-G spectrum in TP53 at all A-sites in codons 100–300 inclusive for “All Breast Cancer” is compared with the same spectrum in “All Lung Cancer”, they are virtually superimposable (Figure 2). The Pearson correlation coefficient (r) is 0.93, which for 129 degrees of freedom gives a . Indeed, we think that this repeatable pattern in the two disparate target tissues is consistent with their origin being the result of an “endogenous” process. By comparison with what we have inferred from Ig SHM (Figure 1), a likely candidate is unregulated ADAR1-mediated A-to-I RNA editing and fixing of the mutated retrotranscripts back into DNA via reverse transcription, as envisaged for Ig somatic hypermutation. The cellular reverse transcriptase could be DNA polymerase-η (or one of its Y family relatives, iota (ι) and kappa (κ), which also possess significant reverse transcriptase activity [7]).
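The strength of a correlation of r = 0.93 with 129 degrees of freedom can be gauged with the standard conversion t = r·sqrt(df / (1 − r²)). The Python sketch below applies that conversion to the two figures quoted in the text; the two-sided P value uses a normal approximation to the t distribution, which is an assumption made here to keep the example within the standard library, and is reasonable only because df is large.

```python
import math


def pearson_t_statistic(r, df):
    """t statistic for testing a Pearson r against zero, with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


def two_sided_p_normal_approx(t):
    """Two-sided P value treating t as standard normal (large-df approximation)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    t = pearson_t_statistic(0.93, 129)
    print(f"t = {t:.1f}")
    print(f"approximate two-sided P = {two_sided_p_normal_approx(t):.3g}")
```

With these inputs the t statistic comes out near 28.7, so the correlation is far beyond conventional significance thresholds under this approximation.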

(c) The third common pattern is the excess of G mutations over C mutations (G≫C), particularly the dominant G-to-A over C-to-T strand bias (evident at both CpG and non-CpG sites). The pattern in breast and bladder is very similar again, suggesting common causal mechanisms. All ovary cancers display a similar pattern (Table 3). Again, to our knowledge of the mainstream TP53 literature, this particular and striking strand bias is rarely highlighted in published discussions on the topic.
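Point (a) above notes that 70–75% of TP53 mutations fall at G/C sites while G/C makes up about 60% of the target region. A short calculation shows how much of the excess remains once base composition is taken into account. The function name is illustrative, and the inputs are the approximate percentages quoted in the text rather than exact counts.

```python
def per_site_enrichment(mut_fraction_gc, site_fraction_gc):
    """Relative mutability of G/C versus A/T sites after composition correction.

    A value of 1.0 means the G/C excess is fully explained by base composition.
    """
    mut_fraction_at = 1.0 - mut_fraction_gc
    site_fraction_at = 1.0 - site_fraction_gc
    gc_rate = mut_fraction_gc / site_fraction_gc
    at_rate = mut_fraction_at / site_fraction_at
    return gc_rate / at_rate


if __name__ == "__main__":
    for mut_gc in (0.70, 0.75):
        ratio = per_site_enrichment(mut_gc, 0.60)
        print(f"{mut_gc:.0%} of mutations at G/C -> {ratio:.2f}-fold per site")
```

On these approximate figures a G/C site is roughly 1.6 to 2 times as likely to be mutated as an A/T site, which is consistent with the statement that base composition contributes only partly to the G/C excess.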

The G≫C imbalance in lung cancer is known to be due to the binding of bulky tobacco smoke-derived adducts such as B[a]P at certain G-sites (mainly CpG islands although GpG sites can be targeted) in known critical codons such as 154, 157, 158, 245, 248, 249, and 273. Such adducted G sites now mispair with adenosines, causing G-to-T transversion mutations if left unrepaired. This grossly imbalances other G mutations at these sites, leading to a loss of the G-to-A versus C-to-T strand bias evident in breast, bladder, and ovary cancers.

Direct experiments by Pfeifer and colleagues have shown that the G-to-T versus C-to-A strand bias is caused by the much slower repair of bulky DNA adducts such as B[a]P along the nontranscribed strand compared with the faster repair on the transcribed strand [51]. So the strand-biased G-to-T pattern is a direct consequence of a transcription-coupled DNA repair (TCR) process for bulky DNA adducts [35]. This is a DNA-based strand-biased mutation mechanism and thus quite different from the RNA-based mechanisms outlined above for Ig SHM (Figure 1).

Before leaving this section we wish to deal with two further issues. First, we must deal with a relevant point that has now emerged from whole genome sequencing of individual lung cancers such as NCI-H209 (Pleasance et al. [52]). In a critical section on the origin of the DNA repair pathways that may be responsible for the complex strand-biased signatures of tobacco exposure, the authors make the following set of statements and assumptions:

“… that bulky adducts on purines are the predominant form of DNA damage induced by tobacco carcinogens and can be sufficiently disruptive to impede RNA polymerase when they occur on the transcribed strand..” and they observed “that guanine and adenine substitutions are generally less frequent on the transcribed than the nontranscribed strand-confirming that purines seem to be the major target of carcinogens in tobacco smoke.”

We accept this explanation for the origin of the G-to-T transversions but our data do not support their conclusions on the origin of the A-to-G strand bias. Indeed we would hardly expect the A-to-G spectra in TP53 to be identical in lung and breast cancers (Figure 2) if this were the case. The fitted curves in Pleasance et al. [52] show the effect of gene expression on the strand-biased mutation rate for the six classes of adenine and guanine mutations in NCI-H209 (Figure in [52]). The overall patterns are to be expected from the TP53 lung cancer pattern (Table 2), which the authors acknowledge in their Supplementary data. However the profiles for G-to-T and A-to-G are quite different. The decline in G-to-T mutation rate with increased transcription level is biphasic, suggesting two causal DNA repair mechanisms for G-to-T: one expected and rapid, depending on increased transcription and involving TCR of bulky adducts, the other suggestive of another strand bias process. The latter could perhaps be due to 8oxoG generation by reactive oxygen species in RNA and an RT step of RNA-to-DNA fixation.

The curve for A-to-G mutation rate versus transcription level appears monophasic, with the difference between the repair of the NTS and TS mutations deepening with increased gene expression (or transcription). However the slope of the curve is very shallow: a slight decline in mutation rate from expression level 4 to 9 (from approximately 2.9 mutations/Mb to approximately 2.5 mutations/Mb). This suggests a constitutive process of error generation and repair marginally affected by transcription level. One interpretation could be that this is consistent with A-to-I RNA editing and RNA-to-DNA fixation.
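
To make “very shallow” concrete, the two approximate end points quoted above can be turned into an average slope and a relative decline. This is only a back-of-envelope sketch: it assumes a straight line between the two quoted points, which the fitted curve in [52] need not follow, and the function name is illustrative.

```python
def mean_slope(x_start, y_start, x_end, y_end):
    """Average change in y per unit of x between two points."""
    return (y_end - y_start) / (x_end - x_start)


# Approximate values quoted in the text for the A-to-G curve:
# expression level 4 -> about 2.9 mutations/Mb
# expression level 9 -> about 2.5 mutations/Mb
slope = mean_slope(4, 2.9, 9, 2.5)
relative_decline = (2.9 - 2.5) / 2.9

print(f"slope per expression unit: {slope:.2f} mutations/Mb")
print(f"relative decline over the range: {relative_decline:.1%}")
```

Under that assumption the rate falls by about 0.08 mutations/Mb per unit of expression level, or roughly 14% across the whole range.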

The second issue adds to the argument against asymmetrical or strand-differential TCR as an explanation for the excess of G-to-A mutations over C-to-T mutations in both Ig SHM and somatic mutation patterns in TP53 in cancers such as in breast. If TCR was occurring to clear C-to-U lesions arising from the action of unregulated APOBEC family deaminases in cancer cells (or progenitors) we would expect a preferential clearance of C-to-U on the TS and an excess of unrepaired C-to-U lesions, manifest as C-to-T, on the NTS. In fact in Ig SHM and in mutated TP53 genes in breast and other cancers (Table 2) it is the other way around. An excess of G-to-A over C-to-T suggests that C-to-U lesions would need to go unrepaired preferentially on the TS which is not evident in the mutation patterns analysed here.

So the argument goes full circle, posing the question: why then should G-to-A mutations exceed C-to-T mutations? This conundrum may explain why the G-to-A≫C-to-T strand bias has gone relatively unreported in the literature. The simplest explanation of the G-to-A strand bias is that C-to-U goes unrepaired on the TS prior to RNA Pol II copying this into G-to-A in the mRNA [6], which in turn would be manifest in a strand-biased manner in the DNA by reverse transcription, first as a C-to-T in TS DNA and then, following replication, as G-to-A on the NTS [3] (see Figure 1). This account attributes the G-to-A versus C-to-T strand bias to AID/APOBEC3G/APOBEC1 deaminations in DNA. The same argument applies to the G-to-C over C-to-G bias in SHM and in TP53 mutation patterns in breast and other cancers with similar mutation patterns (e.g., bladder, Table 2), and to the lesser strand bias of G-to-T over C-to-A. In the TP53 mutation pattern in breast cancers the relative increase in the G-to-T over C-to-A ratio compared to that in Ig SHM (Table 1), in “a tissue less exposed to smoke”, suggests it might be contributed by another mechanism. One possibility is the formation of 8oxoG in the TP53 mRNA (as suggested in Steele and Lindley [4]) because it is known that 8oxoG DNA lesions are unlikely to display strand differential TCR biases on DNA repair [37]. Further research on the genetic and biochemical consequences of 8oxoG formation in RNA is clearly required.

5. TP53 G-to-T versus C-to-A Strand Biases in Other Cancers

B[a]P and similar DNA-binding tobacco-derived carcinogens are known to form bulky G-site adducts, causing G to base-pair, like T, with adenosines. These bulky adducts are preferentially cleared from the transcribed strand during transcription-coupled repair and this causes the excess of G-to-T mutations on the upper nontranscribed strand. In a similar vein, in hepatocellular liver cancer the G-to-T versus C-to-A strand bias is caused by adduct formation at G-sites by aflatoxin dietary contaminants, diagnostically at the third position of codon 249 in TP53 [46]. This G-site (GpG) is also a G-to-T hotspot in many lung cancers.

Other geographically and ethnically localised strand biases via dietary contamination for G-to-T over C-to-A and for A-to-T over T-to-A are reviewed in [1, 46]. Thus G-to-T transversions in the third base of TP53 codon 249 correlate strongly with HBV carrier status and exposure to dietary aflatoxin B1 in tumors; and A-to-T strand bias (codons 131, 209, and 280) has been linked with crops contaminated with Aristolochia sp. seeds (in certain Balkan communities in southeastern Europe). Aristolochic acids (AAs) are the identified carcinogenic agents, and DNA adducts involving AA have been detected in patients suffering from Balkan endemic nephropathy (BEN). The strand bias suggests preferential TCR of bulky DNA lesions along the transcribed DNA strand, as concluded already for G-to-T strand biases in many lung cancers.

6. DNA Repair Deficiencies: Colorectal, Stomach, and Skin Cancers

There are also some other more complex TP53 mutation patterns not conforming to the simple strand biases just discussed and worthy of further comment here. Before we do this, it is informative to consider what is known about DNA repair deficiencies and distortions of the Ig SHM mutation pattern (Tables 1 and 4). Much of this work has been done using single and double knockout mice targeting key base excision repair (BER) and mismatch repair (MMR) genes encoding proteins that have been coopted to now function aberrantly (a form of “subverted DNA repair”, as put by Martomo and Gearhart [53]); see the summary in Table 4 and the review in Steele [3]. The various protein components of the normally tightly regulated and targeted Ig mutator now act to encourage error-prone DNA synthesis during the somatic hypermutation of rearranged antibody variable genes in Germinal Center B lymphocytes following antigenic challenge. The key proteins and DNA repair enzymes are AID deaminase, which initiates the SHM process (and Ig class switch recombination, CSR) by deaminating C-to-U in VDJ DNA; this is then followed by attempts to remove the base in a BER manner via uracil DNA glycosylase (UNG), followed by other “subverted” DNA repair enzymes such as translesion DNA polymerase η, and the mismatch repair heterodimer MSH2-MSH6.

Table 4: Effect of main DNA repair deficiencies in Ig somatic hypermutation studies in mice.

Table 4 lists the main consequences on the Ig SHM mutation spectrum of genetic deficiency in uracil DNA-glycosylase (UNG), in the mismatch repair heterodimer (MSH2-MSH6), and deficiencies in Y family DNA polymerases η (eta), ι (iota), and κ (kappa) and combinations thereof. Additional deficiencies are shown in alkyladenine DNA glycosylase (Aag), which removes hypoxanthine (deaminated adenine) from DNA generating an abasic site, and 8-hydroxyguanine-DNA glycosylase (Ogg1), which removes oxidised guanine from DNA. The effect on the SHM spectrum of inactivating TP53 is also shown.

In Ig SHM a failure to remove uracils from DNA as a consequence of dC-to-dU AID deaminase action (UNG−/−, Table 4) has a slight effect on overall A/T mutations and no effect on A≫T or A-to-G strand biases. The main effect is on the focusing of mutations to G:C base pairs with a reduction in transversion mutations.

In UNG−/−MSH−/− double deficient mice, mutations at A:T base pairs are virtually eliminated, as are transversions at G:C base pairs, leaving what is considered the “AID deamination footprint” (Table 4, [11]), which now manifests itself as a strong strand bias of C-to-T exceeding G-to-A by at least 1.5-fold [3]. The simplest interpretation is that since AID-deaminase converts C-to-U in the single stranded DNA regions of the transcription bubble, this is likely to happen more often on the NTS than the TS, hence the unfettered C-to-T over G-to-A strand bias in such mutant mice. The same strand bias is revealed in double deficient Pol-η−/−/MSH2−/− mice [13]. These data are consistent with SHM models whereby MSH2-MSH6 heterodimers engage G:U DNA mispairs and recruit Pol-η, necessary for full blown mutagenesis at A:T base pairs [54]. In the complete absence of Pol-η, Pol-κ can step in to effect A/T mutagenesis [16] and probably also the RT step [7].

With respect to the general over-arching molecular mechanism of Ig SHM the deficiencies in Aag, Ogg1, and TP53 are informative. First, the failure to remove 8oxoG from DNA has no effect on the Ig SHM spectrum [9], indicating that the G-to-T/C-to-A component of the SHM spectrum does not involve 8oxoG residues in DNA. This leaves open the possibility that 8oxoG sites in RNA may contribute in other somatic mutation scenarios such as in TP53 as envisaged in Figure 1. Second, direct deamination of adenines in DNA to hypoxanthine (and thus potential A-to-G miscoding) seems to play no role in the generation of the A-to-G spectrum [17], once again leaving open the possibility of adenosine deamination to inosine at the RNA level contributing significantly to the observed A-to-G strand bias as envisaged in Figure 1. The investigators also observed a borderline statistically significant increase in T-to-C in Aag(−/−) mice; in our view this variation in T-to-C frequency is well within the range of variation for these PCR based SHM assays as outlined in Steele [3] and Supplementary data therein. Third, and most importantly for the present analysis, Storb and associates have clearly shown an effect of TP53 inactivation on the Ig SHM spectrum: there is a striking increase in A-to-G frequency in such mice and a corresponding increase in both A-to-G versus T-to-C and A≫T strand bias [18]. This result suggests that TP53 inactivation can profoundly affect the A-to-G component of the SHM spectrum and thus implies that the global DNA damage surveillance function of TP53 extends, as expected [18], to Ig SHM. In the context of the present analysis the result suggests that TP53 may well regulate the imprint of A-to-I RNA editing on the DNA somatic mutation pattern as predicted by the RT model of Ig SHM (Figure 1).
This in turn has implications for the magnitude of the A≫T and A-to-G versus T-to-C strand biases across the human cancers bearing inactivated TP53 alleles such as, for example, bladder (Table 2) and ovary cancers (Table 3) and the wider cancer genome.

With respect to the other major effects on A/T and G/C mutations, can we find parallel, or similar, patterns in other cancers with TP53 somatic mutation data? It is well known that colorectal and other aggressive gastrointestinal cancers are typified by a high incidence of defects in mismatch DNA repair machinery (Bellizzi and Frankel (2009) [55]). We would therefore expect, if subverted DNA repair components of the Ig SHM process are operative in such tumors, that the signature of ablated or suppressed A/T mutagenesis should be revealed in the TP53 patterns in such cancers. This expectation is partly satisfied by the TP53 mutation data on colorectal and stomach cancers (Table 5), where A/T mutations have been reduced (but not eliminated). In some cases the A-to-G strand bias has been retained (stomach, Table 5) and in other cases it is lost (colorectal cancers, Table 5). Reductions in A/T mutations are also noted in oesophagus adenocarcinomas (not shown). Whilst there is a relative excess of G-to-A and C-to-T mutations at presumed methylated CpG sites in colorectal cancers, the strand bias here is also evident at non-CpG sites (not shown). In contrast, the TP53 mutation patterns in stomach cancers lack significant strand biases apart from those involving A-to-G versus T-to-C and G-to-T versus C-to-A (Table 5). It is conceivable that unrepaired excessive C-to-U deaminations on the NTS (relative to TS) are blunting intrinsic strand biases of G-to-A versus C-to-T in the same way such blunting (or strand bias reversal) occurs in Ig SHM in UNG−/−MSH−/− and Pol-η−/−/MSH−/− mice (Table 4; see extended discussion on this point in Steele [3]).

Table 5: Somatic point mutation patterns in the TP53 coding region in “All Colorectum Cancers,” “All Stomach Cancers,” and “All Skin Cancers.”

TP53 mutation patterns in skin cancers are shown in Table 5. Here there is both suppression of A/T mutagenesis and a reversal of the strand bias at G/C sites, namely, C-to-T mutations clearly exceed G-to-A mutations by almost 2-fold. This is similar to data for Ig SHM in mice displaying the “AID deamination footprint” in UNG−/−MSH−/− and Pol-η−/−/MSH−/− mice (Table 4, [11, 13]).

In summary, known DNA repair deficiencies in previously characterised Ig SHM model systems are displayed in toto or in part in TP53-bearing cancer mutation patterns. The results are consistent with the hypothesis that components of “subverted DNA repair” play a similar role in both SHM and non-lymphoid cancers bearing mutated TP53 derivatives. In addition inactivation of TP53 may increase the magnitude of A≫T and A-to-G versus T-to-C strand biases in the tumors that harbor mutated TP53 derivatives [18].

7. TP53 Mutation Patterns in Brain Cancers

This particular pattern is discussed at length in Soussi [1] and displayed in Table 6. Note that a trend to A≫T strand bias and a significant A-to-G versus T-to-C strand bias are evident. There is also a specific strand bias of G-to-T transversions over C-to-A. However the global G≫C strand bias has been ablated, particularly the dominance of G-to-A over C-to-T mutations evident in many other cancers harboring mutated TP53 variants. As Soussi [1] discusses, there are similar numbers of excessive G-to-A and C-to-T mutations at CpG sites, suggesting excessive deaminations of 5-methylcytosine-to-T on both DNA strands (Table 6). The latter may be effected by AID-deaminase [26]. Once again however, as with the analysis of mutation patterns in Ig SHM, competing strand-biased mutational processes at G:C base pairs may accentuate, blunt, ablate, or even reverse the strand-biased signature presented by a particular tumor.

Table 6: Somatic point mutation patterns in the TP53 coding region in “Glioblastoma” cancers of brain.

8. Context of C-to-U Lesions by APOBEC-Family Enzymes

This topic has been extensively studied by the Neuberger group [21]. Here we summarise the main findings and relate them to the sequence context of TP53 mutations in lung and breast cancer. For mutations of G, most if not all occur at one of the known 5′-to-3′ motifs on the opposite (deamination) strand. Thus APOBEC1 targets 5′-TCA-3′ on the deamination strand (or 5′-TGA-3′ on the NTS); APOBEC3G targets 5′-CCG-3′ on the TS (or 5′-CGG-3′ on the NTS); and AID variously targets, in descending order, 5′-ACA-3′, 5′-GCA-3′, 5′-ACG-3′, and 5′-GCG-3′ (or 5′-TGT-3′, 5′-TGC-3′, 5′-CGT-3′, and 5′-CGC-3′ on the NTS). We also include the possibility that 5-MeC-to-T deaminations are effected by such enzymes [26].
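
The NTS forms of the motifs listed above are simply the reverse complements of the motifs on the deamination strand. The short sketch below makes that relationship explicit; the dictionary and function names are illustrative and are not part of the cited study.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Motifs on the deamination (transcribed) strand, as listed in the text.
DEAMINATION_STRAND_MOTIFS = {
    "APOBEC1": ["TCA"],
    "APOBEC3G": ["CCG"],
    "AID": ["ACA", "GCA", "ACG", "GCG"],
}

# The same sites as they read 5' to 3' on the nontranscribed strand (NTS).
NTS_MOTIFS = {
    enzyme: [reverse_complement(motif) for motif in motifs]
    for enzyme, motifs in DEAMINATION_STRAND_MOTIFS.items()
}

for enzyme, motifs in NTS_MOTIFS.items():
    print(enzyme, motifs)
```

The derived NTS motifs match those given in parentheses in the text, with the target C on the deamination strand appearing as a G in the NTS reading.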

When this information is applied to the G-site mutations of the “endogenous pattern” represented by the All Breast Cancer data it reveals that all major and minor G-site mutation hotspots can be classed as the direct result of either AID or APOBEC3G C-to-U deaminations targeting the transcribed TP53 DNA strand (Table 7). The dominant likely C-to-U deaminase is AID suggesting that dysregulated SHM initiation via AID activation in non-lymphoid tissue may be the primary cause of the mutagenesis leading to cancer.
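
The kind of classification summarised in Table 7 can be sketched as a lookup of the trinucleotide surrounding a mutated G on the NTS. This is an illustrative sketch only: it assumes the mutated G is the central base of each NTS motif, the function name is hypothetical, and the toy sequence is invented rather than taken from TP53.

```python
# NTS trinucleotide contexts (mutated G assumed to be the central base)
# mapped to the deaminase whose motif they correspond to in the text.
NTS_CONTEXT_TO_DEAMINASE = {
    "TGA": "APOBEC1",
    "CGG": "APOBEC3G",
    "TGT": "AID",
    "TGC": "AID",
    "CGT": "AID",
    "CGC": "AID",
}


def likely_deaminase(nts_sequence, position):
    """Suggest a deaminase for a G at a 0-based position in an NTS sequence.

    Returns None when the base is not G, when the position has no flanking
    base on both sides, or when the context matches none of the motifs.
    """
    if position < 1 or position > len(nts_sequence) - 2:
        return None
    context = nts_sequence[position - 1:position + 2].upper()
    if context[1] != "G":
        return None
    return NTS_CONTEXT_TO_DEAMINASE.get(context)


# Invented toy sequence, used only to exercise the lookup.
toy_nts = "ACGCATGA"
print(likely_deaminase(toy_nts, 2))  # G flanked by C and C
print(likely_deaminase(toy_nts, 6))  # G flanked by T and A
```

A site that matches none of the listed contexts returns None, which in a real analysis would flag a G-site mutation not readily explained by these deaminase motifs.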

Table 7: Hotspots in breast cancers for G-site mutations in TP53 in relation to codon, target motif, and likely dC-to-dU deaminase.

9. Breast Cancer Mutation Patterns in TP53 Compared with Patterns in Genome-Wide Data

The somatic mutation pattern in TP53 is often a very good correlate of genome-wide point mutation patterns, for example, in lung cancer [52]. However this is not always the case, probably because TP53 is not inactivated in all cancers of a given category; for example, about 25% of breast and bladder cancers, 48% of ovary cancers, and 38% of lung cancers have an inactivated TP53 allele (see IARC TP53 database). As pointed out in Pfeifer and Besaratinia [28], large-scale genome sequencing of cancer genomes has revealed some interesting results not evident in TP53 patterns [56, 57]. Thus there is a quantitative difference evident between strand-biased mutation patterns in TP53 in breast cancers (Table 2) compared with available data from genome-wide exome sequencing of breast cancer genomes [56, 57]. In Table 8 are displayed data illustrating this difference [56] from the sequencing of exons of close to 20,000 protein coding genes in eleven breast cancer genomes. In the majority of these breast cancers (10/11) TP53 is mutated (see Supplementary data in [56]).

Table 8: Somatic point mutation patterns in breast cancer derived from large-scale exomic sequencing of breast cancer genomes.

Note first that the strand-biased pattern at A:T base pairs in TP53 (Table 2) and in genome-wide data (Table 8) is similar. The systematic A≫T and prominent A-to-G strand biases are evident. However in this data set (approximately 1445 point mutations) the strand bias at G:C for G≫C is systematic and only just statistically significant. A key difference pointed out by Pfeifer and Besaratinia [28] is the higher load of mutations for G-to-C/C-to-G. Pfeifer and Besaratinia postulate the following:

“These data suggest that breast cancers are caused by an etiological agent that induces this particular type of mutation. There are few known mutagens that specifically induce G/C to C/G transversions, let alone selectively at a particular dinucleotide sequence.”

They then go on to point out that a significant fraction of these G to C transversions occur at the 5′ GpA dinucleotide motif which is 5′ TpC on the other strand.

From our perspective it is interesting that this happens to be the favoured APOBEC1 motif if this DNA deaminase enzyme, now unregulated, deaminated cytosines at such sites ([21], see previous section, Table 7). Inspection of the TP53 “All Breast Cancer” data reveals there are indeed several 5′-GpA sites which account for about a third of the load of G-to-C mutations in the strand-biased G-to-C pattern in TP53 in cancers of the breast (Table 2). These G-to-C hotspots are at codons 196, 280, and 281 (Table 7) and constitute 25 of 92 (27%) of all G-to-C mutations over codons 150–300 inclusive. Additionally, ten G-to-C mutations in this region occur at a CGC site in codon 156 (a motif favoured by AID deamination on the opposite strand).

Recently the Cancer Genome Project (CGP) at The Wellcome Trust Sanger Institute has reported on the exomic mutation spectrum of category-selected sets of 21 breast cancer genomes (Nik-Zainal et al. [22]). Few if any significant strand biases are reported in these genome-wide data. Of real interest is the fact that only a minor fraction (4/21) carries an exomic mutation in TP53 and most sample sizes for mutations are statistically small (N values in the hundreds), except tumor PD4120a, which carries 1931 exomic mutations that are predominantly focused on G/C with ≤5% mutations at A/T base pairs. Two of the tumors bearing a TP53 mutation, PD4109a and PD4199a, display the early trends of the significant strand biases at A:T and G:C base pairs (Table 9) evident in the earlier Wood et al. (2007) exome study [56].

Table 9: Somatic point mutation patterns in the TP53 mutated breast cancers in Nik-Zainal et al. exomic data [22].

Collectively these genome-wide data sets suggest that mutations in TP53 accentuate strand-biased mutation patterns across the cancer genome, implying that inactivated TP53 and dysregulated SHM contribute to such patterns. This conclusion is underlined by the documented functional interaction between TP53 and the Ig SHM machinery shown in mice by the Storb group, particularly in relation to the accentuated strand-biased mutation pattern of A-to-G versus T-to-C [18].

10. Inflammation and Carcinogenesis

Whilst there are some exceptions and qualifications, we conclude that there is a strong statistically significant similarity between the strand-biased mutation signatures of TP53 in many tumor types and the now well-established Ig SHM pattern, particularly in relation to the strand biases of A-to-G over T-to-C and G-to-A over C-to-T. Previous work on Ig SHM suggests that the A-to-G over T-to-C strand bias correlates strongly with A-to-I RNA editing coupled to reverse transcription to fix the A-to-G pattern in the cellular DNA [8].

The G-to-A over C-to-T pattern is found to be a dominant strand bias in all those cancers arising in tissues “least accessible to tobacco smoke” suggesting that this strand biased pattern (as well as A-to-G over T-to-C) arises from endogenous mutation processes in most non-lymphoid cancers. We have previously concluded that the G-to-A over C-to-T strand bias is consistent with RNA mutations initiated at C sites by activation-induced cytidine deaminase (AID)-mediated C-to-U deamination on the transcribed strand (TS) resulting in G-to-A transitions in the mRNA which are fixed as G-to-A mutations on the nontranscribed strand (NTS) following reverse transcription [3], see Figure 1.

The present analyses therefore confirm and extend our earlier conclusions in a preliminary study of genome-wide somatic mutation data curated by the Cancer Genome Project (CGP) at The Wellcome Trust Sanger Institute, Hinxton, UK [4]. The special features of the strand biases at A:T and G:C base pairs in tumors bearing mutated derivatives of TP53 imply a role for base-modified mRNA template intermediates and reverse transcription in somatic mutagenesis leading to or initiating cancer. In addition our analysis and conclusions are consistent with the view that inflammatory infiltrates, or in situ inflammatory episodes, in non-lymphoid tissues may contribute to dysregulated Ig SHM and thus oncogenesis, first by mutating components of the Ig SHM machinery and then by affecting mutations in TP53. This could occur via “a bystander effect” of various liberated cytokines inappropriately activating gene expression pathways in nearby non-lymphoid cells. It is common clinical knowledge that most tumors have associated inflammatory infiltrates, which are part and parcel of tumor growth.

This analysis also identifies several potential new drug targets for cancer therapy: in addition to AID and APOBEC family deaminases we can include Pol-η, ADAR1 and yet to be identified factors that modulate the apparatus of RNA Pol II transcription-coupled repair. Further, identifying the interacting proteins/genes mediating the functional TP53-Ig SHM interaction must also be considered a top priority in drug development and targeting.

Consistent with these conclusions is the large and now rapidly growing literature on chronic inflammation preceding cancer in many tissues [58]. Whilst it may not be possible to control all those induced somatic genetic factors leading to cancer, strategies to dampen and avoid chronic or transient inflammatory episodes in life may depress the chance of triggering “endogenous” mutagenic events, via dysregulated Ig SHM machinery, being turned on in non-lymphoid tissues [4]. This is particularly important in breast and ovarian tissues as estrogen can directly elevate AID expression as demonstrated by Petersen-Mahrt and colleagues [59, 60] and discussed by Maul and Gearhart [61]. Indeed the Pauklin et al. data [60] show that estrogen induces AID transcription in these non-lymphoid tissues suggesting that the TP53 G-site mutation hotspots in breast cancers may be directly caused by AID and other APOBEC-family deaminases targeting such sites, as the data in Table 7 imply. Collectively, these findings and the present analyses are shedding a new light on how we might view oncogenesis. Our work points to the unregulated Ig SHM mechanism as playing a key role in the progression of the main non-lymphoid cancer groups. This provides us with a fundamentally new molecular model with which to view the process of oncogenesis and ways to develop new strategies for treating (and perhaps preventing) the development of certain cancer groups.


Detection of Strand-Biased Somatic Mutation Signatures

In a data set containing a large number of somatic mutations strand-biased signatures are revealed by comparing the base substitution frequencies of Watson-Crick complements on the same strand. By convention nucleotide substitutions are read from the nontranscribed strand (NTS). However the known direction of transcription in a region of genomic DNA encoding a protein allows identification of the strands. Thus if A-to-G mutations occur with equal frequency on both strands, then its Watson-Crick complement, T-to-C will occur with equivalent frequency when scored off the same strand. However if there is a bias in the mutations favouring the NTS then A-to-G mutations will exceed T-to-C mutations. If there are systematic strand biases involving excessive mutations of A or G (e.g., as seen in many of the data tables presented herein) then the sum total of mutations of A will exceed the sum total of mutations of T (at A:T base pairs where A≫T) and the sum total of mutations of G will exceed the sum total of mutations of C (at G:C base pairs where G≫C). (See Figure 3).
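
The comparison described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the counts are invented toy values (not data from any table in this article), the function names are hypothetical, and the exact two-sided binomial test against an expected 50:50 split is one reasonable choice of significance test rather than the procedure used by the authors.

```python
from math import comb

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Substitutions of A and G; each is compared with its Watson-Crick complement.
PURINE_SUBSTITUTIONS = [
    ("A", "G"), ("A", "C"), ("A", "T"),
    ("G", "A"), ("G", "C"), ("G", "T"),
]


def two_sided_binomial_p(k, n):
    """Exact two-sided binomial P value for k of n events, expecting 50:50."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * one_sided)


def strand_bias_table(counts):
    """Compare each substitution, scored on the NTS, with its complement.

    counts maps (reference_base, mutant_base) pairs to numbers of mutations.
    """
    rows = {}
    for ref, alt in PURINE_SUBSTITUTIONS:
        forward = counts.get((ref, alt), 0)
        reverse = counts.get((COMPLEMENT[ref], COMPLEMENT[alt]), 0)
        rows[f"{ref}-to-{alt}"] = {
            "count": forward,
            "complement_count": reverse,
            "ratio": forward / reverse if reverse else None,
            "p_value": two_sided_binomial_p(forward, forward + reverse),
        }
    return rows


def base_totals(counts):
    """Total mutations of each reference base (for A vs T and G vs C)."""
    totals = {base: 0 for base in "ACGT"}
    for (ref, _alt), n in counts.items():
        totals[ref] += n
    return totals


# Invented toy counts, scored on the NTS.
toy_counts = {("A", "G"): 30, ("T", "C"): 10, ("G", "A"): 20, ("C", "T"): 20}
table = strand_bias_table(toy_counts)
totals = base_totals(toy_counts)
print(table["A-to-G"], table["G-to-A"], totals)
```

In the toy data A-to-G exceeds T-to-C three-fold, which the test flags as a strand bias, whereas the equal G-to-A and C-to-T counts show none.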


We thank John A Millman, Brent (“Charlie”) J Stewart, Pat Carnegie, Susan Lester, Joseph F Williamson and Roger L Dawkins for early discussions and the AL & M Dawkins Foundation and C Y O’Connor ERADE Village Foundation for support in the early stages of the project. We thank Thierry Soussi for comments on an earlier draft of the manuscript, and Serena Nik-Zainal and Bert Vogelstein for their helpful patience and assistance in accessing and analysing the supplementary data in [22, 56].


  1. T. Soussi, “Advances in carcinogenesis: a historical perspective from observational studies to tumor genome sequencing and TP53 mutation spectrum analysis,” Biochimica et Biophysica Acta, vol. 1816, no. 2, pp. 199–208, 2011.
  2. P. Hainaut and G. P. Pfeifer, “Patterns of p53 G→T transversions in lung cancers reflect the primary mutagenic signature of DNA-damage by tobacco smoke,” Carcinogenesis, vol. 22, no. 3, pp. 367–374, 2001.
  3. E. J. Steele, “Mechanism of somatic hypermutation: critical analysis of strand biased mutation signatures at A:T and G:C base pairs,” Molecular Immunology, vol. 46, no. 3, pp. 305–320, 2009.
  4. E. J. Steele and R. A. Lindley, “Somatic mutation patterns in non-lymphoid cancers resemble the strand biased somatic hypermutation spectra of antibody genes,” DNA Repair, vol. 9, no. 6, pp. 600–603, 2010.
  5. J. Wu and Z. Li, “Human polynucleotide phosphorylase reduces oxidative RNA damage and protects HeLa cell against oxidative stress,” Biochemical and Biophysical Research Communications, vol. 372, no. 2, pp. 288–292, 2008.
  6. I. Kuraoka, M. Endou, Y. Yamaguchi, T. Wada, H. Handa, and K. Tanaka, “Effects of endogenous DNA base lesions on transcription elongation by mammalian RNA polymerase II. Implications for transcription-coupled DNA repair and transcriptional mutagenesis,” Journal of Biological Chemistry, vol. 278, no. 9, pp. 7294–7299, 2003.
  7. A. Franklin, P. J. Milburn, R. V. Blanden, and E. J. Steele, “Human DNA polymerase-η, an A-T mutator in somatic hypermutation of rearranged immunoglobulin genes, is a reverse transcriptase,” Immunology and Cell Biology, vol. 82, no. 2, pp. 219–225, 2004.
  8. E. J. Steele, R. A. Lindley, J. Wen, and G. F. Weiller, “Computational analyses show A-to-G mutations correlate with nascent mRNA hairpins at somatic hypermutation hotspots,” DNA Repair, vol. 5, no. 11, pp. 1346–1363, 2006.
  9. D. B. Winter, Q. H. Phung, X. Zeng et al., “Normal somatic hypermutation of Ig genes in the absence of 8-hydroxyguanine-DNA glycosylase,” Journal of Immunology, vol. 170, no. 11, pp. 5558–5562, 2003.
  10. C. Rada, G. T. Williams, H. Nilsen, D. E. Barnes, T. Lindahl, and M. S. Neuberger, “Immunoglobulin isotype switching is inhibited and somatic hypermutation perturbed in UNG-deficient mice,” Current Biology, vol. 12, no. 20, pp. 1748–1755, 2002.
  11. C. Rada, J. M. Di Noia, and M. S. Neuberger, “Mismatch recognition and uracil excision provide complementary paths to both Ig switching and the A/T-focused phase of somatic mutation,” Molecular Cell, vol. 16, no. 2, pp. 163–171, 2004.
  12. X. Zeng, D. B. Winter, C. Kasmer, K. H. Kraemer, A. R. Lehmann, and P. J. Gearhart, “DNA polymerase η is an A-T mutator in somatic hypermutation of immunoglobulin variable genes,” Nature Immunology, vol. 2, no. 6, pp. 537–541, 2001.
  13. F. Delbos, S. Aoufouchi, A. Faili, J. C. Weill, and C. A. Reynaud, “DNA polymerase η is the sole contributor of A/T modifications during immunoglobulin gene hypermutation in the mouse,” Journal of Experimental Medicine, vol. 204, no. 1, pp. 17–23, 2007.
  14. D. Schenten, V. L. Gerlach, C. Guo et al., “DNA polymerase κ deficiency does not affect somatic hypermutation in mice,” European Journal of Immunology, vol. 32, no. 11, pp. 3152–3160, 2002.
  15. J. P. McDonald, E. G. Frank, B. S. Plosky et al., “129-Derived strains of mice are deficient in DNA polymerase ι and have normal immunoglobulin hypermutation,” Journal of Experimental Medicine, vol. 198, no. 4, pp. 635–643, 2003.
  16. A. Faili, A. Stary, F. Delbos et al., “A backup role of DNA polymerase κ in Ig gene hypermutation only takes place in the complete absence of DNA polymerase η,” Journal of Immunology, vol. 182, no. 10, pp. 6353–6359, 2009.
  17. S. Longerich, L. Meira, D. Shah, L. D. Samson, and U. Storb, “Alkyladenine DNA glycosylase (Aag) in somatic hypermutation and class switch recombination,” DNA Repair, vol. 6, no. 12, pp. 1764–1773, 2007.
  18. S. Ratnam, G. Bozek, D. Nicolae, and U. Storb, “The pattern of somatic hypermutation of Ig genes is altered when p53 is inactivated,” Molecular Immunology, vol. 47, no. 16, pp. 2611–2618, 2010.
  19. J. G. Jansen, P. Langerak, A. Tsaalbi-Shtylik, P. Van Den Berk, H. Jacobs, and N. De Wind, “Strand-biased defect in C/G transversions in hypermutating immunoglobulin genes in Rev1-deficient mice,” Journal of Experimental Medicine, vol. 203, no. 2, pp. 319–323, 2006.
  20. H. Saribasak, R. W. Maul, Z. Cao et al., “DNA polymerase ζ generates tandem mutations in immunoglobulin variable regions,” Journal of Experimental Medicine, vol. 209, no. 6, pp. 1075–1081, 2012.
  21. R. C. L. Beale, S. K. Petersen-Mahrt, I. N. Watt, R. S. Harris, C. Rada, and M. S. Neuberger, “Comparison of the different context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo,” Journal of Molecular Biology, vol. 337, no. 3, pp. 585–596, 2004.
  22. S. Nik-Zainal, L. B. Alexandrov, D. C. Wedge, P. Van Loo, C. D. Greenman, and K. Raine, “Mutational processes molding the genomes of 21 breast cancers,” Cell, vol. 149, no. 5, pp. 979–993, 2012.
  23. J. M. Di Noia and M. S. Neuberger, “Molecular mechanisms of antibody somatic hypermutation,” Annual Review of Biochemistry, vol. 76, pp. 1–22, 2007.
  24. G. Teng and F. N. Papavasiliou, “Immunoglobulin somatic hypermutation,” Annual Review of Genetics, vol. 41, pp. 107–120, 2007.
  25. I. M. Okazaki, H. Hiai, N. Kakazu et al., “Constitutive expression of AID leads to tumorigenesis,” Journal of Experimental Medicine, vol. 197, no. 9, pp. 1173–1181, 2003.
  26. H. D. Morgan, W. Dean, H. A. Coker, W. Reik, and S. K. Petersen-Mahrt, “Activation-induced cytidine deaminase deaminates 5-methylcytosine in DNA and is expressed in pluripotent tissues: implications for epigenetic reprogramming,” Journal of Biological Chemistry, vol. 279, no. 50, pp. 52353–52360, 2004.
  27. T. Honjo, M. Kobayashi, N. Begum, A. Kotani, S. Sabouri, and H. Nagaoka, “The AID Dilemma. Infection, or Cancer?” Advances in Cancer Research, vol. 113, pp. 1–44, 2012.
  28. G. P. Pfeifer and A. Besaratinia, “Mutational spectra of human cancer,” Human Genetics, vol. 125, no. 5-6, pp. 493–506, 2009.
  29. G. B. Golding, P. J. Gearhart, and B. W. Glickman, “Patterns of somatic mutations in immunoglobulin variable genes,” Genetics, vol. 115, no. 1, pp. 169–176, 1987.
  30. E. J. Steele, A. Franklin, and R. V. Blanden, “Genesis of the strand-biased signature in somatic hypermutation of rearranged immunoglobulin variable genes,” Immunology and Cell Biology, vol. 82, no. 2, pp. 209–218, 2004.
  31. P. Zylstra, H. S. Rothenfluh, G. F. Weiller, R. V. Blanden, and E. J. Steele, “PCR amplification of murine immunoglobulin germline V genes: strategies for minimization of recombination artefacts,” Immunology and Cell Biology, vol. 76, no. 5, pp. 395–405, 1998.
  32. R. V. Blanden and E. J. Steele, “Misinterpretation of DNA sequence data generated by polymerase chain reactions,” Molecular Immunology, vol. 37, no. 6, p. 329, 2000.
  33. M. F. Goodman, “Error-prone repair DNA polymerases in prokaryotes and eukaryotes,” Annual Review of Biochemistry, vol. 71, pp. 17–50, 2002.
  34. E. J. Steele and J. W. Pollard, “Hypothesis: somatic hypermutation by gene conversion via the error prone DNA → RNA → DNA information loop,” Molecular Immunology, vol. 24, no. 6, pp. 667–673, 1987.
  35. P. C. Hanawalt and G. Spivak, “Transcription-coupled DNA repair: two decades of progress and surprises,” Nature Reviews Molecular Cell Biology, vol. 9, no. 12, pp. 958–970, 2008.
  36. E. J. Steele, J. F. Williamson, S. Lester et al., “Genesis of ancestral haplotypes: RNA modifications and reverse transcription-mediated polymorphisms,” Human Immunology, vol. 72, no. 3, pp. 283–293, 2011.
  37. T. Thorslund, M. Sunesen, V. A. Bohr, and T. Stevnsner, “Repair of 8-oxoG is slower in endogenous nuclear genes than in mitochondrial DNA and is without strand bias,” DNA Repair, vol. 1, no. 4, pp. 261–273, 2002.
  38. F. Matsuda, K. Ishii, P. Bourvagnet et al., “The complete nucleotide sequence of the human immunoglobulin heavy chain variable region locus,” Journal of Experimental Medicine, vol. 188, no. 11, pp. 2151–2162, 1998.
  39. B. L. Bass, “RNA editing by adenosine deaminases that act on RNA,” Annual Review of Biochemistry, vol. 71, pp. 817–846, 2002.
  40. A. Herbert, J. Alfken, Y. G. Kim, I. S. Mian, K. Nishikura, and A. Rich, “A Z-DNA binding domain present in the human editing enzyme, double-stranded RNA adenosine deaminase,” Proceedings of the National Academy of Sciences of the United States of America, vol. 94, no. 16, pp. 8421–8426, 1997.
  41. A. Herbert and A. Rich, “The role of binding domains for dsRNA and Z-DNA in the in vivo editing of minimal substrates by ADAR1,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 21, pp. 12132–12137, 2001.
  42. Q. Wang, M. Miyakoda, W. Yang et al., “Stress-induced apoptosis associated with null mutation of ADAR1 RNA editing deaminase gene,” Journal of Biological Chemistry, vol. 279, no. 6, pp. 4952–4961, 2004.
  43. D. Paus, G. P. Tri, T. D. Chan, S. Gardam, A. Basten, and R. Brink, “Antigen recognition strength regulates the choice between extrafollicular plasma cell and germinal center B cell differentiation,” Journal of Experimental Medicine, vol. 203, no. 4, pp. 1081–1091, 2006.
  44. A. Gallo and F. Locatelli, “ADARs: allies or enemies? The importance of A-to-I RNA editing in human disease: from cancer to HIV-1,” Biological Reviews, vol. 87, pp. 95–110, 2011.
  45. D. Dominissini, S. Moshitch-Moshkovitz, N. Amariglio, and G. Rechavi, “Adenosine-to-inosine RNA editing meets cancer,” Carcinogenesis, vol. 32, no. 11, pp. 1569–1577, 2011.
  46. M. Olivier, M. Hollstein, and P. Hainaut, “TP53 mutations in human cancers: origins, consequences, and clinical use,” in Additional Perspectives on the p53 Family, A. J. Levine and D. Lane, Eds., vol. 2, a001008 of Cold Spring Harbor Perspectives in Biology, pp. 1–17, 2010.
  47. A. Petitjean, E. Mathe, S. Kato et al., “Impact of mutant p53 functional properties on TP53 mutation patterns and tumor phenotype: lessons from recent developments in the IARC TP53 database,” Human Mutation, vol. 28, no. 6, pp. 622–629, 2007.
  48. S. S. Hecht, “Tobacco carcinogens, their biomarkers and tobacco-induced cancer,” Nature Reviews Cancer, vol. 3, no. 10, pp. 733–744, 2003.
  49. M. S. Greenblatt, W. P. Bennett, M. Hollstein, and C. C. Harris, “Mutations in the p53 tumor suppressor gene: clues to cancer etiology and molecular pathogenesis,” Cancer Research, vol. 54, no. 18, pp. 4855–4878, 1994.
  50. M. F. Denissenko, A. Pao, M. S. Tang, and G. P. Pfeifer, “Preferential formation of benzo[a]pyrene adducts at lung cancer mutational hotspots in P53,” Science, vol. 274, no. 5286, pp. 430–432, 1996.
  51. M. F. Denissenko, A. Pao, G. P. Pfeifer, and M. S. Tang, “Slow repair of bulky DNA adducts along the nontranscribed strand of the human p53 gene may explain the strand bias of transversion mutations in cancers,” Oncogene, vol. 16, no. 10, pp. 1241–1247, 1998.
  52. E. D. Pleasance, P. J. Stephens, S. O'Meara et al., “A small-cell lung cancer genome with complex signatures of tobacco exposure,” Nature, vol. 463, no. 7278, pp. 184–190, 2010.
  53. S. A. Martomo and P. J. Gearhart, “Somatic hypermutation: subverted DNA repair,” Current Opinion in Immunology, vol. 18, no. 3, pp. 243–248, 2006.
  54. T. M. Wilson, A. Vaisman, S. A. Martomo et al., “MSH2-MSH6 stimulates DNA polymerase η, suggesting a role for A:T mutations in antibody genes,” Journal of Experimental Medicine, vol. 201, no. 4, pp. 637–645, 2005.
  55. A. M. Bellizzi and W. L. Frankel, “Colorectal cancer due to deficiency in DNA mismatch repair function: a review,” Advances in Anatomic Pathology, vol. 16, no. 6, pp. 405–417, 2009.
  56. L. D. Wood, D. W. Parsons, S. Jones et al., “The genomic landscapes of human breast and colorectal cancers,” Science, vol. 318, no. 5853, pp. 1108–1113, 2007.
  57. C. Greenman, P. Stephens, R. Smith et al., “Patterns of somatic mutation in human cancer genomes,” Nature, vol. 446, no. 7132, pp. 153–158, 2007.
  58. A. Takai, H. Marusawa, and T. Chiba, “Acquisition of genetic aberrations by activation-induced cytidine deaminase (AID) during inflammation-associated carcinogenesis,” Cancers, vol. 3, no. 2, pp. 2750–2766, 2011.
  59. S. K. Petersen-Mahrt, H. A. Coker, and S. Pauklin, “DNA deaminases: AIDing hormones in immunity and cancer,” Journal of Molecular Medicine, vol. 87, no. 9, pp. 893–897, 2009.
  60. S. Pauklin, I. V. Sernández, G. Bachmann, A. R. Ramiro, and S. K. Petersen-Mahrt, “Estrogen directly activates AID transcription and function,” Journal of Experimental Medicine, vol. 206, no. 1, pp. 99–111, 2009.
  61. R. W. Maul and P. J. Gearhart, “Women, autoimmunity, and cancer: a dangerous liaison between estrogen and activation-induced deaminase?” Journal of Experimental Medicine, vol. 206, no. 1, pp. 11–13, 2009.