Abstract

Nonhomologous end joining (NHEJ) plays a major role in the repair of DNA double-strand breaks, a process that involves a series of steps mediated by multiprotein complexes. A ring-shaped Ku70/Ku80 heterodimer forms first at broken DNA ends; DNA-dependent protein kinase catalytic subunit (DNA-PKcs) then binds to mediate synapsis, and nucleases process DNA overhangs. DNA ligase IV (LigIV) is recruited as a complex with XRCC4 for ligation, with XLF/Cernunnos playing a role in enhancing the activity of LigIV. We describe how a combination of methods (X-ray crystallography, electron microscopy and small angle X-ray scattering) can give insights into the transient multicomponent complexes that mediate NHEJ. We first consider the organisation of the DNA-PKcs/Ku70/Ku80/DNA complex (DNA-PK) and then discuss emerging evidence concerning LigIV/XRCC4/XLF/DNA and higher-order complexes. We conclude by discussing the roles of multiprotein systems in maintaining high signal-to-noise and the value of structural studies in developing new therapies in oncology and elsewhere.

1. Introduction

Nonhomologous End Joining (NHEJ) and Homologous Recombination (HR) comprise the two major modes of DNA double-strand break (DSB) repair in human cells. Although HR is dominant in late S/G2 phases when a sister chromatid is available [1], NHEJ, which does not require a template [2], plays a major role in G1/early S phase [1]. About 50 endogenous DSBs are predicted to occur per human cell during each cell cycle [3]. These are mainly generated by ionizing radiation, reactive oxygen species, and DNA replication across a nick [2]. Unrepaired DSBs can cause catastrophic gene loss during cell division, leading to chromosomal translocations, increased mutation rates, and carcinogenesis [4]. The NHEJ system is also responsible for repairing programmed DSBs in V(D)J recombination [5] and class switch recombination [6] during development of immune diversity. An alternative end-joining pathway also exists, which is mostly microhomology-mediated end joining [7] and is independent of the main NHEJ components [8]. Here, NHEJ refers to the main NHEJ pathway.

The NHEJ pathway comprises three major steps: synapsis, end processing and ligation [9]. Synapsis is carried out by DNA-dependent protein kinase (DNA-PK), which consists of Ku70, Ku80, DNA-PK catalytic subunit (DNA-PKcs), and DNA. Ku70 and Ku80 form a ring-shaped heterodimer around the broken DNA ends and maintain them in proximity [10, 11]. DNA-PKcs, a very large protein belonging to the phosphatidylinositol-3-OH kinase (PI3K)-related kinase (PIKK) family [12], is recruited through interaction with the C-terminus of Ku80 [13, 14], and causes the Ku70/80 heterodimer to move about one helical turn inward from the end [15] to make space for DNA-PKcs to bind DNA. Two DNA-PK assemblies are probably required to hold the two DNA ends close together [16]. Activated DNA-PKcs phosphorylates itself and various proteins, including the other NHEJ components [17, 18]. Synapsis induces the autophosphorylation of DNA-PKcs and allows other NHEJ proteins access to DNA ends [19, 20].

End processing involves nucleases such as Artemis [21], which can cut a range of DNA overhangs and is thought to be sufficient as a nuclease, although other enzymes, in particular PNK, aprataxin (APTX), and PNK-APTX-like factor (PALF), an exonuclease, cannot be ruled out [22]. Artemis interacts with DNA-PKcs and opens DNA hairpins in the V(D)J recombination process [23]. Mutations in the Artemis gene cause Radiosensitive Severe Combined Immunodeficiency (RS-SCID) [21]. Polymerases Pol μ and Pol λ use their BRCT domains to bind to Ku/DNA complexes, and terminal deoxynucleotidyl transferase (TdT) is expressed exclusively in early lymphoid cells, where it engages in NHEJ during V(D)J recombination [24–26]. Furthermore, it has recently been shown that Ku, acting as a lyase, also participates in end processing by cutting DNA at abasic sites, indicating that this protein, like its partner DNA-PKcs, has enzymatic properties and thus fulfils a number of roles in the NHEJ pathway [27].

The final ligation step of rejoining is mediated by DNA ligase IV (LigIV), which is associated with dimeric X-ray cross-complementation group 4 (XRCC4) [28]. These proteins form a very stable complex, which is maintained in 2 M NaCl or 7 M urea [29]. XRCC4 stimulates adenylation and ligase activity [30–32]. Knockouts of these genes in mice result in late embryonic lethality in a p53-dependent manner [33–36], while mutations in the LIG4 gene result in LIG4 syndrome, characterized by radiosensitivity, unusual facial features, microcephaly, developmental and growth delay, pancytopenia, and skin abnormality [37]. XRCC4-Like Factor (XLF)/Cernunnos, mutations of which in humans cause Severe Combined Immunodeficiency, also interacts with XRCC4 and enhances ligation by LigIV [38, 39].

Here we review what is known of the architectures of the transient multicomponent complexes that mediate Nonhomologous End Joining. Figure 1 is an attempt to construct an interaction diagram that summarises our current understanding of NHEJ protein interactions and phosphorylation by DNA-PKcs, indicating where structural information is available. Although the existence of DNA-PK (the complex between DNA-PKcs, heterodimeric Ku and DNA) is clearly defined, as is the tight complex between XRCC4 and LigIV, the temporal and spatial organisation of higher-order complexes is unclear. Do subcomplexes exist that allow Ku to leave the DNA before ligation, or is there one supercomplex in which DNA-PK, LigIV/XRCC4, and XLF coexist to achieve ligation? In the latter case, how does Ku leave once the ends are ligated? In this paper, we first consider what is known about the structure of the huge single-chain DNA-PKcs and how this might lead to a better understanding of the target for use in structure-guided drug discovery. We then discuss the organisation of the DNA-PKcs/Ku70/Ku80/DNA complex (known as DNA-PK), in order to shed light on the initial events that take place in the NHEJ pathway. We discuss the emerging evidence concerning 3D structures of LigIV/XRCC4/XLF/DNA complexes, which should give clues about the binding and functional mechanism of LigIV/XRCC4 and XLF in NHEJ. Finally, we consider the spatial arrangement of higher-order complexes in order to give a picture of the NHEJ repair system as a whole.

2. Structural Biology of Individual Components

Considerable advances have been made in the structural biology of individual components and complexes of the NHEJ repair machinery, but further work is required to understand the spatial organisation of this complicated and dynamic process. Here we discuss what is known about each component before discussing the multiprotein complexes that mediate their functions in NHEJ.

2.1. Ku70/80

The double-stranded (ds) DNA end-binding activity of Ku70 and Ku80 requires their association to form a heterodimer [40]. The crystal structure of the Ku70/Ku80 heterodimer reveals that the two subunits share a similar topology and domain organisation, comprising an amino-terminal domain, a central β-barrel domain, and a helical C-terminal arm [10]. These proteins, when associated, form a pseudosymmetrical structure, in which residues that contribute to the dimer interface show a low level of sequence identity (approximately 15%; Figure 2), favouring heterodimer formation over Ku70-Ku70 or Ku80-Ku80 homodimerisation.

The crystal structure of the Ku70/80 heterodimer in complex with a 55-nucleotide Y-shaped DNA fragment shows that the heterodimer adopts the shape of a ring that encircles duplex DNA (Figure 3). No large conformational changes occur in heterodimeric Ku on binding DNA, except in the C-terminal domains of Ku70 and Ku80. Indeed, no contacts with DNA bases and only a few interactions with the sugar-phosphate backbone are made. The DNA duplex is threaded through the preformed Ku70/80 ring in such a way that one face of the DNA is relatively accessible to the solvent and therefore exposed to processing enzymes that remove damaged nucleotides and fill gaps prior to ligation. These features can provide structural support to broken DNA ends and bring the DNA helix into phase across the junction during end processing and ligation. Although the Ku70/80 heterodimer shows low affinity for circularized DNA [43] and does not bind DNA substrates shorter than 14 bp, it does bind dsDNA fragments of similar length and structure in a sequence-independent fashion, irrespective of whether the DNA ends are blunt, carry hairpin loops, or have 5′ or 3′ overhangs.

2.2. DNA-PKcs

The structure of DNA-PKcs has proved quite elusive. Some beautiful work performed using cryo-electron microscopy single-particle reconstruction of DNA-PKcs [44–47] has given a good impression of the overall structure (see Figure 4(a)). This has now been complemented by work in our laboratory. We have shown that DNA-PKcs crystals can be grown and diffract to about 8.5 Å resolution, but the diffraction is better for complexes with C-terminal fragments of Ku80, presumably because stabilization of DNA-PKcs in the complex leads to better ordering of the crystal packing. We have recently used multiwavelength anomalous dispersion with a heavy-metal cluster [48] to solve the structure of DNA-PKcs in complex with the C-terminal domain of Ku80 at 6.6 Å resolution (Figure 4(b)).

Much of the DNA-PKcs polypeptide chain is constructed from HEAT repeat units (Figure 5) that form several separate domains. The DNA-PKcs tertiary structure measures 160 Å high and 120 Å across as viewed in Figure 6(a). From the N-terminus, HEAT-repeat motifs comprising about 66 α-helices fold into a hollow circular structure, which when viewed from the side resembles a cradle (Figure 6(b)). The chain changes direction before the circle is complete, thus leaving a gap (Figure 6(a)). Within this circular structure the regularity of the HEAT repeats breaks down at certain points, as indicated in Figure 6(a) with blue arrows. These points of irregularity may play a part in conformational changes that have been implicated in the function of this molecule [16]. It is possible that these conformational changes could have a bearing on the size of the gap (Figure 6(a)), which may have a role in the release of DNA-PKcs from DNA ends when NHEJ is complete. The ring structure most likely acts as a platform for proteins that engage in repair of broken DNA and, together with Ku, holds the DNA in place while it is being repaired.

In the second part of the structure the polypeptide chain exploits HEAT repeats to fold into a small, globular, putative DNA-binding domain within the circular structure. It is known that DNA-PKcs binds both double-stranded and single-stranded DNA. Williams et al. (2008) have proposed that “the protrusion” in their cryo-EM structure binds DNA [47], and this protrusion is equivalent to the small globular domain located within the circular region of the crystal structure. This remains the best candidate for both single- and double-stranded DNA recognition, but further work on crystals of DNA-PK (the DNA-PKcs/Ku/DNA complex) and a higher-resolution structure of DNA-PKcs will be needed to confirm this. Thirdly, the C-terminal region folds into the Head/Crown, which is perched right at the top of the cradle-shaped circular structure and extends further back. This part contains the FAT domain, the kinase domain, the FATC domain, and various regions where, as indicated by biochemical studies, other proteins may bind to form complexes with DNA-PKcs (Figure 7).

The core of the kinase structure from PI(3)Kγ, one of the family members, was superposed onto this Head/Crown region, resulting in a plausible fit to the N-lobe β-strands and the C-lobe α-helices (Figure 8). In this location the kinase is exposed and easily accessible to substrates (Figure 8). From the location of the kinase domain the positions of the FAT and FATC regions can be inferred (Figure 7), as the kinase domain likely “snuggles” in between these two regions [62].

The size of the DNA-PKcs monomer is predicted by small angle X-ray scattering (SAXS) to be about 155 Å [63], broadly in agreement with that of the crystallographic structure. DNA-PKcs dimerizes without DNA in a concentration-dependent manner. SAXS data indicate a large conformational change between autophosphorylated and unphosphorylated DNA-PKcs; the dimension and radius of gyration of phosphorylated DNA-PKcs increased by 25 Å and 2 Å, respectively, compared with mock-treated DNA-PKcs. Also, shape reconstruction of phosphorylated DNA-PKcs shows a wider cleft between the head and palm domains than in the unphosphorylated enzyme.

2.3. DNA Ligase IV

Human LigIV has also proved difficult to study in isolation owing to its instability and flexibility, but it is stabilised by interaction with XRCC4 [28]. In humans, LigIV is one of three ATP-dependent DNA ligases (I, III, and IV) and plays a central role in eukaryotic NHEJ. LigIV can be divided into catalytic and interaction regions. Excellent reviews comparing the structures of DNA ligases, RNA ligases, and RNA capping enzymes can be found elsewhere [64–67].

LigIV belongs to the nucleotidyltransferase superfamily and carries out a three-step nucleotidyl transfer reaction: the formation of a covalent enzyme-nucleotide monophosphate (NMP) intermediate (step 1), the transfer of the NMP to the 5′-phosphate of a polynucleotide (step 2), and the joining of the 5′-phosphate with a 3′-hydroxyl to seal two polynucleotides (step 3) [68]. These enzymes have four common motifs (I, III, IV, V) and in addition two more motifs (IIIa and VI), which are conserved among the cellular and ASF virus capping enzymes and eukaryotic ATP-dependent DNA ligases [69]. A recently found motif, Va, is well conserved among human DNA ligases [70]. Motif I, KX(D/N)G, contains the catalytic lysine that forms the covalent NMP intermediate. Motifs I-V are located in the nucleotidyltransferase domain (NTase) (Figure 9), the core of which comprises three mainly antiparallel β-sheets flanked by six α-helices [64]. Motifs Va and VI belong to the oligonucleotide/oligosaccharide-binding domain (OBD) (Figure 9), which has a five-stranded Greek-key β-barrel capped by an α-helix [64, 71]. The NTase and OBD are conserved among capping enzymes and DNA ligases [64]. Most LigIV syndrome mutations are found in the NTase and OBD [72] (Figure 9).
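As a rough illustration of how the motif I consensus KX(D/N)G can be used to locate candidate catalytic lysines in a protein sequence, the consensus translates directly into a regular expression. The sketch below is a minimal example only: the function name `find_motif_i` and the toy peptide are hypothetical and are not taken from any ligase sequence, and a pattern match by itself does not establish that a lysine is catalytic.

```python
import re

# Motif I consensus from the text: K-X-(D/N)-G, where X is any residue.
# A lookahead is used so that overlapping candidates are all reported.
MOTIF_I = re.compile(r"(?=(K[A-Z][DN]G))")


def find_motif_i(sequence):
    """Return (1-based position of the lysine, matched tetrapeptide) pairs."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF_I.finditer(seq)]


# Hypothetical toy peptide for illustration; not a real ligase sequence.
toy = "MAAEYKYDGQRLLKANGT"
hits = find_motif_i(toy)
for position, tetrapeptide in hits:
    print(position, tetrapeptide)
```

In practice such a scan would only be a first filter; candidates would be checked against a structure-based alignment of the NTase domain.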

Many enzymes in the nucleotidyltransferase superfamily have extra domains in addition to the conserved catalytic core domains. N-terminal to the NTase, for instance, the human DNA ligases have a DNA-binding domain (DBD), the three-dimensional structure of which was first uncovered by Pascal et al. (2004) with the catalytic domains of human DNA ligase I (LigI) complexed with an unligatable, nicked DNA fragment [73]. This domain is also found in archaeal DNA ligases [74–76] and possibly other eukaryotic, Poxvirus, and archaeal DNA ligases [77]. Pascal et al. (2004) showed that the DBD is essential for LigI to bind DNA and to carry out ligation of DNA nicks [73]. However, this does not seem to be the case for DNA ligase III [78], although most of the DNA-binding affinity of LigIV seems to come from its DBD (T Ochi and TL Blundell, unpublished results). These results suggest that the DBD of each human DNA ligase has different DNA-binding properties, although they are likely to have similar structures [79]. Two LigIV syndrome mutations are severe only when they are combined with R278H, and they seem to have little impact on LigIV activity [37]. On the basis of structural similarities of the catalytic regions of LigI and LigIV, they are likely to bind DNA in a similar manner.

In addition to the catalytic region, human DNA ligases have extra domains [79]. LigIV has a tandem BRCT domain with a linker predicted to be mostly disordered. This linker seems to be important for the catalytic activity of LigIV [80] and contains a DNA-PKcs phosphorylation site at T650, phosphorylation of which stabilizes LigIV [81]. The BRCT domain, which typically has four parallel β-strands surrounded by three α-helices [82], is common in cell cycle checkpoint proteins that respond to DNA damage [83]. LigIV interacts with XRCC4 mainly through the linker between the two BRCT domains [80]. In addition to the interaction with XRCC4, the first BRCT domain (BRCT1) has been shown to interact with Ku70/80 [84].

Structures of tandem BRCT domains of human BRCA1 and MDC1, and of yeast Crb2, Nbs1, and Brc1, have been solved with different phosphopeptides. Four key residues that form the phosphoserine-binding pocket have been identified, namely the (S/T)G motif at the end of the first β-strand (β1) and the (S/T)XK motif at the beginning of the second helix (α2) [85–93]; these residues are conserved in the tandem BRCT domain of LigIV except that the second motif is replaced by NXR. Thus BRCT1 might bind to phosphoserines [94], although the interactions with the two proximal BRCT domains found in BRCA1, MDC1, Crb2, Nbs1, and Brc1 are unlikely to occur in LigIV, as the tandem BRCT domains are probably positioned apart [94, 95]. Indeed, in vitro phosphopeptide binding experiments showed that the BRCT domains of LigIV bound phosphopeptides [96, 97]. However, the precise sequence of a phosphopeptide that binds to the BRCT domains has not yet been determined.

Since tandem BRCT domains generally share a common globular arrangement, the BRCT domains of LigIV may interact with each other when LigIV is in the free form. The main dimerization interface of the tandem BRCT domain comprises α2 in BRCT1 and the first and third helices, α1 and α3, in the second BRCT domain (BRCT2) [98]. Interestingly, the interaction surface of those BRCT domains and XRCC4 is similar to that of other tandem BRCT domains (as discussed below). It is possible that a BRCT domain from another protein interacts with α2 of BRCT1, which is exposed to the solvent. Thus, although the tandem BRCT domains of LigIV have a long linker, they have features in common with other tandem BRCT domains.

2.4. XRCC4

In solution, XRCC4 exists as a salt-dependent equilibrium of dimers and tetramers [102] (see Figure 10). Tail-to-tail tetramerisation was observed in XRCC4 protein crystals [103]. Binding of LigIV to the C-terminal α-helical coiled-coil of XRCC4 stabilizes the XRCC4 dimer. The region of XRCC4 that binds LigIV overlaps with the XRCC4 tetramerisation region, which may explain why LigIV shifts XRCC4 towards the dimer form in solution [102]. Whether tetrameric XRCC4 has a function in the NHEJ repair pathway is still not known.

The protein sequence of XRCC4 after residue 213 is not included in XRCC4 crystal structures because the C-terminal domain is expected to be highly disordered and flexible [103]. However, EM studies have revealed that the C-terminal region of mouse XRCC4 forms a globular domain [104]. This domain includes putative nuclear localization sequences [105]. These authors also suggested that a cluster of acidic amino acids 229–238 is important for the auto-transcription activity. Furthermore, the XRCC4 C-terminal domain is the target for NHEJ regulatory proteins. DNA-PKcs phosphorylates XRCC4 and regulates its binding to DNA [31]. Residues S260 and S318 in the XRCC4 C-terminal region were identified as the main sites of phosphorylation by DNA-PKcs [106]. XRCC4 is also phosphorylated by CK2 at residue T233, and this phosphorylation recruits PNK, which is likely to participate in NHEJ [107]. Indeed, the structure of the ForkHead-Associated (FHA) domain of PNK with an XRCC4-derived phosphopeptide has been solved [108]. XRCC4 residue K210 was also reported to be important for small ubiquitin-like modifier (SUMO) modification, which regulates XRCC4 cellular localization [109]. The XRCC4 C-terminal region, together with the N-terminal region (residues 1–28) and central region (residues 168–200), may facilitate cooperative DNA binding [31]. Thus, defining the structure of the C-terminal region will contribute to understanding how XRCC4 binds to LigIV and DNA in order to carry out its function.

2.5. XLF

XLF was identified through a cDNA functional complementation cloning study of patient 2BN, following the discovery of a group of NHEJ-deficient patients [38, 110]. It was also independently identified through yeast two-hybrid screening for XRCC4 interactors [39]. XLF is evolutionarily conserved throughout a wide range of eukaryotes, including vertebrates, insects, and even filamentous fungi [111]. Full-length human XLF contains 299 residues. At its extreme C-terminus, a small conserved basic cluster constitutes the nuclear localization sequence. Immunofluorescence staining showed that XLF localizes to the nucleus of human cells [39].

The crystal structure of XLF with a C-terminal truncation, solved independently at 2.3 Å resolution by Andres et al. [112] and in our laboratory [101], shows a homodimer in which each protomer contains a globular N-terminal head domain and an extended coiled-coil helical tail, which is folded back around the coiled-coil (Figure 11). The N-terminal head domain starts with a single helix, α1, which is followed by a seven-stranded antiparallel β-structure sandwiching a helix-turn-helix motif between β4 and β5. The tail structure contains three helices: α4, α5, and α6. While α4 extends about 60 Å away from the N-terminal head domain, α5 and α6 fold back and make contact with the head domain. The α4 helices from the two protomers interact as a coiled-coil structure, burying highly conserved hydrophobic residues at the interface. This dimerization of XLF is further enhanced through the folding back of the α5 and α6 helices to encircle the α4 helix of the other protomer to form a clamp, burying a surface area of 6500 Å². Gel filtration, protein crosslinking and analytical ultracentrifugation are also consistent with a stable homodimeric form of XLF in solution [101]. XLF was found to form concentration-dependent higher-order complexes during gel filtration experiments [112]. The homodimer of XLF, however, is the smallest stable functional unit.

Because the XLF C-terminal region after residue 245 is predicted to be disordered, around 70 residues were removed from the XLF C-terminus for the crystal structure analyses [101, 112]. The approximate location of the XLF C-terminal region, however, can be predicted to be near the N-terminal head domain, judging from the direction of helix α6.

Although XLF and XRCC4 have similar architectures, large structural differences from head to tail occur between these two proteins. For the head domain, both proteins contain the same seven-stranded antiparallel β-structure sandwiching a helix-turn-helix motif, but XLF contains an extra helix at the N-terminus. As we have seen, the tail structure of XLF contains distinct helices folding back, while the extended coiled-coil tail structure of XRCC4 contains the LigIV binding region near the C-terminus. The differences in sequence and structure between XLF and XRCC4 tails explain why LigIV does not bind to XLF in the same way as XRCC4.

The functions and mechanisms of action of XLF in NHEJ are still not fully understood. XLF not only stabilizes LigIV/XRCC4 at broken DNA ends, but also enhances the LigIV/XRCC4 end-joining process. XLF has also been found to be essential for repairing mismatched overhangs and for the gap-filling process, together with the DNA polymerases Pol μ and Pol λ [113, 114]. Structural studies of the interactions of XLF with other NHEJ proteins will help unravel its exact role. They will contribute to our current understanding of DNA repair in NHEJ and may also lead to future therapeutic applications for patients with NHEJ defects.

3. Structural Biology of Complexes

3.1. DNA-PKcs/Ku70/Ku80/DNA Ternary Complex (DNA-PK)

The crystal structure of the Ku70/80 heterodimer does not include the C-terminal DNA-PKcs interaction domain of Ku80 (Ku80CTD), which is dispensable for the binding of Ku70/80 to DNA but is required for DNA-PK recruitment to the sites of damaged DNA [13, 14]. Nuclear magnetic resonance analysis of the 19 kDa Ku80CTD (residues 545–732) defines an α-helical structure [115, 116]. Further structural studies of full-length Ku70/80 with and without DNA have been conducted using single-particle electron microscopy (EM) [117] and SAXS combined with live cell imaging [63]. The position of Ku80CTD was proposed to be under the α/β domain of Ku70 by EM, but the domain was found to be flexible in the SAXS study. Molecular dynamics simulations of Ku80CTD produced an ensemble of conformations, supporting the idea that Ku80CTD is a region of high flexibility [63]. Taken together, the studies show that association of Ku70 and Ku80 to form a heterodimer is required for binding dsDNA ends, that Ku-dependent DNA binding drives the recruitment of DNA-PKcs, and that the latter interaction involves the helical domain located at the C-terminus of Ku80. Although Ku80CTD was included in the crystal structure of DNA-PKcs, its position could not be unequivocally defined, presumably owing to the dominance of similar α-helical structures in DNA-PKcs itself [48].

Insights into the DNA-PKcs/Ku70/Ku80 holoenzyme structures and possible synaptic complexes have been obtained using cryo-electron microscopy and SAXS. Boskovic et al. (2003) used electron microscopy at low resolution (about 30 Å) to demonstrate large conformational changes in human DNA-PKcs when double-stranded DNA binds, and suggested that this may correlate with the activation of the kinase [118]. Subsequently, Spagnolo et al. (2006) used single-particle electron microscopy at 25 Å resolution to study the human DNA-PKcs/Ku70/Ku80 holoenzyme assembled on DNA [16]. They again found evidence for conformational changes on binding of Ku and DNA to DNA-PKcs. They identified dimeric particles comprising two DNA-PKcs/Ku70/Ku80 holoenzymes, which they consider likely to be synaptic complexes, maintaining broken ends and providing a platform for other components required for end processing and ligation. A SAXS study of DNA-PK revealed that it had two different modes of dimerization, as was observed previously with DNA-PKcs [63]. Depending on the presence of either 40 bp hairpin DNA or 40 bp Y-shaped DNA, DNA-PK formed the head-to-head or palm-to-palm dimer. Very recently, Perry et al. (2010) have taken the study of the DNA-PKcs/Ku70/Ku80 holoenzyme further by analyzing their earlier SAXS studies in the light of the crystal structure of DNA-PKcs [119]. They have impressively demonstrated that DNA-PK phosphorylation causes a large conformational change, sufficient to open the gap in the ring and provide access to or release from DNA. Ku80CTD has been shown to be flexible and to extend in solution, which favours recruitment of DNA-PKcs. It is possible that Ku80 interacts with DNA-PKcs on both sides of the DSB [63].

3.2. DNA Ligase IV/XRCC4 Complexes

LigIV is stabilized by forming a tight complex with XRCC4 [28]. About 99% of LigIV is preadenylated when purified together with XRCC4 and it is difficult to readenylate after single-nick ligation [120], implying that the LigIV/XRCC4 complex is ready to ligate DNA. Unlike other human DNA ligases LigIV/XRCC4 can efficiently ligate one of the nicks of a DSB, although the other is unligatable [26], and it can ligate DNA strands across gaps and fully incompatible ends [121]. Furthermore, it has been shown that LigIV can ligate single-stranded poly-T DNA [122]. Interestingly, the ligation efficiency is higher with long DNA substrates 157 bp than short ones 53 bp [123]. This might be related to the observation that a single LigIV/XRCC4 bridges two DNA ends [124].

The crystal structures of the XRCC4 dimer complexed with the tandem BRCT domain of LigIV show that the linker between the two BRCT domains is well ordered and forms a helix-loop-helix (HLH) clamp around the coiled-coil [29, 94] (Figures 12(a) and 12(b)). The same interaction mode and secondary structure arrangement are observed in the orthologous yeast complex between XRCC4 (Lif1p) and LigIV (Lig4p) [95] (Figure 12(c)). The two BRCT domains in the human and yeast complexes extend the clamp, encircling the coiled-coil domain. A helix in BRCT1 is located close to the conserved XRCC4 interaction region of the linker (XIR: residues 748–784) between the two BRCT domains. The corresponding helix in BRCT1 of 53BP1 participates in the interaction surface with p53 [125, 126]. The interaction of XRCC4 with LigIV produces a kink in one helix of the coiled-coil of the XRCC4 dimer and switches the left-handed heptad repeat into a right-handed undecad coiled-coil; as a result, the LigIV interaction surface becomes flat [29, 94]. The kink bends in opposite directions in the complex of XRCC4 with XIR and in the complex with the tandem BRCT domain [94]. The former structure might be an intermediate state of the LigIV/XRCC4 interaction. If so, this dynamic conformational change might have a biological role in vivo. This kink does not appear in Lif1p/Lig4p, even after refinement of the structure against a new 3.5 Å diffraction data set (see [127], T Ochi and TL Blundell, unpublished results). Thus, the kink may be unique to humans and some other higher organisms.
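The link between repeat length and supercoil handedness mentioned above can be made concrete with a little arithmetic. The sketch below is a textbook simplification, not the analysis used in the cited studies: it assumes that a heptad spans two helical turns and an undecad three, takes about 3.6 residues per turn for an undistorted α-helix, and infers handedness only from whether the repeat periodicity is below or above that reference value. The function names are hypothetical.

```python
# Approximate residues per turn of an undistorted alpha-helix (assumed value).
ALPHA_HELIX_RESIDUES_PER_TURN = 3.6


def residues_per_turn(repeat_length, turns):
    """Periodicity implied by a sequence repeat spanning a number of turns."""
    return repeat_length / turns


def supercoil_handedness(repeat_length, turns,
                         reference=ALPHA_HELIX_RESIDUES_PER_TURN):
    """Simplified rule: periodicity below the reference gives a left-handed
    supercoil, above it a right-handed one."""
    periodicity = residues_per_turn(repeat_length, turns)
    if periodicity < reference:
        return "left-handed"
    if periodicity > reference:
        return "right-handed"
    return "straight"


heptad = supercoil_handedness(7, 2)    # 7 residues over 2 turns = 3.5 per turn
undecad = supercoil_handedness(11, 3)  # 11 residues over 3 turns, about 3.67
print("heptad:", heptad)
print("undecad:", undecad)
```

Under these assumptions the heptad periodicity (3.5) falls below the α-helical value and the undecad periodicity (about 3.67) falls above it, which is consistent with the left-handed to right-handed switch described for XRCC4.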

The second helix of the HLH mediates a hydrophobic interaction with the side of the flat surface of XRCC4 opposite to that where XIR interacts [94], and a similar extensive hydrophobic interaction is observed in Lif1p/Lig4p (residues 827–839) [95]. LigIV additionally interacts with the coiled-coil of XRCC4 via α1 and α3 of BRCT2, in a manner that resembles the interaction between BRCT1 and BRCT2 of other tandem BRCT domains. Superposition of LigIV/XRCC4 and Lif1p/Lig4p based on XIR and the corresponding region of Lig4p (LIR) shows that, apart from the kink described above, a further change occurs in the position of BRCT1 (Figure 13). This may be a crystallographic artefact, because BRCT1 is closely packed against BRCT2 of another molecule in both the human and yeast structures. However, the NMR structure of BRCT1 (PDB code: 2E2W) has the same conformation as the crystallographic one, suggesting that at least human BRCT1 and the following linker are likely to have the same conformation in solution.

A recently published EM structure of the LigIV/XRCC4 complex shows the N-terminus of LigIV in proximity to the head domain of XRCC4 [128]. The authors compared two LigIV/XRCC4 constructs, one with the full-length sequences, and the other with full-length LigIV and a truncated XRCC4 (residues 1–213). From the differences between the two EM images, they determined the position of the C-terminus of XRCC4, and by labelling the hexahistidine tag with gold, they identified the N-terminus of LigIV. Although the authors reconstructed 2D averaged images of LigIV/XRCC4, the 3D reconstruction failed, partly because of heterogeneity in the LigIV/XRCC4 conformation. Thus, they proposed that the catalytic region of LigIV is connected to the C-terminal region by a flexible linker and that this may have functional importance (see also Perry et al. (2010) [119]).

We have carried out SAXS studies of the tandem BRCT domain of LigIV in complex with mutated XRCC4 (BmX4) and of LigIV in complex with mutated XRCC4 (LmX4) in order to investigate the conformation of the catalytic region in solution (Figure 14(a)) [129]. Here, mutated XRCC4 is identical to the one used for solving the structure of XIR/XRCC4 [29]. The linearity of the respective Guinier plots confirmed that the protein solutions were homogeneous and monodisperse (Figure 14(b)). The deduced radius of gyration and the maximum molecular dimension of LmX4 are 9 Å and 43 Å larger, respectively, than those of BmX4. The scattering profile simulated from the crystallographic structure of BmX4 (PDB code: 3II6) fitted the measured SAXS curve well (data not shown). Moreover, the ab initio 3D shape restoration of BmX4 reproduced an overall conformation consistent with the crystal structure (Figure 14(c)). The ab initio shape reconstruction of LmX4 revealed that the catalytic region may contribute additional density to the head domain of XRCC4 or the tandem BRCT domain of LigIV when compared with the conformation of BmX4 (Figure 14(c)). Since the extended, open conformation of the catalytic region in solution has also been observed in an archaeal DNA ligase [74], the extra density may correspond to a conformation of the catalytic region of LigIV similar to the closed conformation observed in other archaeal DNA ligases [75, 76]. As the shape restorations of LmX4 yielded a reproducible conformation (also indicated by the normalized spatial discrepancy (NSD) value after shape averaging), this finding might imply interactions between the catalytic region and BmX4. However, electrophoretic mobility shift assays and protease analysis (data not shown) indicate that the catalytic region is unlikely to have strong interactions with BmX4. Thus, although the majority of LmX4 in solution may have the extended open conformation, the catalytic region is flexibly attached to BmX4. Our observations agree with the EM study [128].
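For readers unfamiliar with how a radius of gyration is deduced from a Guinier plot, the standard Guinier approximation at low scattering angle is ln I(q) = ln I(0) - (Rg² q²)/3, so a straight-line fit of ln I against q² gives Rg from the slope. The following is a minimal sketch of that fit on synthetic data. It is not the analysis pipeline used in the study; the function name `guinier_fit`, the chosen q-range, and the test values (Rg = 40 Å, I(0) = 100) are assumptions made purely for illustration.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares line through (q^2, ln I); returns (Rg, I0).

    Uses the Guinier approximation ln I(q) = ln I0 - (Rg^2 / 3) * q^2,
    which is only valid at low q (commonly q * Rg below about 1.3 for
    globular particles).
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; data not in Guinier regime")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free data for illustration only (q in 1/Å, hypothetical values).
rg_true, i0_true = 40.0, 100.0
q = [0.005 + 0.002 * k for k in range(10)]  # largest q * Rg is 0.92
intensity = [i0_true * math.exp(-((rg_true * qi) ** 2) / 3.0) for qi in q]

rg_est, i0_est = guinier_fit(q, intensity)
print(round(rg_est, 3), round(i0_est, 3))
```

With real data, curvature in the Guinier plot at the lowest angles would point to aggregation or interparticle effects, which is why linearity is used as a check on sample quality.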

3.3. XLF/XRCC4 Complexes

The interaction between XLF and XRCC4 is salt sensitive and does not depend on DNA [39, 133]; it occurs through the head regions, as shown by a yeast two-hybrid study of various mutants [134]. XLF bound to beads at its C-terminus was still able to pull down LigIV/XRCC4, implying that the C-terminus of XLF is not important for interaction with LigIV/XRCC4 [135].

Mutagenesis studies indicate that the structurally exposed XLF residue L115 (Figure 16, shown in green), located in the β6-β7 loop, is important for the XLF/XRCC4 interaction [112]. Residues K63, K65, and K99 of XRCC4 (Figure 15, shown in green) are essential for the interaction and are located in the region of the head domain close to the helical tail [112]. Nonessential interaction residues of XLF are mainly located outside the head domain region, whereas the nonessential XLF/XRCC4 binding residues in XRCC4 are mainly located on the topside of the N-terminal head domain and on the helical tail structure before the LigIV binding region (Figures 15 and 16, shown in grey) [112]. These studies are consistent with a linear side-by-side interaction model, in which XLF head domains slide into the space created by the XRCC4 head domains and the N-terminal part of the tail structure [112] (Figure 17). However, we cannot exclude a model in which XLF and XRCC4 bind together in a side-by-side manner but with a degree of twist, introducing curvature and possibly a circular complex. This would have the advantage of forming a finite and discrete complex. Further small angle X-ray scattering experiments may be the best approach to resolving this, especially if the complexes are dynamic, as gel filtration experiments suggest. However, some encouragement that well-defined complexes can be identified is found in the observation that XLF/XRCC4 complexes have been crystallized and X-ray data collected, albeit to low resolution (Q Wu, TL Blundell unpublished data).

3.4. Spatial Arrangement of Higher-Order Complexes

In order to give a picture of the spatial and temporal organisation of the NHEJ repair system as a whole, an understanding of the order of interactions during the assembly of the DNA-PKcs/Ku70/Ku80/DNA ternary complex and the LigIV/XRCC4/XLF/DNA quaternary complex will be essential.

Ku70/80 and DNA-PKcs, which have higher DNA-binding affinity than LigIV/XRCC4/XLF, most likely form the DNA-PKcs/Ku70/Ku80/DNA ternary complex first. For the subsequent formation of the LigIV/XRCC4/XLF/DNA complex, the order and dynamics of protein assembly are still to be determined. The interaction between XRCC4 and XLF is relatively weak compared to the strong binding between XRCC4 and LigIV. It is not clear whether the interactions of the XLF dimer with the XRCC4 dimer are maintained when the ligase is recruited. Protein interaction assays have confirmed the XRCC4-independent recruitment of XLF to DSB ends through interaction with Ku70/80, which occurs only in the presence of DNA. This may imply that XLF can act independently of XRCC4.

Live cell imaging techniques have identified the immediate recruitment of XLF to laser-induced DSBs with Ku70/80 protein bound [136]. XRCC4 is dispensable for XLF recruitment to DNA ends, but its presence can stabilize the XLF/DNA interaction [136]. Protein interaction assays have confirmed the interaction between Ku70/80 and XLF, and this interaction only occurs in the presence of DNA [136].

Both XRCC4 and XLF require a long piece of DNA for binding. How DNA is structurally involved in all the higher-order protein complexes is of fundamental interest. The phosphorylation of LigIV, XRCC4, and XLF by DNA-PKcs does not interfere greatly with the core functions of these proteins, but could alter the relative binding affinities of various protein-protein or protein-DNA interactions, which are important for correct spatial arrangement of the higher-order complexes. All of this uncertainty underlines the need for further studies to characterise complexes temporally as well as spatially.

4. Discussion

The challenges of structural characterisation of dynamic multiprotein systems clearly demand a combination of SAXS, EM, X-ray crystallography, and other approaches. All will benefit from methods for stabilization and fixation of the complexes. Modified constructs (for example, phospho-mimicking mutations and truncations) as well as posttranslational modifications (for example, phosphorylation and methylation) need to be explored in order to identify stable complexes. For single particle cryo-EM studies, GraFix has been successfully introduced to stabilize macromolecules [137]. This exploits glycerol gradient centrifugation into increasing concentrations of a chemical fixation reagent to stabilize individual macromolecules and to prevent opportunistic aggregation. A similar approach might be used for other structural studies including X-ray crystallography, although here molecular surface modification of the complexes may prevent the formation of ordered crystals.

Obtaining crystals of large multiprotein assemblies suitable for high-resolution X-ray diffraction remains a challenge, so the development of methods to analyse low-resolution X-ray diffraction data is essential. In this respect, free-electron laser (FEL) light sources may allow single particle X-ray FEL (XFEL) imaging. X-ray crystallography with nanocrystals is also a promising method.

X-ray crystallography is still the only technique to give atomic resolution of large structures and high resolution is essential for studying the binding of small molecules. Indeed chemical tools that allow specific intervention in NHEJ should allow dissection of the roles of the various components. These tools would also likely contribute to the discovery of lead compounds and preclinical candidates for therapeutic intervention at allosteric and other regulatory interaction sites in oncology and for patients with defects in the NHEJ pathway.

The immediate interest, developing from the emerging structure of DNA-PKcs, is the improvement of design of inhibitors that bind at the ATP site of the protein kinase moiety. Such inhibitors would not only inform the development of useful therapeutic agents but should also be of immediate value in investigating the possibility of improving stability of the kinase domain, and the quality and resolution of crystals.

Eventually, we would hope to pursue a structure-guided approach to optimize the design of such inhibitors. Similar approaches could be taken with the ligase active site. In our view, a more exciting and adventurous approach would be to design new chemical entities that bind at allosteric sites, templates, or adaptor binding sites (so-called allo-targeting) that are critical to the activation, colocalisation, and/or specificity of the regulation of NHEJ. The use of fragment-based methods [138–140] in this context is attractive. Likely targets would be the head-to-head interactions of XRCC4 and XLF, the interactions of BRCT domains, and the interaction of Ku70/80 and DNA-PKcs.

In conclusion, a spatial and temporal understanding of NHEJ should provide insights into the mechanism of this critical cellular process and also suggest approaches to designing useful chemical tools. Indeed the design of small chemical agents that noncovalently modulate interactions would also likely contribute to the discovery of lead compounds that allow therapeutic intervention in oncology and treatment of patients with defects in the NHEJ pathway.

Acknowledgments

The authors would like to thank Dr. J. Günter Grossmann for useful discussions and comments on the SAXS data. B. L. Sibanda and D. Y. Chirgadze were supported by Wellcome Trust Programme Grant: 079281/Z/06/Z. T. Ochi was supported by Overseas Research Studentship (ORS). Molecular graphics except for Figures 4 and 14 were prepared by using PyMOL [141].