Toxoplasma gondii is a protozoan parasite capable of infecting humans and animals. The surface antigen glycoproteins SAG2C, -2D, -2X, and -2Y are expressed on the surface of bradyzoites. These antigens have been shown to protect bradyzoites against immune responses during chronic infections. We studied the structures of the SAG2C, -2D, -2X, and -2Y proteins using bioinformatics methods. The protein sequences were aligned with the T-Coffee method. Secondary structures and functional domains were predicted using PSIPRED v3.0 and SMART, and 3D models of the proteins were constructed and compared using the I-TASSER server, VMD, and SWISS-spdbv. Our results showed that SAG2C, -2D, -2X, and -2Y are highly homologous proteins. They share the same conserved peptides and HLA-I restricted epitopes. The similarity in structure and domains indicates putative common functions that might stimulate similar immune responses in hosts. The conserved peptides and HLA-restricted epitopes could provide important insights for vaccine studies and the diagnosis of this disease.

1. Introduction

Toxoplasma gondii (T. gondii) is a species of parasitic protozoa in the genus Toxoplasma that can be carried by many warm-blooded animals including humans [1]. There are three infectious stages in a complex life cycle of T. gondii: the tachyzoites, the bradyzoites, and the sporozoites [2]. A bradyzoite is a slowly replicating version of the parasite, which is responsible for chronic infection of T. gondii [3]. In chronic toxoplasmosis, the parasitophorous vacuoles containing the reproductive bradyzoites form cysts in the tissues of the muscles and brain [4].

The surface antigens of T. gondii, which play roles in host cell attachment and host immune evasion, are dominated by the SRS (SAG1-related sequence) family of proteins, which includes the SAG1-like sequence branch and the SAG2-like sequence branch [5]. SRS proteins are expressed in a stage-specific manner. SAG1, SAG2A, SAG2B, SAG3, SRS1, SRS2, and SRS3 are mainly expressed on the tachyzoite surface [6]. Studies have indicated that SAG2 members participate in parasite invasion of the host and that antibodies against them can block further attachment of T. gondii to host cells [7, 8]. Previous studies have demonstrated that T. gondii parasites with a deletion of the SAG2C, -2D, -2X, and -2Y gene cluster are less capable of maintaining a chronic infection in the brain [9]. This finding suggests that SAG2CDXY are important for the persistence of cysts in the brain and that these antigens might protect bradyzoites against an immune response. In contrast to SAG2A and SAG2B, which are expressed in tachyzoites, SAG2C, -2D, -2X, and -2Y appear to be expressed exclusively on the surface of bradyzoites [9, 10]. However, among the 160 members of the SRS family, structures have been reported for only three proteins: (i) the tachyzoite-expressed SAG1 [11], (ii) the bradyzoite-expressed BSR4 [12], and (iii) sporoSAG [13]. The structures and functional domains of SAG2C, -2D, -2X, and -2Y remain poorly characterized.

In this study, we sought to predict the structures and functional domains of SAG2C, -2D, -2X, and -2Y by bioinformatics methods. The protein sequences were aligned with the T-Coffee method. Secondary structures and functional domains were predicted using PSIPRED v3.0 and SMART. The 3D structure model of each protein was built using the I-TASSER server. The structural similarities of these proteins were summarized, and possible functions of some key amino acids were predicted by structural superposition in VMD and SWISS-spdbv. Furthermore, HLA-restricted epitopes of the SAG2C, -2D, -2X, and -2Y proteins were predicted algorithmically.

2. Methods

2.1. Data Resources

The protein sequences were derived from ToxoDB 5.1 (http://toxodb.org/toxo/). Toxoplasma gondii has three common types: type I, T. gondii GT1 (TGGT1_chrX 7,429,598); type II, T. gondii ME49 (TGME49_chrX 7,419,075); type III, T. gondii VEG (TGVEG_chrX 7,553,721). The original resources are listed in Table 1.

2.2. Modular Architecture Identification

The multiple sequence alignment tool T-Coffee (http://www.tcoffee.org/) [14, 15] was used to align SAG2C, SAG2D, SAG2X, and SAG2Y. The secondary structures were predicted using PSIPRED v3.0 (http://bioinf.cs.ucl.ac.uk/psipred/) [16, 17]. Domain architectures, including signaling domain sequences, were identified and annotated with the web-based Simple Modular Architecture Research Tool, SMART (http://smart.embl-heidelberg.de/) [18].

The 3D models of the proteins were constructed by I-TASSER, a protein structure prediction server (http://zhanglab.ccmb.med.umich.edu/I-TASSER/) that is considered suitable for predicting the 3D structures of proteins longer than 100 amino acids [19–21]. VMD is molecular visualization software for displaying, animating, and analyzing large biomolecular systems using 3D graphics and built-in scripts (http://www.ks.uiuc.edu/Research/vmd/). VMD was used to read standard Protein Data Bank (PDB) files and display the structures they contain [22–25]. Swiss-Pdb Viewer (http://www.expasy.org/spdbv/) is an application with a user-friendly interface that allows several proteins to be analyzed at the same time. The proteins can be superimposed to obtain structural alignments and compare their active domains. We deduced amino acid mutations, H bonds, angles, and distances between atoms from its graphical and menu interface. 3D molecular fit analysis was performed for SAG2C versus SAG2D and for SAG2X versus SAG2Y [22, 23].

2.3. Conserved HLA-Restricted Epitopes Prediction

Consensus methods including ANN, SMM, and CombLib-Sidney in the immune epitope database IEDB (http://www.immuneepitope.org/) were used to predict HLA-restricted epitopes [26–28]. We used this tool to determine each peptide sequence's ability to bind to the specific HLA class I molecule.

3. Results and Discussion

3.1. Amino Acid Sequence Alignment Analysis

SAG2C, SAG2D, SAG2X, and SAG2Y are positioned next to each other on chromosome X. The molecular masses of SAG2C, -2D, -2X, and -2Y are 32–38 kDa, 18–20 kDa, 31–34 kDa, and 28–30 kDa, respectively [9]. Multiple sequence alignment of SAG2C, -2D, -2X, and -2Y shows that the four protein sequences have 97% similarity (Figure 1). In particular, SAG2C (residues 184 to 364) has 98% sequence identity to SAG2D (residues 14 to 196), and SAG2X (residues 184 to 367) has 99% sequence identity to SAG2Y (residues 128 to 300). The alignment therefore indicates that SAG2C, -2D, -2X, and -2Y are highly homologous. However, when SAG2A and SAG2B were included in the alignment, the consensus dropped to 73%, even though the consensus between SAG2A and SAG2B alone scored well at 84%. This indicates that SAG2A and -2B differ greatly from SAG2C, -2D, -2X, and -2Y.
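
The identity percentages above are counted over aligned positions. As a minimal sketch (not the T-Coffee scoring itself), the identity between two already-aligned sequences could be computed as follows. The function name `percent_identity` and the toy fragments are hypothetical, and skipping columns that contain a gap in either sequence is an assumed convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumed convention: gapped columns are not compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments for illustration only; these are not SAG2 sequences.
toy_a = "MKV-LSTAF"
toy_b = "MKVALSSAF"
identity = percent_identity(toy_a, toy_b)
```

Other conventions (for example, dividing by the length of the shorter sequence) give different percentages, so the denominator should always be stated alongside the value.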

3.2. 2-D Structure Alignment for SAG2C, -2D, -2X, and -2Y Proteins

PSIPRED v3.0 was used to predict the secondary structures of the SAG2C, -2D, -2X, and -2Y proteins. Figure 2 shows that SAG2C has two α-helices, 19 β-strands, and 20 coils; SAG2D has one α-helix, 9 β-strands, and 10 coils; SAG2X has three α-helices, 14 β-strands, and 18 coils; and SAG2Y has two α-helices, 15 β-strands, and 18 coils. All four proteins have a long α-helix at the C-terminus. SAG2D has secondary structure elements similar to those of SAG2C residues 169 to 364. SAG2X and SAG2Y also have quite similar secondary structures, with one small discrepancy: SAG2X has one more helix and one fewer strand than SAG2Y.
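
Element counts like those above can be tallied from a per-residue secondary-structure string. The sketch below assumes a three-state string in which H marks helix, E marks strand, and C marks coil, and counts each contiguous run as one element. The function name `count_elements` and the toy string are hypothetical, not output from this study.

```python
from itertools import groupby


def count_elements(ss: str) -> dict:
    """Count contiguous runs of each state in a secondary-structure string."""
    counts = {"H": 0, "E": 0, "C": 0}
    for state, _run in groupby(ss):
        if state not in counts:
            raise ValueError(f"unexpected state: {state!r}")
        counts[state] += 1  # each contiguous run is one element
    return counts


# Toy three-state string for illustration only.
toy_ss = "CCEEEECCCEEEECCHHHHHHHHCC"
element_counts = count_elements(toy_ss)
```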

Furthermore, we used SMART to identify the domains of these proteins (Figure 3). SAG2C, SAG2X, and SAG2Y all have two domains, while SAG2D has only one. SAG2D has an insertion of an adenosine, causing a frame shift and a premature stop codon, presumably leading to a truncated protein. SAG2C and SAG2D have transmembrane segments, while no transmembrane segments were identified in SAG2X and SAG2Y. Figure 3 shows that these proteins have no signal peptides, indicating that they are mature proteins. Members of the SAG2 family also differ in open reading frame size: the smaller SAG2D protein consists of only one SAG domain, whereas SAG2C, SAG2X, and SAG2Y contain two SAG domains interrupted by a single intron. Thus SAG2C, SAG2X, and SAG2Y have similar domain structures, and SAG2D differs in having only one domain.

3.3. Construction of 3D Model for SAG2C, -2D, -2X, and -2Y Proteins

3D models of the SAG2C, -2D, -2X, and -2Y proteins were constructed with the I-TASSER server. Five models were generated for each protein by Dr. Zhang’s lab [19]. We selected the model with the highest C-score, a confidence score by which I-TASSER estimates the quality of its predicted models. It is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations [20]. C-score is typically in the range of [−5, 2], and a model with a C-score above −2 was taken to suggest high confidence.

Low temperature replicas (decoys) generated during the simulation were clustered by SPICKER, and the top five cluster centroids were selected to generate full atomic protein models. The cluster density is defined as the number of structure decoys per unit of space in the SPICKER cluster. A higher cluster density means that the structure occurs more often in the simulation trajectory and therefore indicates a better quality model. Table 2 shows the parameters for constructing the 3D model of each protein.

The best model of each protein was selected and viewed in the VMD program (Figure 4). SAG2C, -2X, and -2Y clearly have two domains, D1 and D2, which are formed by two β-strands separated by one α-helix; SAG2D has one domain, which is formed by one β-strand separated by one α-helix. The β-strands rotate to form a sheet tube, a common feature of these proteins. Furthermore, the binding site residues in each model were predicted and are shown in Table 3.

Analysis of the SAG2C, -2D, -2X, and -2Y structures revealed that the five-on-three sandwich fold of SAG2 was most similar to the T. gondii bradyzoite-expressed BSR4, with TM-scores of 0.583, 0.661, 0.672, and 0.670, respectively (Table 4). BSR4 is a prototypical bradyzoite surface antigen encoded in a cluster of SRS genes on chromosome IV, including the closely related paralogs SRS6 and SRS9 [8, 9]. Sequence alignment shows that SAG2C, -2D, -2X, and -2Y share 71% sequence identity with the bradyzoite-expressed BSR4. This observation is consistent with the prediction that stage-specific structural features might play an important role in infection, dissemination, and pathogenesis in T. gondii. In BSR4, two strands are organized in an antiparallel fashion, followed by another strand on the lower face of the sandwich. The dimeric structure of SAG1 showed a sandwich with two parallel outside strands and an opposite one in between [29]. The overall topology of the five-on-three sandwich D2 domain is conserved between SAG2C, -2D, -2X, -2Y and BSR4. A detailed comparison of SAG2C, -2D, -2X, -2Y and BSR4 reveals a similarity in the topology of the D1 and D2 domains, consistent with the lower Z-score from the Dali search.
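
For readers unfamiliar with the TM-score, it sums a distance-dependent term over aligned residue pairs and normalizes by the target length, using the length-dependent scale d0 = 1.24 (L − 15)^(1/3) − 1.8 Å. The sketch below evaluates that sum for a single, fixed superposition. The full TM-score additionally maximizes over superpositions, which this sketch does not attempt. The function names and input values are hypothetical and are not taken from Table 4.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0, in angstroms."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length: int) -> float:
    """TM-score sum for one fixed superposition (no optimization).

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the target protein.
    """
    if target_length <= 15:
        raise ValueError("target_length must exceed 15")
    d0 = tm_d0(target_length)
    if d0 <= 0:
        raise ValueError("d0 is not positive for such a short target")
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Hypothetical inputs: 100 aligned pairs in a 100-residue target.
perfect = tm_score([0.0] * 100, 100)       # every pair superimposed exactly
at_scale = tm_score([tm_d0(100)] * 100, 100)  # every pair exactly d0 apart
```

Because the sum is divided by the target length rather than the number of aligned pairs, unaligned residues lower the score, which is why the values are comparable across proteins of different sizes.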

By comparison, the next most similar structure is SporoSAG (surface antigen glycoprotein), with a substantially reduced TM-score [30, 31]. SporoSAG is a dominant surface coat protein expressed on the surface of sporozoites. SporoSAG crystallized as a monomer and displayed unique features of the SRS sandwich fold compared to SAG1 and BSR4 [9]. Intriguingly, the structural diversity is localized to the upper sheets of the sandwich fold and may have important implications for multimerization and host cell ligand recognition. In the fit analysis, SAG2D fits well onto the C-terminal region of SAG2C, and SAG2X and SAG2Y fit well from the C-terminus to the N-terminus (Figure 5).

3.4. Conserved HLA-Restricted CD8+ T Cells Epitope Prediction

A consensus of epitope prediction algorithms was used to predict peptides that could induce an effective and protective immune response against T. gondii in humans. We wanted to see whether the proteins have similar epitopes scattered over their surfaces. The epitopes from SAG2C, -2D, -2X, and -2Y were predicted using the software from IEDB (http://www.immuneepitope.org/), which can identify novel HLA class I restricted T cell epitopes derived from T. gondii. Sixteen peptides were selected on the basis of a high HLA allele binding score (percentile rank < 3).
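
The selection step can be sketched as a simple filter. The example below assumes that each peptide has one percentile rank per prediction method and that the consensus rank is the median of those ranks, which is an assumption about how the consensus is formed rather than something stated in this study. The peptide labels and rank values are placeholders, not predictions for SAG2 proteins.

```python
from statistics import median


def consensus_rank(ranks) -> float:
    """Consensus percentile rank, assumed here to be the median across methods."""
    return median(ranks)


def select_binders(predictions: dict, cutoff: float = 3.0) -> list:
    """Return peptides whose consensus percentile rank is below the cutoff."""
    selected = []
    for peptide, ranks in predictions.items():
        if consensus_rank(ranks) < cutoff:
            selected.append(peptide)
    return sorted(selected)


# Placeholder peptides with hypothetical percentile ranks from three methods.
toy_predictions = {
    "PEPTIDE_A": [0.5, 1.2, 2.0],
    "PEPTIDE_B": [2.5, 4.0, 6.0],
    "PEPTIDE_C": [1.0, 3.5, 2.9],
}
binders = select_binders(toy_predictions)
```

A lower percentile rank means stronger predicted binding relative to a background set of peptides, so the filter keeps values below the cutoff.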

From Table 5, we can see three HLA-A*0201-restricted peptides: VVLGSAFMI, FMIAFISCF, AFISCFALV; four HLA-A*1101-restricted peptides: QVTVAVTSK, SSPQNIFYK, QVGTQTECK, KVLINIEEK; and two HLA-B*0702-restricted peptides: LPSSPQNIF, KPEAETPAT shared by SAG2C and SAG2D. From Table 6, we can see two HLA-A*0201-restricted peptides: ALVPNSSLV, VLSSSFMIV; three HLA-A*1101-restricted peptides: ALAITSTTK, SSAQTFFYK, KVLISVEKR; and two HLA-B*0702-restricted peptides: LPSSAQTFF, RPDSDATAT shared by SAG2X and SAG2Y.
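
Peptides shared by two proteins, as listed above, can be found by enumerating every 9-residue window of each sequence and intersecting the two sets. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the toy sequences embed the Table 5 peptide VVLGSAFMI between invented flanking residues, so they are not real SAG2 sequences.

```python
def kmers(sequence: str, k: int = 9) -> set:
    """All overlapping windows of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmers(seq_a: str, seq_b: str, k: int = 9) -> list:
    """Peptides of length k present in both sequences, sorted alphabetically."""
    return sorted(kmers(seq_a, k) & kmers(seq_b, k))


# Toy sequences: a shared 9-mer with different invented flanking residues.
toy_a = "AAVVLGSAFMIGG"
toy_b = "TTVVLGSAFMIKK"
shared = shared_kmers(toy_a, toy_b)
```

Only exact matches are reported, so peptides that differ by a single residue between paralogs would need a separate, mismatch-tolerant comparison.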

More interestingly, when we marked the HLA-restricted epitopes on the aligned sequences of the proteins, we found that the epitopes restricted by the same type of HLA allele are located in the same domains of the proteins (Figure 6). Our results indicate that the epitopes from SAG2C, -2D, -2X, and -2Y can be recognized by the appropriate MHC-I molecules and presented on the cell surface to induce an immune response in host T cells, which might be helpful for vaccine studies and the diagnosis of this parasitic disease. Some peptides identified from these proteins have been shown to be recognized by PBMCs from T. gondii seropositive individuals with the appropriate HLA restriction and to significantly induce IFN-γ production in T cells from immunized mice [32, 33], which supports our predictions.

4. Conclusions

In this study, we conducted a detailed bioinformatic and structural characterization of the bradyzoite proteins SAG2C, -2D, -2X, and -2Y. This characterization provides a structural view of T. gondii SRS family members at the chronic bradyzoite stage. Our bioinformatic analysis showed that SAG2C, -2D, -2X, and -2Y are homologous members of the SAG2 subfamily. Consistently, our structural analysis indicated that SAG2C, -2D, -2X, and -2Y are more similar to two other SRS family members, the bradyzoite-expressed BSR4 and the sporozoite-expressed SporoSAG, than to the tachyzoite-expressed SAG1. This result indicates that the SAG2 family has a conserved structure at the bradyzoite stage that differs greatly from that of SAG1 at the tachyzoite stage. Furthermore, the predicted conserved peptides and HLA-restricted epitopes may inform vaccine studies and the diagnosis of this parasitic disease.

Conflict of Interests

The authors wish to declare that there is no known conflict of interests associated with this publication and there has been no other significant financial support for this work that could have influenced its outcome.


This study was supported by a Grant from the National Natural Science Foundation Project of China (no. 81171604) and no. 49 China postdoc foundation.