Abstract

A structural model of the begomovirus AC1 (Rep) protein is useful for understanding its biological function at the molecular level and for docking studies. In this study we used the ProSA (Protein Structure Analysis) tool to evaluate the structure prediction and modeling of the protein. ProSA is used for the refinement and validation of experimental protein structures. It readily identifies potential problems in protein structures from energy plots and displays them in three dimensions. We selected AC1 proteins of three begomovirus strains (YP_003288785, YP_002004579, and YP_003288773) for structural analysis and for the display of energy plots that highlight potential problems in the protein structures. The 3D models of Rep proteins, with their errors recognized, can be used effectively in in silico docking studies for the development of potential ligand molecules against begomovirus infection.

1. Introduction

Geminiviruses were recognized in 1978 by the International Committee on Taxonomy of Viruses on the basis of their unique virion morphology and their possession of ssDNA as genomic material [1, 2]. Geminiviridae is one of the largest plant virus families; its members have a circular, single-stranded DNA (ssDNA) genome of approximately 2.7–5.2 kb encapsulated within twinned (geminate) icosahedral virions. The protein coat of geminiviruses consists of a single type of protein molecule with a molecular weight of about 28 kDa. Based on their genome arrangement and biological properties, geminiviruses are classified into one of four genera: Mastrevirus, Curtovirus, Topocuvirus, and Begomovirus [3].

Begomoviruses, which currently comprise 200 species [4], are the whitefly-transmitted viruses of the family Geminiviridae that infect dicotyledonous plants. They have either bipartite genomes (DNA-A and DNA-B) or monopartite genomes resembling DNA-A. DNA-A typically has six open reading frames (ORFs): AV1/V1 (coat protein, CP) and AV2/V2 (AV2/V2 protein) on the virion-sense strand and AC1/C1 (replication initiation protein, Rep), AC2/C2 (transcriptional activator, TrAP), AC3/C3 (replication enhancer, REn), and AC4/C4 (AC4/C4 protein) on the complementary-sense strand. DNA-B has two ORFs, both encoding movement proteins: BV1 (nuclear shuttle protein, NSP) on the virion-sense strand and BC1 (movement protein, MP) on the complementary-sense strand [5].

Computational methods can be applied to predict the unknown structures of virus proteins [6, 7], but a persistent problem in structural biology is the recognition of errors in experimental and theoretical models of protein structures. The ProSA tool (https://prosa.services.came.sbg.ac.at/) checks three-dimensional experimental and theoretical models of protein structures for potential errors.

The application of computational methods [8, 9] and servers (e.g., NAR web server) for the prediction of unknown structures adds a plethora of structural models [10, 11] to the study. The analysis of protein structures is generally a difficult and cumbersome exercise. The ProSA-web service used here is a straightforward and easy-to-use extension of the classic ProSA program, which exploits the advantages of interactive web-based applications for the display of scores and energy plots that highlight potential problems spotted in protein structures. ProSA [12] is a widely used tool for checking 3D models of protein structures for potential errors. Its range of application includes error recognition in experimentally determined structures [13–15], theoretical models [16–19], and protein engineering [20, 21]. With a view to the in silico design of ligands that act as effective inhibitors, the Rep protein, which is responsible for replication, of selected begomovirus strains (YP_003288785, YP_002004579, and YP_003288773) was used; this is the focus of this study.

2. Materials and Methods

For the present study, different bioinformatics tools and databases (for example, GenBank-NCBI, PDB (Protein Data Bank), UCLA-DOE, and the RAMPAGE server) were used for molecular modeling of the Rep protein of begomovirus strains (YP_003288785, YP_002004579, and YP_003288773). The Rep protein sequences of these strains were retrieved in FASTA format from the NCBI database for homology modeling. The homology modeling procedure was performed in four basic sequential steps: template selection, target-template alignment, model construction, and model assessment. The ProSA tool was used for the detection of potential errors [22]. ProSA-web requires the atomic coordinates of the model to be evaluated. The z-score indicates overall model quality and measures the deviation of the total energy of the structure with respect to an energy distribution derived from random conformations. Z-scores outside a range characteristic for native proteins indicate erroneous structures. To facilitate interpretation of the z-score of the specified protein, its value is displayed in a plot that contains the z-scores of all experimentally determined protein chains.
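The z-score definition above can be illustrated with a minimal sketch. It assumes the z-score is the model's total energy expressed in standard deviations from the mean of the reference (random-conformation) energies; the function name and the numbers are hypothetical and are not ProSA output, and ProSA's knowledge-based energy function itself is not reproduced here.

```python
import statistics


def z_score(model_energy, reference_energies):
    """Deviation of a model's total energy from the mean of a reference
    energy distribution, in units of that distribution's standard deviation."""
    mean = statistics.fmean(reference_energies)
    std = statistics.pstdev(reference_energies)
    if std == 0:
        raise ValueError("reference energies have zero spread")
    return (model_energy - mean) / std


# Hypothetical energies, for illustration only (not taken from the study).
random_conformation_energies = [-10.0, -5.0, 0.0, 5.0, 10.0]
z = z_score(-21.0, random_conformation_energies)
print(f"z-score: {z:.2f}")
```

A strongly negative value means the model's energy lies well below that of the random conformations; whether it falls in the range typical for native proteins of similar length is what the ProSA-web plot is used to judge.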

Sequences retrieved from NCBI:

>gi|262530246|ref|YP_003288785.1| Rep [Sweet potato leaf curl Lanzarote virus]
MPRAGRFN
IKAKNYFLTYPQCSLTKEEALDQLLHLNTPTNKKFIKICR
ELHENGEPHLHVLLQFEGNYQCTNQRFFDLVSPSRSSHFH
PNIQRAKSSSDVKSYVDKDGDTIEWGEFQVDGRSARGGQQ
TANDAAAEALNSGSKEAALQIIREKLPEKFIFQYHNLCGN
LDRIFSPPPSVYSSPFSSSSFNAVPDIISDWAAENVMDSA
ARPDRPISIVIEGPSRIGKTVWARSLGPHNYLCGHLDLSP
KVYSNSAWYNVIDDVNPQYLKHFKEFMGAQKDWQSNCKYG
KPVQIKGGIPTIFLCNPGEGSSFKLWLDKPEQGALKNWAT
ANAIFCDVQSPFWVQEEVSHSGATAHRGEEGQEESS

>gi|194271409|ref|YP_002004579.1| Rep [Sweet potato leaf curl Spain virus]
MPRAGRFNINAKNYFLTYPQCSISKEEALAQILNIPTAVN
KKFIKICRELHEDGQPHLHVLLQFEGKFQCTNQRLFDLVS
QTRSAHFHPNIQRAKSSSDVKSYVDKDGDTLEWGEFQVDG
RSARGGQQTANDAAAEALNAGSKDAALQIIREKLPEKFIF
QYHNLVSNLDRIFSPPPSVYSSPFSISSFNNVPDIISDWA
AENVMDAAARPERPISIVIEGPSRMGKTVWARSLGPHNYL
CGHLDLSPKVYSNSAWYNVIDDVNPQYLKHFKEFMGAQKD
WQSNCKYGKPVQIKGGIPTIFLCNPGEGSSFKLWLDKPEQ
EALKNWAVKNAVFCDVDSPFWIQEEVSHSGTNTRGGQEEP
EENS

>gi|262530241|ref|YP_003288773.1| REP [Sweet potato leaf curl Canary virus]
MPRKQGFRVQAKNIFLTYPKCSLSKEQALEQLRATHCPSD
KLFIRVSQEKHQDGSLHLHVLIQFKGKAEFKNPRHFDLHH
PHNSSQFHPNFQAAKSSSDVKSYIEKDGDYLDWGEFQIDG
RSARGGQQTANDAAAEALNAGSKEAALQIIREKLPEKYIF
QYHNLVSNLDRIFSPPPAVYCSPFSSSSFNNVPDIISDWA
AENVMDSAARPDRPISIVIEGPSRIGKTVWARSLGPHNYL
CGHLDLSPKVYSNSAWYNVIDDVNPQYLKHFKEFMGAQKD
WQSNCKYGKPVQIKGGIPTIFLCNPGEGSSFKLWLDKPE
QEALKNWALKNAIFCDVQSPFWVQEEVSGAGAITRSSEE
GQEESS
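As an illustration of how FASTA records such as those listed above might be read before homology modeling, the following is a minimal sketch of a parser for well-formed FASTA text (each header on a single line starting with ">"). The function name `parse_fasta` is hypothetical, and the example input is only the first 48 residues of the YP_003288785 record, not the full sequence.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line (without '>')
    to its sequence, joining wrapped sequence lines and skipping blank lines."""
    records = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Truncated example: only the first 48 residues of the first record.
example = """>gi|262530246|ref|YP_003288785.1| Rep [Sweet potato leaf curl Lanzarote virus]
MPRAGRFN
IKAKNYFLTYPQCSLTKEEALDQLLHLNTPTNKKFIKICR
"""

sequences = parse_fasta(example)
for header, sequence in sequences.items():
    accession = header.split("|")[3]  # fourth '|'-separated field of this header style
    print(accession, len(sequence))
```

Keying the result by the full header keeps the organism name available; the accession is extracted separately because it is the identifier used throughout the text.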

PROCHECK [23] outcomes are displayed in the form of profile search and Ramachandran plots. The models were checked with the Verify3D server [24] and with Ramachandran plots at the RAMPAGE [25] server. PDB files of the Rep proteins were evaluated through ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php/), which requires the atomic coordinate file of the protein.

3. Results and Discussion

A particular intention of the ProSA-web application is to encourage structure depositors to validate their structures before they are submitted to PDB and to use the tool in early stages of structure determination and refinement. The 3D models of Rep proteins, with their errors recognized, were used for the development of potential ligand molecules against begomovirus infection through a docking process. A good-quality Ramachandran plot has over 90% of residues in the most favored regions [26], but the Ramachandran plot of YP_003288785.pdb has only 87.3% of residues in the most favored regions. It is therefore close to, but not quite, a good-quality model (Table 1, Figure 1(a)). Similarly, the Ramachandran plots of YP_002004579 (Figure 1(b)) and YP_003288773 (Figure 1(c)) have 79.7% and 85.5% of residues, respectively, in the most favored regions. Figure 2 shows the results for a monomer of the Rep proteins of Sweet potato leaf curl Lanzarote virus, Sweet potato leaf curl Spain virus, and Sweet potato leaf curl Canary virus [27].
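The comparison against the 90% criterion can be made explicit with a short sketch. The percentages are those reported above; the function name `shortfall` and the tabulation are illustrative assumptions and are not part of RAMPAGE or PROCHECK.

```python
# Percentage of residues in the most favored Ramachandran regions, as reported in the text.
FAVORED_THRESHOLD = 90.0  # percent, the good-quality criterion cited in the text

favored_percent = {
    "YP_003288785": 87.3,
    "YP_002004579": 79.7,
    "YP_003288773": 85.5,
}


def shortfall(percent, threshold=FAVORED_THRESHOLD):
    """Percentage points by which a model falls short of the threshold (0 if it meets it)."""
    return max(0.0, threshold - percent)


report = {acc: round(shortfall(p), 1) for acc, p in favored_percent.items()}
best_model = max(favored_percent, key=favored_percent.get)

for acc, gap in report.items():
    print(f"{acc}: {favored_percent[acc]}% favored, {gap} points below threshold")
print("closest to threshold:", best_model)
```

None of the three models reaches the criterion; YP_003288785 comes closest, which matches its description as close to a good-quality model.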

The ProSA-web results indicate that the Rep proteins have features characteristic of native structures. Figure 2(a) depicts the ProSA-web z-scores of all protein chains in PDB (Table 2) determined by X-ray crystallography (light blue) or NMR spectroscopy (dark blue) with respect to their length [22]. The plot shows only chains with fewer than 1,000 residues and a z-score ≤ 10. The z-scores of the Rep proteins are highlighted as large dots.

Figure 2(b) shows the energy plot of the Rep proteins. The energy plot shows the local model quality by plotting energies as a function of amino acid sequence position i. Residue energies averaged over a sliding window are plotted as a function of the central residue in the window. A window size of 80 is used due to the large size of the Rep protein chain (default: 40). In general, positive values correspond to problematic or erroneous parts of a model. A plot of single residue energies usually contains large fluctuations and is of limited value for model evaluation. Hence the plot is smoothed by calculating the average energy over each fragment; with the default window this is the 40-residue fragment spanning positions i to i + 39, whose average is then assigned to the “central” residue of the fragment at position i + 19.
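The smoothing described above can be sketched as a sliding-window average. This is a minimal illustration under stated assumptions: positions are 0-based, the "central" residue of a window starting at i is taken as i + (window - 1) // 2 (which gives i + 19 for a 40-residue window), and the per-residue energies are toy values rather than ProSA output.

```python
def smooth_energies(energies, window=40):
    """Average per-residue energies over each `window`-residue fragment and
    assign the mean to the fragment's central residue (0-based position)."""
    n = len(energies)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and the number of residues")
    offset = (window - 1) // 2  # 19 for a 40-residue window
    smoothed = {}
    running = sum(energies[:window])
    for i in range(n - window + 1):
        if i > 0:
            # Slide the window by one residue: add the new one, drop the old one.
            running += energies[i + window - 1] - energies[i - 1]
        smoothed[i + offset] = running / window
    return smoothed


# Toy energies: an unfavorable (positive) stretch followed by a favorable one.
toy_energies = [1.0] * 40 + [-1.0] * 40
smoothed = smooth_energies(toy_energies, window=40)

# Positions whose window average is positive, i.e. candidate problem regions.
problem_positions = [pos for pos, e in smoothed.items() if e > 0]
print(problem_positions[0], problem_positions[-1])
```

Passing `window=80` would apply the larger window mentioned for the Rep chains; residues near either end of the chain receive no smoothed value because no full window is centered on them.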

To further narrow down the regions of the model that contribute to a bad overall score, ProSA-web visualizes the 3D structure of the protein using the molecule viewer Jmol. Figure 2(c) illustrates the Jmol Cα trace of the Rep proteins. Residues are colored from blue to red in order of increasing residue energy.

4. Conclusion

PDB files sometimes contain errors, which generally remain unknown until the corresponding revisions are made available to the structural community. ProSA is a diagnostic tool for such errors that is based on the statistical analysis of all available protein structures. By using subsequent independent X-ray analysis, we studied Rep proteins of Sweet potato leaf curl Lanzarote virus, Sweet potato leaf curl Spain virus, and Sweet potato leaf curl Canary virus that are known to be incorrect, yielding a completely different conformation. The 3D models of Rep proteins, with their errors recognized, can be used effectively in in silico docking studies for the development of potential ligand molecules against begomovirus infection.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Authors’ Contribution

Rajneesh Prajapat and Avinash Marwal contributed equally to the work.

Acknowledgments

The authors thank the Department of Biotechnology (DBT Project no. BT/PR13129/GBD/27/197/2009) and the Department of Science and Technology (DST Project no. SR/FT/LS-042/2009), India, for their financial support.