Abstract

Perturbation methods add variation terms to the known experimental solution of one problem in order to approximate the solution of a related problem whose exact solution is unknown. One problem of this type in immunology is predicting whether a peptide will act as an epitope after a perturbation or variation in the structure of a known peptide and/or in other boundary conditions (host organism, biological process, and experimental assay). However, to the best of our knowledge, there are no reports of general-purpose perturbation models that solve this problem. In a recent work, we introduced a new quantitative structure-property relationship theory for the study of perturbations in complex biomolecular systems. In this work, we developed the first model able to classify more than 200,000 cases of perturbations with accuracy, sensitivity, and specificity >90% in both training and validation series. The perturbations include structural changes in >50,000 peptides tested in experimental assays with boundary conditions involving >500 source organisms, >50 host organisms, >10 biological processes, and >30 experimental techniques. The model may be useful for predicting new epitopes or optimizing known peptides for computational vaccine design.

1. Introduction

The National Institute of Allergy and Infectious Diseases (NIAID) supported the launch, in 2004, of the Immune Epitope Database (IEDB), http://www.iedb.org/ [1–4]. The IEDB system has extracted information from approximately 99% of all papers published to date that describe immune epitopes. To do so, the IEDB system analyzed over 22 million PubMed abstracts and subsequently curated 13,000 references, including 7,000 manuscripts on infectious diseases, 1,000 on allergy, 4,000 on autoimmunity, and 1,000 on transplant/alloantigen topics [5]. IEDB lists a huge amount of information about the molecular structure of different molecules, as well as the experimental conditions under which they were determined to be immune epitopes or not. This explosion of information calls for both query/display functions to retrieve known data from IEDB and predictive tools for new epitopes. Salimi et al. [5] reviewed advances in epitope analysis and the predictive tools available in the IEDB. In particular, the IEDB Analysis Resource (IEDB-AR: http://tools.iedb.org/) is a collection of tools for predicting the molecular targets of T- and B-cell immune responses (i.e., epitopes) [6, 7].

On the other hand, Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) techniques are useful tools for predicting new drugs, RNA, drug-protein complexes, and protein-protein complexes. In general, QSAR/QSPR-like methods first transform molecular structures into numerical molecular descriptors and then fit a model that predicts the biological property or process of interest. For example, DRAGON [8–10], CODESSA [11, 12], MOE [13], TOPS-MODE [14–17], TOMOCOMD [18, 19], and MARCH-INSIDE [20] are among the most widely used programs for calculating molecular descriptors based on quantum mechanics (QM) and/or graph theory [21–27]. The programs STATISTICA [28] and WEKA [29] are often used to perform multivariate statistics and/or machine learning (ML) analysis, first to preprocess the data and then to fit the final QSAR/QSPR model with techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), support vector machines (SVM), or artificial neural networks (ANN) [28].

QSAR/QSPR models are also important in immunoinformatics for predicting the propensity of different molecular structures to play different roles in immunological processes. Examples include models for skin vaccine adjuvants and sensitizers [30–38], drugs and their activity/toxicity protein targets in the immune system [39], and epitopes [40–49]. Moreover, Reche and Reinherz [50] implemented PEPVAC (promiscuous epitope-based vaccine), a web server for the formulation of multiepitope vaccines that predicts peptides binding to five distinct HLA class I supertypes (A2, A3, B7, A24, and B15). PEPVAC can also identify conserved MHC ligands, as well as those with a C-terminus resulting from proteasomal cleavage. The Dana-Farber Cancer Institute hosted the PEPVAC server at http://immunax.dfci.harvard.edu/PEPVAC/. As a last example, Lafuente and Reche [51] reviewed the available methods for predicting MHC-peptide binding and discussed their most relevant advantages and drawbacks.

In many complex QSPR-like problems in immunoinformatics, as in other areas, we know the exact experimental result (known solution) of a problem, but we are interested in the result that would be obtained after a change (perturbation) in one or more of the initial conditions of the experiment (new solution). For instance, for large collections of molecules (organic compounds, drugs, xenobiotics, and/or peptide sequences), we often know the efficiency of each compound as an adjuvant, its action as an epitope, its immunotoxicity, and/or its interaction (affinity, inhibition, etc.) with immunological targets. In addition, we often know for each molecule the exact conditions of the initial assay, including the structure of the molecule (drug, adjuvant, or peptide sequence), source organism (so), host organism (ho), immunological process (ip), experimental technique (tq), concentration, temperature, time, solvents, and coadjuvants. This is the case for big data retrieved from very large databases such as IEDB [1–4] and CHEMBL [52]. However, we do not know the result of the experiment if we change at least one of these conditions (a perturbation). By perturbations we mean small changes in the structure and/or the conditions of the input and output states. These include changes in ho, so, ip, and tq; replacement of the compound by an analogue with a similar structure; changes in the sequence of the epitope (artificial, by organic synthesis, or natural, by mutation); and changes in the polarity of the solvent or coadjuvants. In these cases, we can use a perturbation theory model to solve the QSAR/QSPR problem. Perturbation theory includes methods that add “small” terms to the known solution of a problem in order to approach the solution of a related problem whose solution is unknown.
Perturbation models have been widely used in all branches of science, from QM to astronomy and the life sciences, including chaos theory (the “butterfly effect”), Bohr’s atomic theory, Heisenberg’s mechanics, the Zeeman and Stark effects, and other models with applications in areas such as protein spectroscopy [53–57]. In a very recent work, Gonzalez-Diaz et al. [58] formulated a general-purpose perturbation theory for multiple-boundary QSPR/QSAR problems. However, there are no reports in the immunoinformatics literature of a general QSPR perturbation model for IEDB B-epitopes. Here we report the first QSPR-perturbation model for B-epitopes reported in IEDB, able to predict the probability that a peptide acts as an epitope after a perturbation in its sequence, the experimental technique, the exposure process, and/or the source or host organism.
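The perturbations described in this Introduction (changes in the peptide sequence, so, ho, ip, or tq) can be characterized by recording which fields differ between a reference assay and a new assay. The sketch below assumes a simple, hypothetical record layout (`Assay`, `perturbed_conditions`) using the field names from the text; the sequences and labels in the example are placeholders, not IEDB entries.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Assay:
    """One experimental record (hypothetical schema): a peptide plus its boundary conditions."""
    sequence: str  # peptide sequence
    so: str        # source organism
    ho: str        # host organism
    ip: str        # immunological process
    tq: str        # experimental technique


def perturbed_conditions(ref: Assay, new: Assay) -> set:
    """Return the names of the fields that change from the reference to the new assay."""
    return {f.name for f in fields(Assay) if getattr(ref, f.name) != getattr(new, f.name)}


# Placeholder records: one residue change plus a change of host organism.
ref = Assay("ACDEFGHIK", "source-A", "Homo sapiens", "process-1", "technique-1")
new = Assay("ACDEFGHIR", "source-A", "Mus musculus", "process-1", "technique-1")
print(sorted(perturbed_conditions(ref, new)))  # ['ho', 'sequence']
```

A single pair can perturb several conditions at once, which is why the model treats each type of boundary condition with its own input term.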

2. Materials and Methods

2.1. Molecular Descriptors for Peptides

We calculated the molecular descriptors of the peptides with the software MARCH-INSIDE (MI), which is based on the algorithm of the same name [59]. The MI approach uses a Markov chain method to calculate the $k$th mean values of different physicochemical properties of the $i$th molecule $m_i$. These values are calculated as averages over all atoms placed at topological distance $k$, which are in turn the means of the atomic properties of all atoms in the molecule and their neighbors placed at distance $k$. For instance, it is possible to derive average estimates of molecular refractivity, partition coefficients, and hardness for atoms placed at different topological distances $k$. In this first work, we calculated only one type of descriptor: for every peptide, the average value $\bar{\chi}_k(m_i)$ of the electronegativities $\chi_j$ of all atoms connected to the $j$th atom and of their neighbors placed at distance $k$ [59]:

$$\bar{\chi}_k(m_i) = \sum_{j \in m_i} {}^{k}p(\chi_j)\,\chi_j$$

We calculate the probabilities ${}^{k}p(\chi_j)$ for any atomic property, including electronegativity, using a Markov chain model of the gradual effects of neighboring atoms at different distances in the molecular backbone. This method has been explained in detail in many previous works, so we omit the details here [59].
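The Markov chain averaging can be illustrated with a minimal sketch. It is not the MARCH-INSIDE implementation, and it rests on assumptions we state explicitly: the one-step probability of moving from atom $i$ to atom $j$ is taken to be proportional to $\chi_j$ over $i$ itself and its bonded neighbors, the initial probabilities are $\chi_j / \sum_l \chi_l$, and the $k$-step probabilities come from applying the transition matrix $k$ times. The function names and the three-atom toy graph are hypothetical.

```python
def transition_matrix(adj, chi):
    """One-step probabilities p_ij proportional to chi_j over atom i and its bonded neighbours.

    Assumption for illustration: self-loops are included and each row is normalized to 1.
    """
    n = len(chi)
    matrix = []
    for i in range(n):
        weights = [chi[j] if (j == i or j in adj[i]) else 0.0 for j in range(n)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def markov_mean(adj, chi, k):
    """k-th Markov mean of an atomic property: sum_j pi_k(j) * chi_j."""
    n = len(chi)
    total = sum(chi)
    pi = [x / total for x in chi]  # assumed initial probabilities pi_0
    p = transition_matrix(adj, chi)
    for _ in range(k):
        # propagate one step: pi_{t+1}(j) = sum_i pi_t(i) * p[i][j]
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return sum(pj * xj for pj, xj in zip(pi, chi))


# Hypothetical toy molecule: a three-atom chain 0-1-2 with Pauling electronegativities.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
chi = [2.55, 3.04, 3.44]  # C, N, O
descriptors = [markov_mean(adj, chi, k) for k in range(4)]
print(descriptors)
```

Each increment of $k$ costs one vector-matrix product, so a series of descriptors for $k = 0, 1, \dots, K$ is cheap to compute even for long peptides.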

2.2. Electronegativity Perturbation Model for Prediction of B-Epitopes

Very recently, Gonzalez-Diaz et al. [58] formulated a general-purpose perturbation theory for multiple-boundary QSPR/QSAR problems. Here we adapt this theory to approach the problem of predicting epitopes from the point of view of perturbation theory. Let $S$ be a set of peptides $m_i$, each with an epitope efficiency $\varepsilon_{ij}$ experimentally determined under a set of boundary conditions $\mathbf{c}_j$. We focus on peptides reported in IEDB, so the boundary conditions are those reported in this database: the specific peptide, the source organism (so), the host organism (ho), the biological process (bp), and the experimental technique (tq). In general, so is the organism that expresses the peptide (although it can also denote artificial peptides, cell lines, etc.), and ho is the host organism exposed to the peptide through the process bp, which is detected with the technique tq. Because our analysis is based on the data reported by IEDB, we cannot work with continuous values of epitope activity. Consequently, we predict the discrete B-epitope efficiency function $f = 1$ for epitopes reported under the conditions $\mathbf{c}_j$, and $f = 0$ otherwise. Our main aim is to predict the shift $\Delta f$ in this output function that takes place after a change, variation, or perturbation in the structure and/or boundary conditions of a peptide of reference. We already know the efficiency of the reference process, in addition to the molecular structures and the sets of conditions for both the initial (reference) and final (new) processes. Consequently, to predict $\Delta f$ we only have to predict the efficiency function of the new state, obtained by a change in the structure of the peptide and/or in the boundary conditions. Following our recent model [58], $f$ serves as the state function of both the reference and new states and can be written as a function of the boundary conditions and the structure of the peptide.
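In fact, the variational state functions have to be written in pairs, one for the initial (reference) state and one for the final (new) state of a perturbation, where $m$ denotes the peptide and $\mathbf{c}$ its set of boundary conditions:

$$f_{new} = f(m_{new}, \mathbf{c}_{new}), \qquad f_{ref} = f(m_{ref}, \mathbf{c}_{ref})$$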

The state function $f_{new}$ corresponds to the new peptide measured under the set of boundary conditions $\mathbf{c}_{new}$ in the output, final, or new state. The conjugated state function $f_{ref}$ corresponds to the peptide of reference measured under the set of boundary conditions $\mathbf{c}_{ref}$ in the input, initial, or reference state. The difference between the new (output) state and the reference (input) state is the additive perturbation [58]:

$$\Delta f = f_{new} - f_{ref}$$
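Building the reference/new pairs and their additive perturbations can be sketched as follows. The record format (an identifier paired with the observed 0/1 value of the state function $f$) and the function name are hypothetical. For binary states, $f_{new} - f_{ref}$ can only take the values -1, 0, or 1.

```python
from itertools import permutations


def perturbation_pairs(records):
    """Yield (ref_id, new_id, delta_f) for every ordered pair of distinct records.

    Each record is a hypothetical (identifier, f) tuple, where f = 1 if the peptide
    was reported as a B-epitope under its boundary conditions and f = 0 otherwise.
    """
    for (ref_id, f_ref), (new_id, f_new) in permutations(records, 2):
        # additive perturbation between the new and the reference state
        yield ref_id, new_id, f_new - f_ref


records = [("A", 1), ("B", 0), ("C", 1)]
for ref_id, new_id, delta_f in perturbation_pairs(records):
    print(ref_id, "->", new_id, delta_f)
```

Enumerating every ordered pair grows as $n(n-1)$ with the number of records, so a dataset of this size needs some rule for selecting pairs; the sketch does not reproduce how the >200,000 cases of this study were chosen.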

Equation (3) opens the door to testing different hypotheses. A simple hypothesis is H0: the perturbation function takes one small, constant value for all pairs of peptides, and the perturbations of the input/output boundary conditions are linearly related through constant coefficients.

Elementary algebraic operations on these equations yield an expression for the efficiency as epitope of the new peptide. In this case, we can obtain different expressions; the last of them may be very useful for solving the QSRR problem for the large datasets formed by IEDB B-epitopes.
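To make the algebra concrete, here is one illustrative way a linear hypothesis of the H0 type could be written and solved for the new state. This is an assumed generic form, not the exact equation of [58] or of our fitted model: $f_{ref}$ and $f_{new}$ are the reference and new state functions, $\Delta\bar{\chi}$ is the change in mean electronegativity between the new and reference peptides, $\Delta\bar{\chi}(c)$ is a deviation term for each boundary condition $c \in \{so, ho, bp, tq\}$, and $a_0$, $a_1$, and $b_c$ are hypothetical coefficients.

```latex
\begin{align}
\Delta f &= f_{new} - f_{ref} = a_0 + a_1\,\Delta\bar{\chi} + \sum_{c} b_c\,\Delta\bar{\chi}(c) \\
f_{new}  &= f_{ref} + a_0 + a_1\,\Delta\bar{\chi} + \sum_{c} b_c\,\Delta\bar{\chi}(c)
\end{align}
```

Replacing the fixed unit weight of $f_{ref}$ in the second line with a fitted coefficient turns it into a linear score of the kind that an LDA fit produces, with the known solution $f_{ref}$ entering as one more input.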

3. Results and Discussion

We propose herein, for the first time, a QSRR-perturbation model able to predict variations in the propensity of a peptide to act as a B-epitope, taking into consideration the propensity of a peptide of reference and the changes in peptide sequence, immunological process, host organism, source organism, and experimental technique. The best QSPR-perturbation model found here with LDA was

The first input term, $f_{ref}$, is the scoring function of the efficiency of the initial process (the known solution): $f_{ref} = 1$ if the peptide of reference was experimentally demonstrated to be a B-epitope in the reference assay carried out under the reference conditions, and $f_{ref} = 0$ otherwise. The variational-perturbation terms are at the same time terms typical of perturbation theory and moving average (MA) functions of the kind used in Box-Jenkins time series models [60]. These terms account for the deviation of the electronegativity of all amino acids in the sequence of the new peptide both with respect to the peptide of reference and with respect to all boundary conditions. Table 1 gives the overall classification results obtained with this model. Speck-Planche et al. [61–63] introduced different multitarget/multiplexing QSAR models that incorporate this type of information through MAs. The results obtained with the present model compare favorably with those of similar models reported in the literature for other problems, including moving average models [64, 65] and perturbation models [58]. Notably, this is also the first model combining both perturbation theory and MAs in a QSPR context.
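Accuracy, sensitivity, and specificity, the statistics quoted in the Abstract, follow from the confusion matrix of observed versus predicted outputs. A minimal sketch, with made-up labels that are not data from this study (the function name is ours):

```python
def classification_metrics(observed, predicted):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = epitope)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)

    def ratio(num, den):
        # undefined (NaN) when a class is absent from the labels
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # share of observed epitopes predicted as 1
        "specificity": ratio(tn, tn + fp),  # share of observed non-epitopes predicted as 0
    }


observed = [1, 1, 1, 0, 0, 0, 0, 1]   # made-up labels, not data from this study
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(observed, predicted))
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```

Reporting sensitivity and specificity separately matters here because accuracy alone can look high when one class dominates the perturbation pairs.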

The other input terms are the following. The first is the perturbation term $\Delta\bar{\chi} = \bar{\chi}_{new} - \bar{\chi}_{ref}$, that is, the variation in the mean electronegativity of all amino acids in the sequence of the new peptide with respect to the peptide of reference. The remaining input variables quantify the conditions of the new assay, $\mathbf{c}_{new}$, as perturbations with respect to the initial conditions, $\mathbf{c}_{ref}$, of the reference assay. The quantities $\langle\bar{\chi}\rangle_{new}$ and $\langle\bar{\chi}\rangle_{ref}$ are the averages of the mean electronegativity values over all peptides in IEDB that are epitopes under the corresponding boundary condition of the new and the reference assay, respectively. The values of these terms have been tabulated for >500 source organisms, >50 host organisms, >10 biological processes, and >30 experimental techniques. To predict the perturbations in the action of peptides as epitopes, we substitute into the model the values of $\bar{\chi}_{new}$ and $\bar{\chi}_{ref}$ of the new and reference peptides, together with the tabulated values of $\langle\bar{\chi}\rangle_{new}$ and $\langle\bar{\chi}\rangle_{ref}$ for the corresponding combination of boundary conditions. In doing so, we can find the optimal sequence and boundary conditions for using the peptide in the development of a vaccine. Table 2 gives some of these tabulated values.
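The tabulated averages and the moving-average (MA) terms built from them can be sketched as follows. Assumptions: each record carries a peptide's mean electronegativity (`chi`), its observed 0/1 epitope label (`f`), and its boundary conditions; averages are taken over epitopes only, as described above; and each MA term is taken to be $(\bar{\chi}_{new} - \langle\bar{\chi}\rangle_{new}) - (\bar{\chi}_{ref} - \langle\bar{\chi}\rangle_{ref})$, a form we assume for illustration rather than quote from the model. Function names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def condition_averages(records, condition):
    """Average chi over epitope records (f == 1) for each value of one boundary condition."""
    groups = defaultdict(list)
    for rec in records:
        if rec["f"] == 1:
            groups[rec[condition]].append(rec["chi"])
    return {value: mean(chis) for value, chis in groups.items()}


def ma_perturbation(chi_new, chi_ref, c_new, c_ref, averages):
    """(chi_new - <chi>(c_new)) - (chi_ref - <chi>(c_ref)) for one boundary condition."""
    return (chi_new - averages[c_new]) - (chi_ref - averages[c_ref])


# Made-up records; only the host organism (ho) is tabulated here.
records = [
    {"chi": 2.5, "f": 1, "ho": "human"},
    {"chi": 2.75, "f": 1, "ho": "human"},
    {"chi": 2.25, "f": 1, "ho": "mouse"},
    {"chi": 3.0, "f": 0, "ho": "mouse"},  # non-epitope: excluded from the averages
]
avg_ho = condition_averages(records, "ho")
print(avg_ho)  # {'human': 2.625, 'mouse': 2.25}
print(ma_perturbation(2.5, 2.75, "mouse", "human", avg_ho))  # 0.125
```

Because the averages depend only on the condition values, they can be computed once and stored as lookup tables, one per type of boundary condition.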

Table 3 lists the sequences and input-output boundary conditions of the top perturbations present in IEDB. All of these perturbations have an observed output value of 1 and a predicted value that is also 1, with high probability. The Supplementary Material, available online at http://dx.doi.org/10.1155/2014/768515, contains the full list of >200,000 perturbation cases.
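How a linear model of this type turns one perturbation case into a predicted class and a probability can be sketched as below. The fitted coefficients are not reproduced in this text, so `COEFFS` holds placeholder values only, and the input names are hypothetical. Mapping the score to a probability with the logistic function assumes the score is on a log-odds scale, which holds for the posterior of two-class LDA with a shared covariance matrix when the class priors are absorbed into the intercept.

```python
import math

# Placeholder coefficients for illustration only: NOT the fitted values of the model.
COEFFS = {
    "intercept": -1.0,
    "f_ref": 2.0,  # known solution for the reference assay (0 or 1)
    "d_chi": 0.5,  # change in mean electronegativity, new minus reference peptide
    "d_so": 0.3,   # moving-average perturbation terms, one per boundary condition
    "d_ho": 0.3,
    "d_bp": 0.2,
    "d_tq": 0.2,
}


def predict_new_state(inputs, coeffs=COEFFS):
    """Return (score, probability, predicted f_new) for one perturbation case."""
    score = coeffs["intercept"] + sum(coeffs[name] * value for name, value in inputs.items())
    # Logistic link: valid only if the score is a log-odds (assumed here).
    probability = 1.0 / (1.0 + math.exp(-score))
    return score, probability, int(probability > 0.5)


case = {"f_ref": 1, "d_chi": 0.0, "d_so": 0.0, "d_ho": 0.0, "d_bp": 0.0, "d_tq": 0.0}
print(predict_new_state(case))  # (1.0, 0.731..., 1)
```

With these placeholders, a reference epitope whose perturbation terms are all zero stays an epitope; nonzero perturbation terms shift the score up or down according to the sign of their coefficients.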

4. Conclusions

It is possible to develop general models for vaccine design able to predict the results of multiple input-output perturbations in peptide sequence and experimental assay boundary conditions using ideas of QSPR analysis, perturbation theory, and Box and Jenkins MA operators. The electronegativity values calculated with MARCH-INSIDE seem to be good molecular descriptors for this type of QSPR-perturbation models.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The present study was partially supported by Grants AGL2010-22290-C02 and AGL2011-30563-C03 from Ministerio de Ciencia e Innovación, Spain, and Grant CN 2012/155 from Xunta de Galicia, Spain.

Supplementary Materials

The Supplementary Material includes the sequences of the peptides obtained from IEDB, their boundary conditions (source organisms, host organisms, techniques, and biological processes), and the values of the input/output variables of the models calculated for all the cases in the dataset. These values have been obtained for all the input-output boundary conditions of the perturbations.

  1. Supplementary Material