BioMed Research International

Special Issue

Scalable Data Mining Algorithms in Computational Biology and Biomedicine


Research Article | Open Access

Volume 2016 |Article ID 3832176 | https://doi.org/10.1155/2016/3832176

Zhao Li, Yilei Zhao, Gaofeng Pan, Jijun Tang, Fei Guo, "A Novel Peptide Binding Prediction Approach for HLA-DR Molecule Based on Sequence and Structural Information", BioMed Research International, vol. 2016, Article ID 3832176, 10 pages, 2016. https://doi.org/10.1155/2016/3832176

A Novel Peptide Binding Prediction Approach for HLA-DR Molecule Based on Sequence and Structural Information

Academic Editor: Yungang Xu
Received 10 Mar 2016
Accepted 04 May 2016
Published 31 May 2016

Abstract

MHC molecules play a key role in immunology, and the binding of an MHC molecule to a peptide is an important prerequisite for inducing T cell immunity. MHC II molecules do not have conserved residues, so their binding grooves are open, which makes predicting the peptides that bind MHC II molecules more difficult. In this paper, we propose a novel prediction method for MHC II binding peptides. First, we calculate the sequence similarity and structural similarity between different MHC II molecules. Then, we reorder pseudosequences by descending similarity and use a weight calculation formula to compute new pocket profiles. Finally, we use three scoring functions to predict binding cores and evaluate the prediction accuracy of each scoring function. In the experiments, we vary a parameter in the weight formula and observe how the performance of each scoring function changes. We compare our method, using the best scoring function, with several popular prediction methods and find that our method outperforms them in identifying binding cores of HLA-DR molecules.

1. Introduction

Histocompatibility refers to the degree of antigenic similarity between the tissues of different individuals, which determines the acceptance or rejection of allografts. Transplantation antigens, or histocompatibility antigens, are the cause of allograft rejection [1, 2]. The MHC (Major Histocompatibility Complex) is a chromosomal region that encodes the major histocompatibility antigens, controls mutual recognition between cells, and regulates the immune response.

MHC molecules play a key role in immunology, and the binding of an MHC molecule to a peptide is an important prerequisite for inducing T cell immunity [2, 3]. By detecting a wide variety of microbial pathogens, the immune system protects the host against disease. For this reason, predicting the binding of MHC molecules to peptides has long been an active topic in bioinformatics. Research in this field not only helps us understand the immune process but also advances computer-assisted vaccine design.

MHC genes produce two different types of molecules, MHC I molecules and MHC II molecules [1, 2]. MHC I molecules contain two separate polypeptide chains: the MHC α chain encoded by MHC genes and the MHC β chain encoded by non-MHC genes [4, 5]. MHC I molecules are expressed on the surface of almost all eukaryotic cells and are recognized by CD8+ cells. MHC II molecules consist of two noncovalently linked polypeptide chains, the α chain and the β chain, and are generally expressed on antigen-presenting cells (APCs). Foreign antigens are captured and presented to TH cells only on the surface of APCs [6]. The APCs then secrete large amounts of cytokines, activating the cells' defense against invasion. Only the binding of antigen peptides to MHC II molecules can activate CD4+ TH cells (helper T cells) [7]. The activated TH cells then differentiate into effector cells and activate the immune response.

The structures of MHC I and MHC II molecules differ slightly in their binding grooves [5]. MHC I molecules form closed grooves when binding antigenic peptides. MHC II molecules, on the other hand, do not have conserved residues, so their grooves are open. This makes predicting the peptides that bind MHC II molecules more difficult [7]. In this paper, we address this more difficult problem of predicting MHC II binding peptides.

The pioneering and most popular pan-specific approach for MHC II binding prediction is the TEPITOPE method [8]. Its basic idea is that HLA-DR alleles with identical pseudosequences in the same pocket share the same quantitative profile for that pocket. By using multiple instance learning, the MHCIIMulti method [9] can make predictions for more than 500 HLA-DR molecules. The NetMHCIIpan method [5] transforms each DRB allele into a pseudosequence of 21 amino acids, uses the SMM-align method to identify binding cores, and obtains accurate predictions with an artificial neural network [10, 11]. Combining NN-align and NetMHCpan with NetMHCIIpan [9, 12], the MULTIPRED2 method [13–15] can make predictions for 1077 HLA-I and HLA-II alleles and 26 HLA supertypes.

In this paper, we propose a novel method for predicting the peptides that bind MHC II molecules. First, we calculate the sequence similarity and structural similarity between different MHC molecules [13, 16]. Then, we reorder pseudosequences by descending similarity and use a weight calculation formula to compute new pocket profiles. Finally, we use three scoring functions to predict binding cores and evaluate the prediction accuracy of each scoring function [17, 18]. In the experiments, we vary a parameter in the weight formula and observe how the performance of each scoring function changes. We compare our method, using the best scoring function, with several popular prediction methods and find that our method outperforms them in identifying binding cores of HLA-DR molecules [19]. This work suggests a novel computational strategy for identifying special proteins as an alternative to traditional machine learning based methods [20, 21].

2. Materials and Methods

2.1. Data Sets

We collected 39 MHC molecule and peptide binding complexes from the Protein Data Bank (http://www.rcsb.org/pdb/search/), which constitute the data set used in this paper. In this data set, peptide lengths are between 11 and 23, and the polypeptide-binding sites, namely, the binding cores, are known. Table 1 lists the details of these 39 MHC molecule and peptide binding complexes [14, 22, 23].


Table 1

| PDB ID | DRB allele | Peptide sequence |
|---|---|---|
| 1AQD | DRB10101 | VGSDWRFLRGYHQYA |
| 1PYW | DRB10101 | XFVKQNAAALX |
| 1KLG | DRB10101 | GELIGILNAAKVPAD |
| 1KLU | DRB10101 | GELIGTLNAAKVPAD |
| 2FSE | DRB10101 | AGFKGEQGPKGEPG |
| 1SJH | DRB10101 | PEVIPMFSALSEG |
| 1SJE | DRB10101 | PEVIPMFSALSEGATP |
| 1T5W | DRB10101 | AAYSDQATPLLLSPR |
| 1T5X | DRB10101 | AAYSDQATPLLLSPR |
| 2IAN | DRB10101 | GELIGTLNAAKVPAD |
| 2IAM | DRB10101 | GELIGILNAAKVPAD |
| 2IPK | DRB10101 | XPKWVKQNTLKLAT |
| 1FYT | DRB10101 | PKYVKQNTLKLAT |
| 1R5I | DRB10101 | PKYVKQNTLKLAT |
| 1HXY | DRB10101 | PKYVKQNTLKLAT |
| 1JWM | DRB10101 | PKYVKQNTLKLAT |
| 1JWS | DRB10101 | PKYVKQNTLKLAT |
| 1JWU | DRB10101 | PKYVKQNTLKLAT |
| 1LO5 | DRB10101 | PKYVKQNTLKLAT |
| 2ICW | DRB10101 | PKYVKQNTLKLAT |
| 2OJE | DRB10101 | PKYVKQNTLKLAT |
| 2G9H | DRB10101 | PKYVKQNTLKLAT |
| 1A6A | DRB10301 | PVSKMRMATPLLMQA |
| 1J8H | DRB10401 | PKYVKQNTLKLAT |
| 2SEB | DRB10401 | AYMRADAAAGGA |
| 1BX2 | DRB11501 | ENPVVHFFKNIVTPR |
| 1YMM | DRB11501 | ENPVVHFFKNIVTPRGGSGGGGG |
| 1FV1 | DRB50101 | NPVVHFFKNIVTPRTPPPSQ |
| 1H15 | DRB50101 | GGVYHFVKKHVHES |
| 1ZGL | DRB50101 | VHFFKNIVTPRTPGG |
| 4E41 | DRB10101 | GELIGILNAAKVPAD |
| 1DLH | DRB10101 | PKYVKQNTLKLAT |
| 1KG0 | DRB10101 | PKYVKQNTLKLAT |
| 3L6F | DRB10101 | APPAYEKLSAEQSPP |
| 3PDO | DRB10101 | KPVSKMRMATPLLMQALPM |
| 3PGD | DRB10101 | KMRMATPLLMQALPM |
| 3S4S | DRB10101 | PKYVKQNTLKLAT |
| 3S5L | DRB10101 | PKYVKQNTLKLAT |
| 1HQR | DRB50101 | VHFFKNIVTPRTP |

In Table 1, the first column gives the PDB ID of each of the 39 complexes; the second column gives the name of the corresponding allele; the third column gives the corresponding peptide sequence, in which the nine enlarged positions are the binding core.

2.2. Methods

There are thousands of allele variants in nature [2, 4], so it is impractical to measure the binding specificity of each one experimentally. Motivated by this, we propose a new computational method that predicts the binding specificity of peptides without any biochemical experiment by combining the sequence and structural information of MHC molecules with known binding specificity, as shown in Figure 1. We evaluate the method on general HLA-DRB data sets, and the results indicate that it is close to the state of the art, can make predictions for all MHC molecules of known sequence, and costs little time, extending the prediction space compared with more time-consuming approaches.

2.3. Crucial Pockets relative to Binding Specificities of HLA-DR Molecules

Our approach is mainly based on the Position Specific Scoring Matrix (PSSM) [13, 24], a popular technique for MHC binding prediction. Roughly speaking, an MHC binding core contains nine amino acids, and each position corresponds to a specific pocket, as shown in Table 2. We use a PSSM to quantify the binding affinity between each of the twenty basic amino acids and each of these nine pockets.


Table 2

| PDB ID | Pocket 1 | Pocket 2 | Pocket 3 | Pocket 4 | Pocket 5 | Pocket 6 | Pocket 7 | Pocket 8 | Pocket 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1AQD | 82N 85V 86G | 77T 78Y 81H 82N | 78Y | 13F 74A 78Y | 13F 71R | 11L | 47Y 61W 67L 70Q 71R | 60Y 61W | 9W 57D 61W |
| 1PYW | 82N 85V 86G 89F | 77T 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 13F 71R | 11L | 11L 28E 61W 71R | 60Y 61W | 57D 61W |
| 1KLG | 82N 85V | 78Y 81H 82N | 78Y | 13F 71R 78Y | 13F 71R | 11L | 61W | 60Y 61W | 57D 61W |
| 2FSE | 82N 85V 86G 89F | 77T 78Y 82N |  | 13F 28E 70Q 71R 74A 78Y | 13F 71R | 71R | 28E 47Y 61W 67L 71R | 61W | 57D |
| 1KLU | 82N 85V | 78Y 81H 82N |  | 13F 71R 78Y | 13F 71R | 11L | 61W | 60Y 61W | 57D 61W |
| 1SJH | 82N | 78Y 81H 82N |  | 13F 26L 70Q 71R 74A 78Y | 71R | 11L | 61W | 60Y 61W | 57D 61W |
| 1SJE | 82N | 78Y 81H 82N | 78Y | 13F 26L 70Q 71R 74A 78Y | 71R | 11L | 61W | 60Y 61W | 57D 60Y 61W |
| 1T5W | 82N 86G 89F | 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 13F 71R | 11L | 61W 71R | 60Y 61W | 9W 57D 61W |
| 1T5X | 82N 86G 89F | 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 71R | 11L | 61W 71R | 61W | 57D 61W |
| 2IAN | 82N 85V | 78Y 81H 82N | 78Y | 13F 70Q 74A 78Y | 13F 70Q 71R | 11L | 61W 71R | 61W | 57D 61W |
| 2IPK | 82N 85V 86G 89F | 77T 78Y 81H 82N |  | 13F 70Q 71R 74A 78Y | 71R | 11L | 47Y 61W 67L 71R | 60Y 61W | 9W 57D 61W |
| 1FYT | 82N 85V 86G 89F | 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 13F 71R | 11L | 28E 47Y 61W 67L 71R | 60Y 61W | 9W 57D 61W |
| 1R5I | 82N 85V 86G 89F | 77T 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 70Q 71R | 11L | 47Y 61W 67L 71R | 61W | 9W 57D 61W |
| 1HXY | 82N 85V 86G 89F | 78Y 81H 82N |  | 13F 70Q 71R 74A 78Y | 71R | 11L | 28E 47Y 61W 67L 71R | 60Y 61W | 9W 57D 61W |
| 1JWM | 82N 85V 86G 89F | 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 71R | 11L | 28E 47Y 61W 67L 71R | 61W | 57D 61W |
| 1JWS | 82N 85V 86G 89F | 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 13F 71R | 11L | 47Y 61W 67L 71R | 61W | 9W 57D 61W |
| 1JWU | 82N 85V 86G 89F | 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 13F 71R | 11L | 28E 47Y 61W 67L 71R | 61W | 9W 57D 61W |
| 1LO5 | 82N 85V 86G 89F | 78Y 81H 82N | 78Y | 13F 70Q 78Y | 13F 71R | 11L | 47Y 61W 67L 71R | 61W | 9W 57D 60Y 61W |
| 2ICW | 82N 85V 86G 89F | 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 13F 71R | 11L | 28E 47Y 61W 67L 71R | 61W | 9W 57D 61W |
| 2OJE | 82N 85V 86G | 77T 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 70Q 71R | 11L | 28E 47Y 61W 67L 71R | 61W | 9W 57D 61W |
| 2G9H | 82N 85V 86G 89F | 77T 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 71R | 11L 13F | 28E 47Y 61W 67L 71R | 60Y 61W | 9W 57D 61W |
| 2IAM | 82N | 78Y 81H 82N | 78Y | 13F 70Q 71R 74A 78Y | 70Q 71R | 11L | 61W 67L 71R | 60Y 61W | 57D 61W |
| 1A6A | 82N 85V 86V | 77T 78Y 81H 82N | 78Y | 13S 26Y 74R 78Y | 71K 74R | 11S 30Y | 30Y 47F 61W 67L 71K | 61W | 9E 30Y 57D 61W |
| 1J8H | 82N 85V 86G 89F | 77T 78Y 81H 82N | 78Y | 13H 26F 28D 70Q 74A 78Y | 13H 70Q 71K | 11V 13H 30Y | 30Y 47Y 61W 67L | 60Y 61W | 37Y 57D 61W |
| 2SEB | 82N | 77T 78Y 81H 82N |  | 13H 26F 71K 78Y | 13H 71K | 30Y | 30Y 47Y 61W | 60Y 61W | 61W |
| 1BX2 | 82N 85V | 77T 78Y 81H 82N | 78Y | 13H 26F 28D 70Q 74A 78Y | 70Q | 13R |  |  | 57D 60Y 61W |
| 1YMM | 82N | 77T 78Y 81H 82N | 78Y | 13R 26F 28D 70Q 74A 78Y | 70Q | 13R | 61W 67I | 61W | 57D 61W |
| 1FV1 | 82N 85V 86G 89F | 78Y 81H 82N | 78Y | 13Y 71R 78Y | 71R | 13Y | 61W 67L 71K | 61W | 57D |
| 1H15 | 82N 89F | 77T 78Y 81H 82N | 78Y | 13Y 71R 78Y | 71R | 11D 13Y 30D | 61W |  | 57D 60Y |
| 1ZGL | 82N 85V 89F | 77T 78Y 81H 82N |  | 13Y 26F 71R 78Y | 13Y | 13Y | 28H 61W 71R | 61W | 57D 60Y 61W |

There are five anchor sites (1, 4, 6, 7, and 9) in the binding core for MHC II molecules, which determine the binding strength of peptides with MHC II molecules. Site 1 behaves consistently across different MHC II molecules and peptides, so although its precise quantification is important for identifying the binding core, we use weights only for the four anchor sites 4, 6, 7, and 9 to define profiles. For the other sites, we specify the quantitative profiles in the same way as TEPITOPE.
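The PSSM-based scoring described in this section can be illustrated with a minimal sketch. It assumes that a candidate core is scored by summing one value per pocket and that the predicted core is the best-scoring 9-residue window; the function names and the numbers in `toy_pssm` are hypothetical placeholders chosen for illustration, not the profiles used in the paper.

```python
from typing import Dict, List, Tuple

# A PSSM is modeled as 9 dictionaries (one per pocket) mapping residue -> score.
Pssm = List[Dict[str, float]]


def score_core(core: str, pssm: Pssm) -> float:
    """Sum the per-pocket scores of a 9-residue candidate core."""
    return sum(pssm[i].get(res, 0.0) for i, res in enumerate(core))


def predict_core(peptide: str, pssm: Pssm) -> Tuple[str, float]:
    """Slide a 9-residue window along the peptide and keep the best-scoring one."""
    if len(peptide) < 9:
        raise ValueError("peptide must contain at least 9 residues")
    candidates = [peptide[i:i + 9] for i in range(len(peptide) - 8)]
    best = max(candidates, key=lambda c: score_core(c, pssm))
    return best, score_core(best, pssm)


# Toy matrix with made-up values (assumption, for illustration only).
toy_pssm: Pssm = [dict() for _ in range(9)]
toy_pssm[0] = {"W": 2.0, "F": 1.5, "Y": 1.5}  # pocket 1
toy_pssm[3] = {"L": 1.0}                      # pocket 4
toy_pssm[5] = {"G": 0.5}                      # pocket 6
toy_pssm[8] = {"Q": 1.0}                      # pocket 9

core, score = predict_core("VGSDWRFLRGYHQYA", toy_pssm)
print(core, score)
```

With these toy values the window WRFLRGYHQ of the 1AQD peptide scores highest, which happens to agree with the core listed for that complex; real profiles would of course contain entries for all twenty amino acids in every pocket.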

2.4. Computing Similarity between Different MHC Molecules
2.4.1. Sequence-Based Similarity

Sequence-based similarity can be calculated from alignment results. Here, pocket pseudosequences and associated profiles refer to raw pocket pseudosequences and raw pocket profiles, respectively. These raw pseudosequences are composed of several amino acids, whose associated residue indices are shown in Table 3. Eleven representative HLA-DR alleles are adopted to specify different profiles for anchor pockets 4, 6, 7, and 9. If two alleles have identical pseudosequences in the same pocket, they will have identical profiles. For a given pocket, we collect all the different raw pocket pseudosequences into one set, characterized by the number of unique pseudosequences and the number of amino acids contained in a pseudosequence. Meanwhile, we collect all the different raw profiles into a second set. There is a one-to-one correspondence between the two sets. We use BLOSUM to calculate the sequence similarity between different MHC molecules. From this, we obtain an encoded pseudosequence, which is a real-valued vector. We use a Radial Basis Function (RBF) to measure the similarity between an encoded predicted pseudosequence and a raw encoded pseudosequence.


Table 3

| Pocket | Important positions |
|---|---|
| Pocket 1 | 82 85 86 89 |
| Pocket 2 | 77 78 81 82 |
| Pocket 3 | 78 |
| Pocket 4 | 11 13 26 28 70 71 74 78 |
| Pocket 5 | 11 13 28 70 71 74 |
| Pocket 6 | 11 13 28 70 71 74 |
| Pocket 7 | 11 28 30 47 61 67 70 71 |
| Pocket 8 | 60 61 |
| Pocket 9 | 9 30 37 57 60 61 |
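The RBF similarity between encoded pseudosequences mentioned in Section 2.4.1 can be sketched as follows. This is only an illustration under stated assumptions: the standard Gaussian form exp(-gamma * ||x - y||^2) is assumed, the `gamma` value is arbitrary, and `TOY_TABLE` is a hypothetical two-dimensional stand-in for BLOSUM-derived residue vectors, not real BLOSUM values.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical per-residue vectors (assumption): a real encoding would take
# one row of a BLOSUM matrix per residue instead of these made-up numbers.
TOY_TABLE: Dict[str, Sequence[float]] = {
    "F": (1.0, 0.0),
    "Y": (0.8, 0.2),
    "R": (0.0, 1.0),
    "K": (0.1, 0.9),
}


def encode(pseudoseq: str, table: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the per-residue vectors of a pocket pseudosequence."""
    vec: List[float] = []
    for res in pseudoseq:
        vec.extend(table[res])
    return vec


def rbf_similarity(x: Sequence[float], y: Sequence[float], gamma: float = 0.1) -> float:
    """Gaussian RBF similarity; equals 1.0 for identical vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


query = encode("FR", TOY_TABLE)
sim_same = rbf_similarity(query, encode("FR", TOY_TABLE))
sim_close = rbf_similarity(query, encode("YK", TOY_TABLE))  # similar residues
sim_far = rbf_similarity(query, encode("RF", TOY_TABLE))    # swapped residues
print(sim_same, sim_close, sim_far)
```

Pseudosequences built from chemically similar residues end up close in the encoded space and therefore receive a similarity near 1, while dissimilar ones decay toward 0.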

2.4.2. Structure-Based Similarity

Using the structures of MHC II HLA-peptide complexes from the Protein Data Bank (PDB), we obtain the 3D coordinates of the pocket residues in each MHC molecule. For each pseudosequence, we define a vector of these coordinates whose length is determined by the number of amino acids contained in the pseudosequence; meanwhile, we collect these vectors into a set whose size is the number of different pseudosequences, and there is again a one-to-one correspondence between its elements and the pseudosequences.

Next, we need to estimate the similarity of the three-dimensional structures between a given MHC molecule and the five MHC molecules with known pseudosequence PSSMs. A rigid transformation is used to compare the three-dimensional substructures of two proteins [25, 26].

Intuitively, we fix one of the structures, A, and move the other structure, B, by translation and rotation to find the movement in three-dimensional space that brings the atoms of the two structures closest together. We then calculate the Euclidean distance between the two structures. From this distance, we obtain the encoded pseudosequence and calculate the similarity between the 3D structures of an encoded predicted pseudosequence and a raw encoded pseudosequence.
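As a rough illustration of comparing two pockets independently of their position and orientation, the sketch below uses the distance RMSD (dRMSD), which compares the internal pairwise distances of two equally ordered coordinate sets. This is a swapped-in simplification, not the explicit superposition described above: it avoids searching for the best translation and rotation, and unlike a true rigid superposition it cannot tell a structure from its mirror image. The Gaussian conversion from distance to similarity and the `gamma` value are assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def pairwise_distances(coords: Sequence[Point]) -> List[float]:
    """All inter-point distances, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


def drmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square difference between the two internal distance lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two coordinate sets of equal size (>= 2 points)")
    da, db = pairwise_distances(a), pairwise_distances(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def structural_similarity(a: Sequence[Point], b: Sequence[Point], gamma: float = 1.0) -> float:
    """Map the distance to a similarity in (0, 1] (assumed Gaussian form)."""
    return math.exp(-gamma * drmsd(a, b) ** 2)


pocket_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# pocket_a rotated by 90 degrees about the z axis and shifted by (5, 5, 5).
pocket_b = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0)]
# A genuinely different geometry: one point moved further away.
pocket_c = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

print(drmsd(pocket_a, pocket_b), drmsd(pocket_a, pocket_c))
```

Because only internal distances enter the score, the rotated and shifted copy is recognized as identical, while the distorted pocket receives a lower similarity.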

2.4.3. Overall Similarity

Having obtained the sequence similarity and the structural similarity, we combine them into a final similarity score using three alternative scoring functions.

2.5. Weights Calculation for New Pocket Profiles

We reorder all pseudosequences by descending similarity and use a weight calculation formula to compute new pocket profiles. A new pocket profile is generated as a weighted average over the raw pocket profiles. We use the gamma distribution to generate the weights. Written here with shape parameter $k$ and scale parameter $\theta$, the gamma probability density function (PDF) is defined as follows:

$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}},$$

where $x > 0$, $k > 0$, $\theta > 0$, and $\Gamma(\cdot)$ denotes the gamma function.

The weight vector is generated by discretizing the gamma PDF; it is determined by the dimension of the weights and by the shape and scale parameters. The gamma distribution thus generates a weight vector that gives higher weight to more similar pseudosequences.

The weight vector is then normalized.

Given a DRB allele to be predicted, we form the vector of its similarities to the raw pocket pseudosequences; a positive parameter α enhances the weight vector to preserve the dominant contribution of the most similar pseudosequences. Each similarity value has an associated raw pocket profile. The similarity values are sorted in descending order, and the weight vector and the raw pocket profiles are reordered accordingly. The pocket profile for the allele is then defined as the weighted average of the reordered raw pocket profiles.
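The weighting scheme of this section can be sketched as below. The sketch makes several assumptions that the text does not confirm: the gamma PDF is discretized simply by evaluating it at the ranks 1, 2, ..., n, the default shape and scale values are arbitrary (a shape of 1 makes the weights strictly decreasing with rank), and the enhancement by α is not reproduced. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Gamma density with shape/scale parameterization."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def gamma_weights(n: int, shape: float = 1.0, scale: float = 1.0) -> List[float]:
    """Evaluate the PDF at ranks 1..n (assumed discretization) and normalize."""
    raw = [gamma_pdf(float(i), shape, scale) for i in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_profile(
    similarities: Sequence[float],
    profiles: Sequence[Sequence[float]],
    shape: float = 1.0,
    scale: float = 1.0,
) -> List[float]:
    """Average raw profiles, giving the largest weight to the most similar one."""
    order = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    weights = gamma_weights(len(order), shape, scale)
    result = [0.0] * len(profiles[0])
    for weight, idx in zip(weights, order):
        for j, value in enumerate(profiles[idx]):
            result[j] += weight * value
    return result


# Toy data (assumption): three raw profiles of dimension 1 instead of 20.
similarities = [0.2, 0.9, 0.5]
profiles = [[0.0], [1.0], [0.5]]
weights = gamma_weights(3)
profile = weighted_profile(similarities, profiles)
print(weights, profile)
```

In this toy run the profile belonging to the most similar pseudosequence (similarity 0.9) dominates the average, which is the behavior the weighting is meant to produce.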

3. Results

First, we design an experiment to choose an appropriate scoring function for combining sequence similarity and structural similarity. Then, we compare our approach with other state-of-the-art methods, namely TEPITOPE, MultiRTA, NetMHCIIpan-2.0, and NetMHCIIpan-1.0. The results indicate that our approach obtains accurate predictions and effectively extends current prediction methods. Finally, we test on additional complexes.

3.1. Evaluation of Different Scoring Functions

Here, we use 30 of the 39 MHC molecule and peptide complexes as the test set and evaluate the scoring functions described above. The parameter α is set to 1, 2, 3, 4, 5, 10, 15, and 20, and the results are shown in Figure 2. For one scoring function, changing α produces no significant change. For the other two, the numbers of prediction errors are initially 10 and 9 and fall to 8, so we set the value of α to 3. Comparing the three functions, the smallest numbers of errors they achieve are 4, 8, and 8, respectively. Details are shown in Tables S1, S2, and S3 in the Supplementary Material available online at http://dx.doi.org/10.1155/2016/3832176.

3.2. Comparison with Conventional Well-Known Methods

From the above experimental results, the scoring function with the fewest errors obtains the most accurate prediction, so we select it with α = 3 as our final approach. We compare our prediction results with those of the conventional well-known methods TEPITOPE [23], MultiRTA [13], NetMHCIIpan-2.0 [17], and NetMHCIIpan-1.0 [12]; the results are shown in Table 4.


Table 4

| PDB ID | Allele | Peptide | Core | Ours | TEPITOPE | MultiRTA | NetMHCIIpan-2.0 |
|---|---|---|---|---|---|---|---|
| 1AQD | DRB10101 | VGSDWRFLRGYHQYA | WRFLRGYHQ | WRFLRGYHQ | WRFLRGYHQ | WRFLRGYHQ | WRFLRGYHQ |
| 1PYW | DRB10101 | XFVKQNAAALX | FVKQNAAAL | FVKQNAAAL | FVKQNAAAL | FVKQNAAAL | FVKQNAAAL |
| 1KLG | DRB10101 | GELIGILNAAKVPAD | IGILNAAKV | IGILNAAKV | IGILNAAKV | IGILNAAKV | LIGILNAAK |
| 2FSE | DRB10101 | GELIGTLNAAKVPAD | IGTLNAAKV | IGTLNAAKV | IGTLNAAKV | IGTLNAAKV | IGTLNAAKV |
| 1KLU | DRB10101 | AGFKGEQGPKGEPG | FKGEQGPKG | FKGEQGPKG | FKGEQGPKG | FKGEQGPKG | FKGEQGPKG |
| 1SJH | DRB10101 | PEVIPMFSALSEG | VIPMFSALS | VIPMFSALS | VIPMFSALS | VIPMFSALS | VIPMFSALS |
| 1SJE | DRB10101 | PEVIPMFSALSEGATP | VIPMFSALS | VIPMFSALS | VIPMFSALS | VIPMFSALS | VIPMFSALS |
| 1T5W | DRB10101 | AAYSDQATPLLLSPR | YSDQATPLL | SDQATPLLL | YSDQATPLL | SDQATPLLL | YSDQATPLL |
| 1T5X | DRB10101 | AAYSDQATPLLLSPR | YSDQATPLL | SDQATPLLL | YSDQATPLL | SDQATPLLL | YSDQATPLL |
| 2IAN | DRB10101 | GELIGTLNAAKVPAD | IGTLNAAKV | IGTLNAAKV | IGTLNAAKV | IGTLNAAKV | IGTLNAAKV |
| 2IPK | DRB10101 | GELIGILNAAKVPAD | IGILNAAKV | IGILNAAKV | IGILNAAKV | IGILNAAKV | LIGILNAAK |
| 1FYT | DRB10101 | XPKWVKQNTLKLAT | WVKQNTLKL | WVKQNTLKL | WVKQNTLKL | WVKQNTLKL | WVKQNTLKL |
| 1R5I | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 1HXY | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 1JWM | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 1JWS | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 1JWU | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 1LO5 | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 2ICW | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 2OJE | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 2G9H | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 2IAM | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 1A6A | DRB10301 | PVSKMRMATPLLMQA | MRMATPLLM | MRMATPLLM | MRMATPLLM | MRMATPLLM | MRMATPLLM |
| 1J8H | DRB10401 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL | YVKQNTLKL |
| 2SEB | DRB10401 | AYMRADAAAGGA | MRADAAAGG | MRADAAAGG | MRADAAAGG | MRADAAAGG | YMRADAAAG |
| 1BX2 | DRB11501 | ENPVVHFFKNIVTPR | VHFFKNIVT | VHFFKNIVT | VHFFKNIVT | VHFFKNIVT | VVHFFKNIV |
| 1YMM | DRB11501 | ENPVVHFFKNIVTPRGGSGGGGG | VHFFKNIVT | VHFFKNIVT | VHFFKNIVT | VHFFKNIVT | VHFFKNIVT |
| 1FV1 | DRB50101 | NPVVHFFKNIVTPRTPPPSQ | FKNIVTPRT | KNIVTPRTP | FKNIVTPRT | VHFFKNIVT | FFKNIVTPR |
| 1H15 | DRB50101 | GGVYHFVKKHVHES | YHFVKKHVH | YHFVKKHVH | YHFVKKHVH | YHFVKKHVH | YHFVKKHVH |
| 1ZGL | DRB50101 | VHFFKNIVTPRTPGG | FKNIVTPRT | KNIVTPRTP | FKNIVTPRT | VHFFKNIVT | FFKNIVTPR |
| Results |  |  |  | 4 errors | 0 errors | 4 errors | 6 errors |

TEPITOPE is a relatively early method and one of the most popular methods for predicting peptides that bind MHC II molecules. Its basic idea is that if two HLA-DR alleles have the same pseudosequence in the same pocket, they share the same quantitative profile. Using multiple instance learning, MHCIIMulti makes predictions for over 500 HLA-DR molecules. NetMHCIIpan first converts each DRB allele into a pseudosequence of 21 amino acids, then uses the SMM-align method to identify the binding core in the peptide chain, and finally trains the model with an artificial neural network. MultiRTA makes predictions for HLA-DR and HLA-DP molecules. Using a thermodynamic approach, it predicts binding strength as an average binding affinity over the peptide and introduces regularization constraints to avoid overfitting. MULTIPRED2 can make predictions for 1077 HLA-I and HLA-II alleles and 26 HLA supertypes. Details are shown in Figure 3. Our method makes 4 errors, whereas TEPITOPE, MultiRTA, NetMHCIIpan-2.0, and NetMHCIIpan-1.0 make 0, 4, 6, and 3 errors, respectively. Because three-dimensional structural information is currently available for only five MHC II molecules, our scoring matrix uses only these five molecules. If three-dimensional structural information becomes available for all 11 MHC II molecules, our predictions should become more accurate. At present, on this test set our approach matches MultiRTA and makes fewer errors than NetMHCIIpan-2.0.
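The error counts reported here follow from a simple rule: a prediction counts as an error when the predicted 9-residue core differs from the true core. A minimal sketch of that bookkeeping is given below; the helper name `count_errors` is hypothetical, and the four rows are a small subset of Table 4 (true core and the core predicted by our method), so the printed count refers to this subset only.

```python
from typing import Dict, Tuple


def count_errors(rows: Dict[str, Tuple[str, str]]) -> int:
    """Count complexes whose predicted core differs from the true core."""
    return sum(1 for true_core, predicted in rows.values() if predicted != true_core)


# PDB ID -> (true core, core predicted by our method), copied from Table 4.
subset = {
    "1AQD": ("WRFLRGYHQ", "WRFLRGYHQ"),
    "1T5W": ("YSDQATPLL", "SDQATPLLL"),
    "1FV1": ("FKNIVTPRT", "KNIVTPRTP"),
    "1H15": ("YHFVKKHVH", "YHFVKKHVH"),
}

errors = count_errors(subset)
print(errors)
```

Note that a prediction shifted by a single residue, as for 1T5W and 1FV1, is counted as a full error under this rule.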

3.3. Other Prediction Results

In the comparison above, we used only 30 of the 39 MHC molecule and peptide complexes as the test set. In this section, we test on the remaining nine complexes. In this experiment, we use the same scoring function and set the parameter α = 3. As seen in Table 5, eight of the nine predictions are correct, so our approach also performs well on these complexes.


Table 5

| PDB ID | Allele | Peptide | Core | Ours |
|---|---|---|---|---|
| 4E41 | DRB10101 | GELIGILNAAKVPAD | IGILNAAKV | IGILNAAKV |
| 1DLH | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL |
| 1KG0 | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL |
| 3L6F | DRB10101 | APPAYEKLSAEQSPP | YEKLSAEQS | YEKLSAEQS |
| 3PDO | DRB10101 | KPVSKMRMATPLLMQALPM | MRMATPLLM | KMRMATPLL |
| 3PGD | DRB10101 | KMRMATPLLMQALPM | MRMATPLLM | MRMATPLLM |
| 3S4S | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL |
| 3S5L | DRB10101 | PKYVKQNTLKLAT | YVKQNTLKL | YVKQNTLKL |
| 1HQR | DRB50101 | VHFFKNIVTPRTP | FKNIVTPRT | FKNIVTPRT |
| Results |  |  |  | 1 error |

4. Conclusion

In this paper, we address the problem of predicting MHC II binding peptides with a novel metric and strategy. We calculate the sequence similarity and structural similarity between different MHC molecules, reorder pseudosequences by descending similarity, and then use a weight calculation formula to compute new pocket profiles. Finally, we use three scoring functions to predict binding cores and evaluate the prediction accuracy of each scoring function. In the experiments, we vary a parameter in the weight formula and observe how the performance of each scoring function changes. We then compare our method, using the best scoring function, with several popular prediction methods and find that our method outperforms them in identifying binding cores of HLA-DR molecules.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by a grant from the National Science Foundation of China (NSFC 61402326) and Peiyang Scholar Program of Tianjin University (no. 2016XRG-0009).

Supplementary Materials

The supplementary tables give the predicted results obtained with the different functions for combining sequence similarity and structural similarity, with the value of α ranging from 1 to 5.


References

  1. R. M. Zinkernagel and P. C. Doherty, “Restriction of in vitro T cell-mediated cytotoxicity in lymphocytic choriomeningitis within a syngeneic or semiallogeneic system,” Nature, vol. 248, no. 5450, pp. 701–702, 1974.
  2. P. I. Terasaki, “A brief history of HLA,” Immunologic Research, vol. 38, no. 1–3, pp. 139–148, 2007.
  3. K. Maenaka and E. Y. Jones, “MHC superfamily structure and the immune system,” Current Opinion in Structural Biology, vol. 9, no. 6, pp. 745–753, 1999.
  4. J. Robinson, M. J. Waller, S. C. Fail et al., “The IMGT/HLA database,” Nucleic Acids Research, vol. 37, supplement 1, pp. D1013–D1017, 2009.
  5. M. Nielsen, C. Lundegaard, T. Blicher et al., “Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan,” PLoS Computational Biology, vol. 4, no. 7, Article ID e1000107, 2008.
  6. H.-G. Rammensee, J. Bachmann, N. P. N. Emmerich, O. A. Bachor, and S. Stevanović, “SYFPEITHI: database for MHC ligands and peptide motifs,” Immunogenetics, vol. 50, no. 3, pp. 213–219, 1999.
  7. R. N. Germain, “MHC-dependent antigen processing and peptide presentation: providing ligands for T lymphocyte activation,” Cell, vol. 76, no. 2, pp. 287–299, 1994.
  8. T. Sturniolo, E. Bono, J. Ding et al., “Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices,” Nature Biotechnology, vol. 17, no. 6, pp. 555–561, 1999.
  9. N. Pfeifer and O. Kohlbacher, “Multiple instance learning allows MHC class II epitope predictions across alleles,” Algorithms in Bioinformatics, vol. 5251, pp. 210–221, 2008.
  10. T. J. Kindt, R. A. Goldsby, B. A. Osborne, and J. Kuby, Kuby Immunology, WH Freeman & Company, New York, NY, USA, 2007.
  11. A. Sette, L. Adorini, E. Appella et al., “Structural requirements for the interaction between peptide antigens and I-Ed molecules,” Journal of Immunology, vol. 143, no. 10, pp. 3289–3294, 1989.
  12. M. Nielsen, C. Lundegaard, T. Blicher et al., “Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan,” PLoS Computational Biology, vol. 4, no. 7, Article ID e1000107, 2008.
  13. A. J. Bordner and H. D. Mittelmann, “MultiRTA: a simple yet reliable method for predicting peptide binding affinities for multiple class II MHC allotypes,” BMC Bioinformatics, vol. 11, article 482, 2010.
  14. P. A. Reche, H. Zhang, J.-P. Glutting, and E. L. Reinherz, “EPIMHC: a curated database of MHC-binding peptides for customized computational vaccinology,” Bioinformatics, vol. 21, no. 9, pp. 2140–2141, 2005.
  15. P. A. Reche, J.-P. Glutting, H. Zhang, and E. L. Reinherz, “Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles,” Immunogenetics, vol. 56, no. 6, pp. 405–419, 2004.
  16. A. Sette, L. Adorini, E. Appella et al., “Structural requirements for the interaction between peptide antigens and I-Ed molecules,” The Journal of Immunology, vol. 143, no. 10, pp. 3289–3294, 1989.
  17. M. Nielsen, S. Justesen, O. Lund, C. Lundegaard, and S. Buus, “NetMHCIIpan-2.0 - improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure,” Immunome Research, vol. 6, no. 1, article 9, 2010.
  18. M. Nielsen and O. Lund, “NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction,” BMC Bioinformatics, vol. 10, article 296, 2009.
  19. G. L. Zhang, D. S. DeLuca, D. B. Keskin et al., “MULTIPRED2: a computational system for large-scale identification of peptides predicted to bind to HLA supertypes and alleles,” Journal of Immunological Methods, vol. 374, no. 1-2, pp. 53–61, 2011.
  20. X.-Y. Cheng, W.-J. Huang, S.-C. Hu et al., “A global characterization and identification of multifunctional enzymes,” PLoS ONE, vol. 7, no. 6, Article ID e38979, 2012.
  21. Q. Zou, Z. Wang, X. Guan, B. Liu, Y. Wu, and Z. Lin, “An approach for identifying cytokines based on a novel ensemble classifier,” BioMed Research International, vol. 2013, Article ID 686090, 11 pages, 2013.
  22. W. Shen, S. Zhang, and H. Wong, “An effective and effecient peptide binding prediction approach for a broad set of HLA-DR molecules based on ordered weighted averaging of binding pocket profiles,” Proteome Science, vol. 11, p. S15, 2013.
  23. L. Zhang, Y. Chen, H.-S. Wong, S. Zhou, H. Mamitsuka, and S. Zhu, “TEPITOPEpan: extending TEPITOPE for peptide binding prediction covering over 700 HLA-DR molecules,” PLoS ONE, vol. 7, no. 2, Article ID e30483, 2012.
  24. P. A. Reche and E. L. Reinherz, “Prediction of peptide-MHC binding using profiles,” Methods in Molecular Biology, vol. 409, pp. 185–200, 2007.
  25. F. Guo, S. C. Li, L. Wang, and D. Zhu, “Protein-protein binding site identification by enumerating the configurations,” BMC Bioinformatics, vol. 13, article 158, 2012.
  26. F. Guo, S. C. Li, and L. Wang, “Protein-protein binding sites prediction by 3D structural similarities,” Journal of Chemical Information and Modeling, vol. 51, no. 12, pp. 3287–3294, 2011.

Copyright © 2016 Zhao Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

