Human immunodeficiency virus (HIV) poses a major threat to human life, largely because no efficacious vaccine is available and access to antiretroviral drugs remains poor. The high mutation rate of the viral genome, which underlies the antigenic variability of the viral proteome, is the major hindrance to antibody-based vaccine development. Although the exact mechanism by which CTL epitopes and their restricting HLA alleles slow disease progression is still not clear, the CTL epitopes that are important for controlling viral infection can be utilized in future vaccine design. This study was designed to characterize the HIV-1 optimal CTL epitopes and their corresponding HLA alleles. CTL epitope cluster distribution analysis revealed that only two HIV-1 proteins, namely, Nef and Gag, have significant cluster-forming capacity. We found that specific HLA supertypes such as HLA B*07, HLA B*58, and HLA A*03 play a role in selecting the hydrophobic and conserved amino acid positions within the Nef and Gag proteins that are presented as epitopes. The analyses revealed that the clusters of optimal epitopes in the Nef and p24 proteins of HIV-1 could potentially serve as a source of vaccine candidates.

1. Introduction

Human immunodeficiency virus (HIV), a retrovirus that belongs to the genus Lentivirus of the family Retroviridae, is the causative agent of acquired immunodeficiency syndrome (AIDS). The HIV genome is composed of 9.8 kb of positive-sense, single-stranded RNA, which is reverse transcribed into viral DNA by the enzyme reverse transcriptase upon entry into the host cell [1]. Of the two types of HIV (HIV-1 and HIV-2), HIV-1 is the more virulent and is responsible for most HIV infections globally. HIV-1 has infected more than 60 million people and caused nearly 30 million deaths worldwide [2]. In Asia, an estimated 4.9 million people were living with HIV in 2009, about the same as 5 years earlier. Most national HIV epidemics appear to have stabilized. Incidence fell by more than 25% in India, Nepal, and Thailand between 2001 and 2009. The epidemic remained stable in Malaysia and Sri Lanka during this period. Incidence increased by 25% in Bangladesh and the Philippines between 2001 and 2009, even though both countries continue to have relatively low epidemic levels [3]. Although antiretroviral therapy has proven effective in controlling the infection in the developed world, only one-fourth of the population in the developing world can obtain these medications because of limited accessibility. Consequently, the vast majority of people live with a constant threat of HIV infection and death from AIDS. In this devastating situation, there is an urgent need to develop an effective HIV vaccine, as no vaccine has yet proved efficacious in controlling HIV infection. To combat this deadly virus, its genome, proteome, pathogenesis, and mechanisms of immune evasion should be studied in great detail.

HIV possesses a complex RNA genome containing nine genes, which can be classified into 3 functional groups. Gag, Pol, and Env are structural genes, Tat and Rev are regulatory genes, and the remaining genes (Vpu, Vpr, Vif, and Nef) fall into the accessory category [4]. The early phase of the HIV replication cycle begins with recognition of the target cells (mainly CD4+ T cells) by the mature virion and continues as the virion core enters the cell and the viral DNA is integrated into the chromosomal DNA of the host cell. The late phase begins with the regulated expression of the integrated proviral genes and ends with virus budding and maturation.

The Gag gene encodes 3 proteins, matrix (p17), capsid (p24), and nucleocapsid (p7), which are translated as a polyprotein and later cleaved at specific sites to give rise to the three individual proteins. The Pol gene also encodes a polyprotein, which shares the fate of the Gag polyprotein: it is cleaved by the viral protease into three proteins, reverse transcriptase, protease, and integrase. The Env gene encodes a single glycoprotein (gp160), which is later cleaved into two proteins: the surface glycoprotein gp120 and the transmembrane protein gp41. Besides these, the HIV proteome also includes regulatory and accessory proteins such as Nef, Vpr, Tat, and Rev. Table 1 lists the functions of the HIV proteins collected from the UniProt database, http://www.uniprot.org/uniprot/P04585.

High antigenic variability resulting from a high mutation rate is a characteristic feature of retroviruses such as HIV. This vast genetic heterogeneity not only helps the virus to evade the selective pressures exerted by the immune response and by drugs but also accelerates viral evolution [5]. Phylogenetic studies showed the presence of three distinct groups of HIV-1: M (Major), O (Outlier), and N (non-M and non-O). M is the predominant group of HIV-1 around the world [6]. Within the M group there are nine subtypes: A–D, F–H, J, and K. Among these, subtype B is prevalent in most regions of the world such as the USA, Europe, South East Asia, Australia, and South Africa [6].

The endogenous pathway of antigen processing and presentation presents endogenously synthesized cellular peptides as well as viral protein fragments to cytotoxic T lymphocytes (CTLs) via MHC class I molecules. In this pathway, proteins destined for presentation are marked by ubiquitination and subjected to proteolytic cleavage by the immunoproteasome. The peptide fragments are transported into the lumen of the ER by TAP (TAP1 and TAP2). The TAP proteins also assist the loading of short peptides of appropriate length (9 amino acids) into the groove of the MHC class I molecule [7]. Although the proteasome is the main player in generating the bulk of CTL epitopes, cytosolic endopeptidases may also be involved in producing certain CTL epitopes [8].

Peptides are typically tightly associated along their entire length in MHC class I groove. The N and C termini of the peptide are firmly H-bonded to the conserved residues of the MHC groove. The analysis of the naturally occurring peptides extracted from the MHC-peptide complex revealed that these are mostly 8-9 residues long and in certain key positions amino acids tend to be conserved within the peptide. These are called anchor positions which are proved to be essential for the binding of peptides to MHC class I molecules in allele specific manner. There are typically two (sometimes three) major anchor positions for the class I binding peptides. One is located at the C terminal end and the other one usually lies in the position 2 (P2) but also occurs in P3, 5, or 7 [7].

Cytotoxic T lymphocytes (CTLs) are a vital component of cellular immunity and play a crucial role in eliminating viral infection. CTLs recognize viral antigen on the surface of virally infected cells in combination with the appropriate major histocompatibility complex (MHC) molecule and kill the infected cells either by lysis or by inducing apoptosis. Previous studies suggested that the HIV infection process can be divided into 3 stages: (1) acute viremia, (2) a latency period of variable duration, and (3) clinical AIDS. In the later stages of HIV infection, the CD4+ T-cell count drops below 200 cells/mm3, which causes the complete collapse of the immune response, and consequently opportunistic infectious agents such as Pneumocystis carinii come into play [9]. There is now an increasing body of evidence that CTLs play an important role in controlling HIV infection. Analysis of the immune responses of HIV-infected patients revealed that antiviral CTL activity is correlated with clearance of virus particles during the acute phase of infection and that a decline in CTL activity is associated with disease progression [9, 10]. Two types of antiviral CTL response have been documented so far: one is the classical viral-epitope-dependent, MHC-restricted killing of virally infected cells, and the other is the noncytolytic response, in which CTLs control the infection by inhibiting viral replication [11]. Furthermore, studies of the SIV-macaque model, in which the administration of anti-CD8 monoclonal antibodies hinders the decline in viremia, provided strong evidence for the crucial role of CTLs in controlling HIV infection during the acute phase [12]. Recently Goulder and Watkins put forward three additional lines of evidence for the role of CTLs in suppressing HIV infection. First, they argued that specific HLA class I molecules are consistently associated with particular HIV disease outcomes.
Secondly, they highlighted that more rapid disease progression is observed in individuals with HLA class I homozygosity, and lastly they provided evidence that immune control over HIV infection is lost when viral mutants escape CD8+ T-cell recognition [13]. Taken together, this evidence points to an important role of the CTL immune response in antagonizing HIV disease progression.

2. Analysis of the HLA Class I Restricted CTL Epitopes in HIV Proteome

Design and development of HIV vaccine largely depend on our understanding of complex dynamics between host immune response and viral adaptation to selective pressure exerted by the host. Understanding how the CTL epitopes interact with particular HLA alleles can give an insight into the mechanisms of success or failure of immune control of a pathogen, such as HIV-1, for which clearance of virus particles depends on CTL activity. So the vaccine development strategies for HIV should be focused on identifying the epitopes presented by HLA alleles prevalent in populations severely affected by the global HIV epidemic. In recent years, development of new technologies such as measuring interferon-gamma (IFNγ) release by the enzyme linked immunospot (ELISPOT) assay and flow cytometry ensured the efficient evaluation of CTL responses against HIV epitopes [14]. Moreover, development of overlapping pooled peptide technology (OLP) now provides the opportunity for the detailed and precise analyses of HIV-1-specific cellular immune responses by elucidation of the T-cell epitopes and the identification of immunodominant regions of HIV-1 gene products. Identification and characterization of the CTL epitopes as well as the corresponding HLA alleles can play a major role in elucidating the nature of protective CTL response and mechanism of the immune evasion of HIV. A large number of HIV CTL epitopes have been identified and deposited into various databases. Apart from the experimental methods, various computational tools are now available which can predict CTL epitopes within viral proteome by using different sets of algorithms, for example, artificial neural network (ANN), average relative binding (ARB), stabilized matrix method (SMM), and so forth. The first CTL epitope was identified in 1988 by using synthetic peptide technology [15]. 
Since then, over 1200 HLA class I restricted HIV-1 epitopes have been identified in the HIV proteome (http://www.hiv.lanl.gov/content/immunology/index.html). Table 2 lists the HLA class I restricted optimal CTL epitopes for HIV along with their corresponding HLA alleles and clades. For the identification of optimal epitopes, two criteria were imposed as described by Llano et al. [16]: unequivocal experimental validation of the epitope restriction by a specific HLA class I allele and definition of the optimal epitope length (8 to 10 amino acids). Analysis of the CTL epitopes listed in Table 2 reveals that 5 HIV proteins (gp160, Nef, p24, p17, and RT) contributed 77% of the total epitopes. The remaining epitopes were derived from eight other HIV proteins (integrase, p2p7p1p6, protease, Rev, Tat, Vpu, Vif, and Vpr). The highest number of optimal epitopes was found for p24 (54), while only one optimal epitope was identified for Vpu (Figure 1). The epitope numbers for gp160, Nef, RT, and p17 were 45, 43, 41, and 23, respectively. The number of unique alleles restricting these epitopes was also analyzed and found to be correlated with the number of epitopes for each HIV protein (Figure 1). For instance, the 54 p24 CTL epitopes were restricted cumulatively by 35 unique HLA class I alleles. Similarly, for the epitopes of gp160, Nef, p17, and RT, the numbers of unique HLA class I alleles were found to be 31, 29, and 25, respectively.
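The per-protein tallies described above (number of optimal epitopes and number of unique restricting alleles) can be sketched as follows. This is a minimal illustration, not the procedure used to build Figure 1: the record layout `(protein, peptide, allele)`, the function name `summarize`, and the toy rows are all assumptions, and the peptides are placeholders rather than entries from Table 2. It also assumes that a peptide restricted by two alleles counts as one epitope.

```python
from collections import defaultdict

def summarize(records):
    """Return {protein: (n_epitopes, n_unique_alleles)} for optimal-length epitopes.

    records: iterable of (protein, peptide, allele) tuples (hypothetical layout).
    """
    epitopes = defaultdict(set)
    alleles = defaultdict(set)
    for protein, peptide, allele in records:
        # Optimal-length criterion from the text: 8 to 10 amino acids.
        if len(peptide) in range(8, 11):
            epitopes[protein].add(peptide)
            alleles[protein].add(allele)
    return {p: (len(epitopes[p]), len(alleles[p])) for p in epitopes}

# Toy rows only; these are placeholder peptides, not real HIV epitopes.
toy = [
    ("p24", "AAAAAAAAA", "B*57"),
    ("p24", "AAAAAAAAA", "B*58"),     # same peptide, second restricting allele
    ("p24", "CCCCCCCCCC", "A*02"),
    ("Nef", "DDDDDDDD", "B*07"),
    ("Nef", "EEEEEEEEEEEE", "A*03"),  # 12 residues, excluded by the length filter
]
summary = summarize(toy)
print(summary)
```

Using sets keeps the two counts independent, so the relationship between epitope number and allele number can be inspected per protein.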

3. Clustering of CTL Epitopes in HIV Proteome

Analysis of the HIV-1 proteins reveals that HLA class I restricted epitopes form overlapping clusters known as epitope rich/dense region whereas the regions deficient of any epitope clusters are called the epitope poor regions [17]. Yusim et al. have identified four short overlapping clusters in Nef protein of HIV-1 which was found to be multirestricted indicating that the clusters contain several epitopes recognized by different class I HLA molecules [18]. In another study, Currier et al. identified CTL epitope distribution patterns in the Gag and Nef proteins of HIV-1 from subtype-A infected subjects [19]. Studies aided with powerful experimental as well as computational methods are now being conducted with the aim to construct a fine CTL epitope map for the whole HIV-1 proteome. With the advancement of new sophisticated computational and statistical methods, it is now possible to identify the epitope clusters computationally. One significant achievement in computational immunology is the method of identification of immunoproteasome cleavage sites within the query proteins by using different algorithms such as artificial neural network (ANN) which enables the rapid identification of a wide range of potential epitopes that can be analyzed both computationally and experimentally for their affinity to bind with particular HLA class I molecules. Studies, dedicated to identify the CTL epitope clusters by means of computational methods, are now showing some success as far as the identification of new epitope clusters is concerned, as some novel epitope containing clusters were identified. However, more developments in the algorithms are required to construct more realistic models of epitope and cleavage site prediction, so that the predicted proteasomal cleavage events observed in calculation may better mimic the actual processing of viral antigens in the natural environment. 
In this study, analysis of the topological arrangement of the 269 experimentally validated optimal epitopes listed in Table 2 allowed the identification of epitope clusters in the individual HIV proteins. Among the 13 HIV proteins listed in Table 2, epitope clustering was performed for 5 proteins (gp160, Nef, p17, p24, and RT) because a relatively high number of epitopes had been identified for them (Figure 2 and Table 3). The aim of the cluster analysis was to identify the epitope-dense regions or “hot spots” in the HIV-1 proteome. To reduce the possibility of chance overlap, only clusters containing more than 5 overlapping epitopes were considered. For gp160, 2 major epitope clusters were observed, the first at amino acid positions 31–69 and the second at positions 770–838, each harbouring 9 epitopes. As with gp160, 2 clearly defined major clusters were identified for each of the Nef and p17 proteins. In Nef, one cluster spans positions 68 to 100 and the second lies between positions 105 and 145. In a previous study, Penciolelli et al. [20] identified 4 clusters in the Nef protein, which fall within the cluster ranges for Nef observed in this study. For p17, the two clusters were similar in epitope composition and length: the first, 34 amino acids long, contained 10 epitopes, whereas the second, 30 amino acids long, contained 12 epitopes. p24 contained the largest number (4) of major epitope clusters. For RT only 1 major cluster was identified, containing 9 overlapping epitopes, whereas the rest of the epitopes were distributed randomly across the protein.
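One way to picture the cluster criterion is the sketch below, which chains epitopes whose sequence spans overlap and keeps only the chains with more than 5 members. The text does not describe the grouping algorithm that was actually used, so the chaining rule, the function name `find_clusters`, and the toy coordinates are assumptions made for illustration.

```python
def find_clusters(spans, min_epitopes=6):
    """Chain overlapping (start, end) epitope spans into clusters.

    Positions are 1-based and inclusive. Clusters with fewer than
    min_epitopes members are dropped; the default of 6 mirrors the
    "more than 5 overlapping epitopes" threshold in the text.
    """
    clusters = []
    current = None
    for start, end in sorted(spans):
        if current is not None and current["end"] >= start:
            # The span overlaps the running cluster: extend it.
            current["end"] = max(current["end"], end)
            current["count"] += 1
        else:
            if current is not None:
                clusters.append(current)
            current = {"start": start, "end": end, "count": 1}
    if current is not None:
        clusters.append(current)
    return [c for c in clusters if c["count"] >= min_epitopes]

# Toy coordinates (not taken from Table 2): six chained spans and two stragglers.
toy_spans = [(10, 18), (12, 20), (15, 23), (17, 25), (20, 28), (22, 30),
             (50, 58), (55, 63)]
clusters = find_clusters(toy_spans)
print(clusters)
```

Chaining is the most permissive reading of "overlapping"; a stricter rule, such as requiring every pair of epitopes in a cluster to share residues, would yield shorter clusters.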

Data from Table 3 suggest that epitopes in the RT and gp160 proteins did not cluster significantly compared with those in the other HIV proteins. Only 40% and 22% of the epitopes were part of a major cluster in gp160 and RT, respectively, indicating that the majority of the epitopes were distributed randomly in these proteins. Epitopes from the other three proteins, Nef, p17, and p24, showed a significant clustering pattern, as evident from both Figure 4 and Table 3. Most of the epitopes in these proteins were part of a cluster or epitope-dense region.

4. Are the CTL Epitope Clusters Conserved and Hydrophobic in Nature?

By analyzing the nature of the CTL epitopes and their source proteins, Hughes and Hughes proposed two hypotheses about the nature of the CTL epitopes. First they proposed that the endogenous peptides presented by human leukocyte antigen (HLA) class I molecules are largely derived from conserved regions of proteins, so in general the CTL epitopes tend to be more conserved than the remainder portion of the source proteins. Secondly they hypothesized that the CTL epitope regions are hydrophobic whereas the source protein may itself be overall hydrophilic in nature [8]. In harmony with these hypotheses, Silva and Hughes showed that the CTL epitopes of HIV-1 Nef protein were derived from the hydrophobic and relatively conserved regions by estimating the relative conservation of CTL epitopes of the Nef protein and relating this to the structure and function of the protein. In another study Lucchiari-Hartz et al. showed that the CTL epitope clusters derived from Nef protein tend to coincide with hydrophobic regions, whereas the noncluster regions are predominantly hydrophilic [8]. Their in vitro analysis of the proteasomal degradation products of HIV-Nef protein demonstrates a differential sensitivity of cluster and noncluster regions to proteasomal processing and the cluster regions are digested by proteasomes with greater preference for hydrophobic P1 residues. But the authors admitted that some cytosolic endoproteases other than proteasomes may also be involved in the production of certain Nef-CTL epitopes in natural condition [8]. In both these studies the primary focus was on one protein (Nef) in the whole HIV proteome. So similar studies on other HIV proteins would certainly be interesting and could reveal some important feature of the HIV-CTL epitopes. 
In contrast to the notion that HIV CTL epitopes are more conserved and hydrophobic in nature, a more recent study revealed that the distribution of CTL epitopes in 99% of the HIV-1 protein sequences follows a random pattern and is indistinguishable from the distribution of CTL epitopes in proteins from other proteomes such as those of hepatitis C virus (HCV), influenza, and three eukaryotes. In that study, the authors took a computational approach to predict a large set of CTL epitopes: proteasomal cleavage, TAP transport, and HLA binding, the three most crucial steps in the classical endogenous antigen presentation pathway, were all predicted by computational tools. The use of experimentally validated epitopes instead of computationally predicted ones could influence the outcome of such a study and might lead to a different conclusion. To shed some light on the contradiction between the studies mentioned above, we investigated the hydrophobicity and conservancy patterns of the experimentally validated optimal CTL epitopes (Table 2) from 5 HIV proteins (gp160, Nef, p17, p24, and RT). The relative conservancy and hydrophobicity of the five selected HIV proteins were analyzed. A total of 100 protein sequences from different HIV clades, retrieved from the UniProt database (http://www.uniprot.org/), were used as input for both conservancy and hydrophobicity prediction. To reveal the conservation pattern, a multiple sequence alignment (MSA) was constructed using the well-established tool Clustal W version 2.0 (http://www.ebi.ac.uk/Tools/clustalw2/index.html) developed by the European Bioinformatics Institute (EBI). From the MSA, the conservancy score for each amino acid position was obtained. To predict the hydrophobicity score, the ProtScale tool of the ExPASy Proteomics Server (http://www.expasy.ch/tools/protscale-ref.html) and the algorithm developed by Abraham and Leo, previously used by Lucchiari-Hartz et al. [8], were employed.
The hydrophobicity and conservancy scores for each amino acid position within a particular HIV protein were then used to calculate the total scores for both parameters. Figure 3 shows the total hydrophobicity and conservancy scores of the 5 individual HIV proteins, which agree with a previous study [21]. We found that both RT and p24 are relatively conserved and more hydrophobic than the rest of the analyzed HIV proteins (Figure 3). To visualize the overlap and correlation between the hydrophobicity pattern and the epitope clusters for the five selected proteins, the epitope count/hit and hydrophobicity scores were plotted together for each protein (Figure 4). The epitope hit score for a particular position is the number of alleles binding to that position.
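The three per-position scores can be illustrated with the sketch below. Only the epitope hit score follows a definition given in the text (the number of alleles binding at a position). The conservancy measure (fraction of aligned sequences sharing the most common residue in a column) and the truncated sliding window for hydrophobicity are assumptions, since the exact scoring used with Clustal W and ProtScale is not stated. The three scale values in the toy example are Kyte-Doolittle hydropathy values, used here as a stand-in and not the Abraham and Leo scale named in the text. All function names are hypothetical.

```python
from collections import Counter

def epitope_hit_scores(length, restricted_epitopes):
    """Number of distinct alleles whose epitopes cover each 1-based position."""
    alleles_at = [set() for _ in range(length)]
    for start, end, allele in restricted_epitopes:
        for pos in range(start, end + 1):
            alleles_at[pos - 1].add(allele)
    return [len(s) for s in alleles_at]

def conservancy_scores(aligned):
    """Assumed measure: share of sequences with the most common residue per column."""
    scores = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]  # gaps never count as conserved
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores

def windowed_hydrophobicity(sequence, scale, window=3):
    """Mean scale value over a centred window, truncated at the sequence ends."""
    half = window // 2
    values = [scale[r] for r in sequence]
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Toy inputs, not data from Table 2 or from the alignment used in the study.
hits = epitope_hit_scores(6, [(1, 3, "B*57"), (2, 4, "B*57"), (3, 5, "A*02")])
cons = conservancy_scores(["ACD", "ACE", "A-E"])
kd_subset = {"A": 1.8, "I": 4.5, "R": -4.5}  # Kyte-Doolittle values (stand-in scale)
hydro = windowed_hydrophobicity("AIR", kd_subset, window=3)
print(hits, cons, hydro)
```

Counting distinct alleles rather than epitopes means that two overlapping epitopes restricted by the same allele contribute a single hit at the shared positions, which matches the definition in the text.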

To compare the correlation among hydrophobicity, conservancy, and epitope count, correlation coefficient was calculated among them (Appendix 1). For the calculation of correlation coefficient, first the standard deviation of each of the three score parameters (epitope count, hydrophobicity, and conservancy) was obtained (Appendix 1). Table 4 shows the correlation score values among these three parameters.
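The correlation calculation can be sketched as a standard Pearson coefficient built from the means and standard deviations of two score series. The sketch assumes the population form of the standard deviation (division by the number of positions), which the text does not specify; because the same divisor appears in the covariance, the coefficient itself does not depend on that choice. The function name `pearson` and the toy vectors are illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long per-position score lists."""
    n = len(x)
    if n != len(y) or n in (0, 1):
        raise ValueError("need two score lists of equal length, at least 2 positions")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population standard deviations (assumed form).
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant score list")
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return cov / (sd_x * sd_y)

# Toy score lists, not values from Table 4.
r_same = pearson([1, 2, 3], [2, 4, 6])
r_opposite = pearson([1, 2, 3], [3, 2, 1])
print(r_same, r_opposite)
```

In practice the function would be called once per pair of parameters (epitope hit and hydrophobicity, epitope hit and conservancy, conservancy and hydrophobicity) for each protein separately.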

The correlation scores showed that Nef epitope clusters were strongly correlated with hydrophobicity and conservancy. The p24 protein also showed relatively high correlation between epitope clusters and hydrophobicity and between epitope clusters and conservancy. In contrast, gp160 and RT showed relatively weak yet similar correlation among the parameters. p17 showed strong correlation between epitope hit and conservancy but only moderate correlation between epitope hit and hydrophobicity. These findings suggest that not all epitopes of the HIV proteome are derived from conserved and hydrophobic regions, although the hypothesis held for two of the five HIV proteins (Nef and p24), both of which showed a significant correlation among epitope cluster, hydrophobicity, and conservancy. The very weak correlation obtained for gp160 and RT argues against the general applicability of the hypothesis that all HIV CTL epitopes are conserved and hydrophobic in nature.

5. The Role of MHC Class I on Immune Control of HIV Infection

Significant variation in the susceptibility to HIV-1 infection and especially in the clinical outcome after infection is observed in HIV infected patients. For instance, variation in the level of circulating virus particles in the plasma during the nonsymptomatic phase is commonly observed among the patients [22]. In addition to this, there is evidence that in certain cases individuals known as long-term nonprogressor (LTNP), infected with HIV, remain asymptomatic without antiretroviral therapy (ART) in their life time due to the slow or arrested evolution of HIV [23]. The most plausible explanation is that the variation in the susceptibility and outcome of HIV infection is largely due to host factors and viral adaptation to selective pressure. Recently Fellay et al. conducted a whole-genome association study to identify the host factor associated with control of HIV-1. In this study they identified two distinct polymorphisms associated with HLA loci B and C [24]. Surprisingly, almost all HLA class I polymorphisms were found to occur in those residues that belong to peptide-binding groove of these molecules thereby determining the epitopes that bind to each HLA molecule [25]. Among the three MHC class I loci in humans (HLA-A, HLA-B, and HLA-C), HLA-B is the most polymorphic, compared with HLA-A and HLA-C molecules (IMGT/HLA database: http://www.ebi.ac.uk/imgt/hla/). A more direct evidence of the association between HLA polymorphism and disease progression in HIV infected individual came from a previous study where they showed HLA-B*3503 associated with rapid disease progression differs in only one amino acid from HLA-B*3501 for which no such association was observed [26]. 
The presence of the HLA-B*57 allele in a large proportion of LTNPs signifies its role in controlling disease progression. Mutations in HLA-B*57-restricted Gag epitopes were frequently present in all viruses from plasma, but interestingly, in spite of these CTL escape mutations, LTNPs can maintain viral suppression [27, 28]. The escape mutations in the HLA-B*57-restricted Gag epitopes can be considered a consequence of strong evolutionary pressure exerted by the host immune response. Previous studies showed that although mutation in the conserved Gag p24 epitope DRFYKTLRAE helps the virus to evade the CTL response, it also impairs the ability of the virus to replicate because the mutation occurs at a highly conserved, functionally constrained position [29]. Among the three HLA class I loci, HLA-B is considered the most important factor in restricting HIV disease progression, and T cells responding to HLA-B-restricted epitopes appear to be immunodominant [30, 31]. Moreover, a detailed study of the CTL epitopes in the whole HIV-1 proteome revealed that HLA-B-restricted epitopes are more conserved than epitopes restricted by HLA-A and HLA-C. The same study also showed that, although for most proteins the fractions of unique HLA-A- and HLA-B-restricted positions are equivalent in the total HIV clade-B proteome, Gag-p24 and Nef seemed to be preferentially targeted by HLA-B alleles, as the B-restricted fractions were over threefold higher than the A-restricted fractions [32].

In our study, analysis of the class I HLA alleles that restrict the listed optimal CTL epitopes revealed some interesting features of the HLA restriction patterns. Figure 5 shows the number of optimal epitopes recognized by 62 different class I HLA alleles. HLA-B*57 restricted the largest number of epitopes, recognizing 22 different optimal epitopes. The other alleles with high epitope counts were HLA-A*3, A*2, B*7, A*11, and B*35 (Figure 5). HLA-B alleles make up 50% of the total allele pool, whereas HLA-A and HLA-C constitute 27% and 23%, respectively (Table 5).

These data support the previous findings and underline the role of HLA-B alleles in controlling HIV-1 infection, as HLA-B was associated with 60% of the experimentally validated CTL epitopes. In harmony with the finding of Costa et al. [32], we also observed that a low percentage of epitopes (9% of the total optimal epitope pool) was recognized by HLA-C alleles. We also analyzed the percentage of epitopes associated with HLA-A and HLA-B alleles in individual HIV proteins (gp160, Nef, p24, p17, and RT). The analysis showed that only the p24 and Nef epitopes were associated with higher numbers of HLA-B alleles than HLA-A alleles. In contrast, epitopes derived from gp160, p17, protease, and other proteins (Tat, Rev, Vpu, Vpr, and Vif) were recognized by a slightly greater number of A alleles than B alleles. For integrase, the epitopes were restricted by an almost equal number of A and B alleles. There was a significant difference between the HLA-A- and HLA-B-associated epitopes for RT and p24. For RT, the percentages of HLA-A- and HLA-B-associated epitopes were 19 and 6.28, respectively. For p24, the percentage of HLA-B-associated epitopes was significantly higher (20%) than that of HLA-A-associated epitopes (3.5%).

6. Conclusion

As the CTL response against HIV-infected cells has proved crucial in controlling the virus population in the host, a CTL-based vaccine can reasonably be expected to have a profound effect on HIV infection. Yusim et al. suggested that epitope clustering methods provide an alternative strategy for designing novel multiepitope vaccines. They also suggested that a multiepitope vaccine should not be composed of a string of single epitopes; rather, it should be composed of short regions containing the epitope clusters together with the proximal flanking regions that may be essential for optimal processing of the epitopes. These epitope clusters harbor multiple overlapping epitopes, which may be recognized by multiple HLA alleles. In conclusion, our analysis of the HIV-1 CTL epitopes indicates that the Nef and p24 proteins of HIV-1 can be considered for CTL-based multiepitope vaccine design, since a significant number of optimal CTL epitopes are derived from Nef and Gag-p24 and almost all of these epitopes show a clustering pattern. Further study is needed to test these proposed vaccine candidates in laboratory animals for safety and protective immunity against HIV.


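Correlation between Different Scores. There are three scores for each $i$-th position of an epitope: (a) epitope hit score: $E_i$, (b) hydrophobicity score: $H_i$, (c) conservancy score: $C_i$.

Let us assume that the position range of the epitope, for which each of the scores exists, is $1$ to $n$.

Now (a) the average epitope hit score is $\mathrm{EHS\_AVG} = \frac{1}{n}\sum_{i=1}^{n} E_i$; (b) the average hydrophobicity score is $\mathrm{HS\_AVG} = \frac{1}{n}\sum_{i=1}^{n} H_i$; (c) the average conservancy score is $\mathrm{CS\_AVG} = \frac{1}{n}\sum_{i=1}^{n} C_i$.

Now (a) the standard deviation of the epitope hit score is $\sigma_E = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (E_i - \mathrm{EHS\_AVG})^2}$; (b) the standard deviation of the hydrophobicity score is $\sigma_H = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (H_i - \mathrm{HS\_AVG})^2}$; (c) the standard deviation of the conservancy score is $\sigma_C = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (C_i - \mathrm{CS\_AVG})^2}$.

Then the correlation coefficients between the scores are calculated as below.

Correlation between epitope hit count and hydrophobicity score:

$$r_{EH} = \frac{\frac{1}{n}\sum_{i=1}^{n} (E_i - \mathrm{EHS\_AVG})(H_i - \mathrm{HS\_AVG})}{\sigma_E\,\sigma_H}$$

Correlation between epitope hit count and conservancy score:

$$r_{EC} = \frac{\frac{1}{n}\sum_{i=1}^{n} (E_i - \mathrm{EHS\_AVG})(C_i - \mathrm{CS\_AVG})}{\sigma_E\,\sigma_C}$$

Correlation between conservancy and hydrophobicity score:

$$r_{CH} = \frac{\frac{1}{n}\sum_{i=1}^{n} (C_i - \mathrm{CS\_AVG})(H_i - \mathrm{HS\_AVG})}{\sigma_C\,\sigma_H}$$

The calculation of each of these correlations is carried out for each protein group separately.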

Conflict of Interests

All authors read and approved the paper. The authors declare that there is no conflict of interests regarding the publication of this paper.