Special Issue: Machine Learning and Network Methods for Biology and Medicine
Research Article | Open Access
Nonsynonymous Single-Nucleotide Variations on Some Posttranslational Modifications of Human Proteins and the Association with Diseases
Protein posttranslational modifications (PTMs) play key roles in a variety of protein activities and cellular processes. Different PTMs have distinct impacts on protein function, and normal protein activity is the consequence of many kinds of PTMs working together. With the development of high-throughput technologies such as tandem mass spectrometry (MS/MS) and next-generation sequencing, more and more nonsynonymous single-nucleotide variations (nsSNVs) that cause amino acid changes have been identified, some of which damage PTMs. Damaged PTMs could contribute to the development of some human diseases. In this study, we elucidated the proteome-wide relationship of eight damaged PTMs to human inherited diseases and cancers. Some human inherited diseases or cancers may be the consequences of interactions among damaged PTMs, rather than the result of a single damaged PTM site.
More than 200 different types of protein posttranslational modifications (PTMs) have been detected. PTMs are involved in many protein activities and cellular processes, such as protein folding, stability, conformation, and some significant regulatory mechanisms. For instance, reversible phosphorylation is involved in conformational changes of enzymes, which results in their activation and deactivation in signal transduction; proteins with an attached single ubiquitin (Ub) or poly-Ub chains are associated with gene transcription, DNA repair and replication, intracellular trafficking, and virus budding; methylation at certain residues of histones can regulate gene expression; and glycosylation is responsible for targeting substrates and changing protein half-life.
With the development of high-throughput sequencing technology, gene mutation detection has become another important resource for investigating regulatory mechanisms and cellular processes. Some databases, such as dbSNP and SNVDis, curate such mutation data. Other secondary databases, such as ClinVar, COSMIC, and SwissVar, curate mutation data annotated with phenotypes or diseases. These databases provide resources for analyzing the effect of mutations on human health. However, protein activities are more closely tied to disease processes. At both the genomic and the proteomic level, mutations have a significant impact on normal gene or protein function, and human diseases could be associated with mutations such as nonsynonymous single-nucleotide variations (nsSNVs) that change amino acids. Yet how gene mutations affect protein activities through posttranslational modification sites has not been widely studied.
A PTM site that bears nsSNVs can be defined as a damaged PTM. Recently, large-scale studies have shown that damaged PTMs caused by numerous inherited and somatic amino acid substitutions have a profound impact on both gene and protein function, and that they are associated with human cancer. One instance is the mutation S215R, which occurs on a PTM site of TP53 and could result in breast cancer; another is the mutation of T286 in cyclin D1 (CCND1), which causes the loss of phosphorylation of T286 and is involved in nuclear accumulation of cyclin D1 in esophageal cancer.
However, some of these previous studies drew conclusions about the relationship between damaged PTMs and human health from predictions; some focused only on cancers, and many focused on only a single type of PTM. Although data on both gene mutations and PTMs are increasing fast, the proteome-wide relationship between damaged PTMs and human diseases has not been well studied. In this work, we chose eight experimentally demonstrated damaged PTMs to elucidate their association with human diseases, including inherited diseases and cancers (somatic diseases). These eight types of damaged PTMs include amino acid variations on Phosphorylation, Ubiquitylation, Acetylation, Glycosylation, Methylation, SUMOylation, Hydroxylation, and Sulfation, which are well established as playing key roles in important cellular processes and are closely related to human disease development; moreover, some cross talks among them have recently been revealed from the viewpoint of systems biology [15, 16]. In this study, we focused on the effect of nsSNVs on the functions of these eight important PTMs and established a new protocol to analyze and view how these damaged PTMs are associated with human diseases.
2. Materials and Methods
The eight human PTM data sets of Phosphorylation, Ubiquitylation, Acetylation, Glycosylation, Methylation, SUMOylation, Hydroxylation, and Sulfation were obtained from SysPTM 2.0 (released in June, 2013), which integrates PTMs from public resources as well as manually curated MS/MS-identified PTMs from experimental research articles, and from dbPTM 3.0 (released in June, 2012). In this study, we collected only human-related PTMs, and we chose the most frequently modified residues for each type of PTM. For Phosphorylation, we chose His, Ser, Thr, and Tyr; for Ubiquitylation, we chose Lys; for Acetylation, we chose Ala, Gly, Lys, Met, Ser, and Thr; for Glycosylation, we chose Lys, Ser, and Thr; for Hydroxylation, we chose Asn, Pro, and Lys; for Methylation, we chose Lys and Arg; for Sulfation, we chose Ser, Thr, and Tyr; for SUMOylation, we chose Lys.
The inherited-disease-related nsSNVs were obtained from ClinVar (accessed in November, 2013), dbSNP (build 141), and SwissVar. Cancer-related nonsynonymous single-nucleotide variation (nsSNV) data were retrieved from COSMIC, TCGA (https://tcga-data.nci.nih.gov/tcga/), and SNVDis; neutral nsSNVs were extracted from dbSNP (build 141), excluding cancer-related SNVs that overlapped with those in COSMIC and TCGA, and other deleterious nsSNVs were filtered by UniProtKB/Swiss-Prot (UniProt released in October, 2013) and PolyPhen-2, which curated credible nsSNVs mapped on UniProtKB. We then mapped all these nsSNVs to UniProtKB according to the accession number.
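The residue selection described above can be written down as a small lookup table. The following is a minimal sketch, assuming one-letter amino acid codes; the names `PTM_RESIDUES` and `is_selected_site` are hypothetical and are not part of SysPTM or dbPTM.

```python
# Residues retained for each PTM type, transcribed from the text above
# using one-letter amino acid codes (H=His, S=Ser, T=Thr, Y=Tyr, K=Lys, ...).
PTM_RESIDUES = {
    "Phosphorylation": frozenset("HSTY"),
    "Ubiquitylation": frozenset("K"),
    "Acetylation": frozenset("AGKMST"),
    "Glycosylation": frozenset("KST"),
    "Hydroxylation": frozenset("NPK"),
    "Methylation": frozenset("KR"),
    "Sulfation": frozenset("STY"),
    "SUMOylation": frozenset("K"),
}


def is_selected_site(ptm_type, residue):
    """Return True if `residue` is one of the residues kept for `ptm_type`.

    Hypothetical helper: unknown PTM types simply return False.
    """
    return residue.upper() in PTM_RESIDUES.get(ptm_type, frozenset())
```

A filter of this kind would be applied to each PTM record before any mapping step, so that only the listed residues enter the later analyses.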
2.2. Mapping PTM Sites with nsSNV Sites
For phosphorylation mapping, we set three criteria: exact match; ±2 sites around the phosphorylated amino acid; and ±7 sites around the phosphorylated amino acid. For the remaining seven types of PTMs, we set two criteria: exact match and ±2 sites around the modified amino acid. Phosphorylation is the most widespread type of PTM used in cellular signal transduction, and protein kinases in general show strong selectivity for the primary sequence around the phosphorylated residues, such as serine (S), threonine (T), and tyrosine (Y), so we chose a maximum range of ±7 sites around the phosphorylation sites. By contrast, for ubiquitylation, which is commonly known as a type of PTM that targets proteins for degradation, most E3 ubiquitin ligases exhibit little selectivity for the primary sequence surrounding the target Lys. For the remaining types of PTMs, such as glycosylation, which is important in protein folding and stability, and acetylation, which influences gene regulation in eukaryotic cells, we applied the same criteria as for ubiquitylation in order to unify the range and the numbers of nsSNVs around the modification sites.
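The mapping criteria above amount to a position-window match within each protein. The sketch below illustrates the idea on plain tuples; it assumes that a ±2 or ±7 window also includes the modified residue itself, which the text does not state explicitly. The function name and the record layout are hypothetical.

```python
from collections import defaultdict

# Window sizes from the text: exact match (0) and ±2 for every PTM type,
# plus ±7 for phosphorylation only.
WINDOWS = {"Phosphorylation": (0, 2, 7)}
DEFAULT_WINDOWS = (0, 2)


def match_nssnvs_to_ptms(ptm_sites, nssnvs):
    """Map nsSNV positions to PTM sites on the same protein.

    ptm_sites: iterable of (accession, ptm_type, position)
    nssnvs:    iterable of (accession, position)
    Returns {(accession, ptm_type, ptm_position, window): [nsSNV positions]}.
    Assumption: a window of w covers positions ptm_position - w .. + w,
    including the modified residue itself.
    """
    snv_by_protein = defaultdict(set)
    for accession, pos in nssnvs:
        snv_by_protein[accession].add(pos)

    hits = {}
    for accession, ptm_type, ptm_pos in ptm_sites:
        for window in WINDOWS.get(ptm_type, DEFAULT_WINDOWS):
            nearby = sorted(
                p for p in snv_by_protein.get(accession, ())
                if abs(p - ptm_pos) <= window
            )
            if nearby:
                hits[(accession, ptm_type, ptm_pos, window)] = nearby
    return hits


# Toy records (hypothetical accessions and positions, not data from this study).
toy_ptms = [("PROT_A", "Phosphorylation", 10), ("PROT_A", "Ubiquitylation", 20)]
toy_snvs = [("PROT_A", 10), ("PROT_A", 15), ("PROT_A", 22), ("PROT_B", 10)]
toy_hits = match_nssnvs_to_ptms(toy_ptms, toy_snvs)
```

In the toy data, the variant at position 15 is picked up only by the ±7 phosphorylation window, and the variant at position 22 is picked up by the ±2 ubiquitylation window but not by the exact match.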
2.3. Association between Damaged PTM Sites and Diseases
PTM sites affected by nsSNVs are defined as damaged PTMs in this work. Annotations of nsSNVs (deleterious or neutral) were based on the information from the databases mentioned above, with Online Mendelian Inheritance in Man (OMIM; http://www.ncbi.nlm.nih.gov/omim) used for reference. Moreover, we took the detailed annotations of nsSNV-related diseases from SwissVar and explicitly matched nsSNVs with PTM sites. For each type of PTM, we used the hypergeometric test to calculate the association between damaged PTMs and human diseases, based on proteins carrying a damaged PTM with an SNV-related disease annotation: inherited diseases (germline diseases) or cancers (somatic diseases). In our hypergeometric test, the disease-associated nsSNVs mapped on or around PTM sites were taken as the test dataset, the neutral nsSNVs mapped on or around PTM sites were used as the control dataset, and the total neutral nsSNVs and the total damaging nsSNVs on proteins containing one specific type of PTM were used as the two background datasets, to find the disease-associated damaged PTM proteins (with damaging SNVs on this type of PTM) (with ).
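As an illustration of the hypergeometric test, the sketch below computes an upper-tail probability from four counts using only the Python standard library. The mapping of the four datasets onto the parameters (k, n, K, N) is one possible reading of the description above and is an assumption, as are the function name and the toy counts.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    Assumed reading of the four datasets (not confirmed by the text):
      N: all nsSNVs (disease-associated + neutral) on proteins with this PTM type
      K: disease-associated nsSNVs among those N
      n: nsSNVs mapped on or around PTM sites (test + control datasets)
      k: disease-associated nsSNVs among those n (test dataset)
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy counts for illustration only (not values from this study).
toy_p = hypergeom_upper_tail(k=2, n=3, K=4, N=10)
```

With the toy counts, the tail sums the cases of 2 and 3 disease-associated variants among the 3 mapped ones, giving (36 + 4) / 120 = 1/3.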
2.4. Functional Analysis of Diseases Associated Damaged PTM Sites
To further analyze the functions and features of disease-related damaged PTMs and their proteins, enrichment analyses were performed using DAVID 6.7 (the Database for Annotation, Visualization, and Integrated Discovery). Pathways, biomarkers, and related drugs were analyzed with the Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, http://www.ingenuity.com/). To obtain structural information on the damaged PTMs, we performed domain enrichment analysis for both inherited-disease-related and cancer-related damaged PTMs based on the domain information from Pfam (version 27.0, released in June, 2012); only the domains containing damaged PTMs were chosen. The enrichment results were calculated on disease-related PTM-containing proteins using Fisher's exact test and adjusted with the Benjamini-Hochberg method (corrected P value < 0.01).
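The Benjamini-Hochberg adjustment mentioned above can be sketched in a few lines of plain Python. This is a generic implementation of the step-up procedure, not the code used by DAVID or IPA; Fisher's exact test itself is not reproduced here, and the function name and toy P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each raw P value is scaled by m / rank (rank 1 = smallest), and a
    running minimum from the largest rank downwards keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P values for illustration only (not results from this study).
toy_raw = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_raw)
# Terms passing the threshold stated in the text (corrected P value < 0.01).
toy_significant = [i for i, q in enumerate(toy_adjusted) if q < 0.01]
```

For the toy input the adjusted values are approximately 0.02, 0.04, 0.04, and 0.02, so none of the four terms would pass a 0.01 cutoff.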
2.5. Cross talks between PTM Types
For cross talk between pairs of PTM types, both positive and negative cross talks were considered. Positive cross talk means that one PTM serves as a signal for the addition or removal of a second PTM, or for recognition by a binding protein that carries out a second modification. Negative cross talk can be direct competition for modification of a single residue on a protein, or one modification masking the recognition site of a second PTM. Some positive cross talks can be seen from the pathways or networks the PTMs are involved in, based on physical distance and protein-protein interaction, while negative cross talks can be seen on residues where different PTMs compete to occur. More and more PTM information has been annotated into protein-protein interaction and associated networks, and we mined the cross talks between PTMs based on PTMcode 2 (http://ptmcode.embl.de/), which compiles known and predicted PTM associations. The interaction of the eight damaged PTMs with annotated disease information was illustrated with STRING (http://string-db.org/).
The workflow and protocol of this study are shown in Figure 1. We retrieved PTM data and nsSNV data from the databases mentioned above. We then matched them to find the PTM sites affected by nsSNVs (the matched results are available in Table S1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/124630); the percentages of exactly matched results across all eight types of PTMs are shown in Figure 2, and the numbers of nsSNVs on each type of PTM are presented in Table 1.
3.1. The Statistical Relationship between Damaged PTMs and Inherited Diseases and Cancers
We calculated the PTMs affected by inherited-disease-related and cancer-related nsSNVs, respectively, using the hypergeometric test and found that phosphorylation affected by nsSNVs was most significantly related to both inherited diseases and cancers. Ubiquitylation came next; however, in the exact match it was not significant for inherited diseases, although it was significant for cancers. The remaining types of PTMs affected by nsSNVs were not significantly associated with inherited diseases. When we expanded the range to ±2 amino acids around the modified sites, the damaged PTMs significantly associated with inherited diseases included not only ubiquitylation but also acetylation and glycosylation. Our results imply that most PTMs affected by nsSNVs are cancer-related rather than inherited-disease-related (see Tables 1 and 2). This pattern might be biased by data from large cancer projects such as The Cancer Genome Atlas (TCGA) and the Pan-Cancer analysis project, and from databases such as the Catalogue of Somatic Mutations in Cancer (COSMIC).
Table footnote: P values in this column were calculated using the hypergeometric test, and all values refer to the left column (genetic disease).
We chose the most frequently modified amino acids, such as Histidine (H), Serine (S), Threonine (T), and Tyrosine (Y) for phosphorylation and Lysine (K) for ubiquitylation, and calculated how often nsSNVs appeared on these modified amino acids. We found that the frequency of nsSNVs on the modified amino acids was lower than on the same amino acids across the whole proteome (data not shown). This suggests that modified amino acids are less affected by mutations. Previous research showed that PTM sites generally play a key role in normal cellular processes such as protein-protein interactions and signal transduction and are therefore more stable [15, 32], and our results support this concept.
Phosphorylation is the best studied and most prominent PTM, and it also has the most abundant data. The association between damaged phosphorylation sites and both inherited diseases and cancers is significant for the exact match as well as for ±2 and ±7 amino acids around the phosphorylation sites (Tables 2 and 3). In total, 76736 human phosphorylation sites were obtained, of which only 7005 (9.128%) were directly disrupted by nsSNVs. Of the 7005 damaged phosphorylation sites, 313 (P value = 0.01331) were inherited-disease-related and 2684 (P value = 0.01974) were cancer-related. Therefore, phosphorylation affected by nsSNVs was significantly associated with both inherited diseases and cancers (P values < 0.05) (Table 2). Protein kinases in general exhibit strong selectivity for the primary sequence around the residues they phosphorylate, so ranges of ±2 and ±7 residues around the phosphorylated sites were used to find the impact of nsSNVs in this study. Ser, Thr, and Tyr can all be phosphorylated, and substitutions among these three amino acids can result in diseases, such as S251T in connexin43 (Cx43) protein, which is associated with congenital conotruncal anomalies (Table S2, shown in red).
In contrast, most E3 ubiquitin ligases show little selectivity for the primary sequence around Lysine, their highly preferred target site. We therefore chose only two criteria: exact match and ±2 amino acids around Lysine. Compared with phosphorylation, the ratio of ubiquitylation sites affected by nsSNVs to total ubiquitylation sites was lower (7.22%; 22542 ubiquitylation sites, 5988 proteins). There were 1628 exactly matched nsSNVs on ubiquitylation proteins, of which only 59 (3.624%, P value = 0.08067) were inherited-disease-associated and 651 (39.98%, P value = 0.01722) were cancer-related sites. Neither acetylation nor glycosylation was found to be closely related to inherited diseases or cancers (Table 1).
For the remaining four types of PTMs, the numbers of both exact matches and ±2 range matches were far smaller than those of the PTMs above, even though these four types of PTMs are involved in many important cellular processes and recent works have also uncovered their related functions and diseases. For instance, SUMOylation proteins are implicated in human diseases including cancers and “Huntington’s, Alzheimer’s, and Parkinson’s diseases”; hydroxylation in Asp110Asn is related to “hemophilia b”; methylation in Arg75Trp is associated with “deafness”; as for sulfation, however, we identified only four mutations in one protein, FA8_HUMAN, and those were associated with “hemophilia.”
Although we found that many damaged PTMs were related to human inherited diseases and cancers, the relationship with human diseases remains to be elucidated for almost half of the data. As more damaged PTMs are annotated and analyzed, their impact on health and disease development may become clearer.
3.2. The Damaged PTMs Annotated with Information of Inherited Diseases and Cancers
For all eight PTM types studied, we annotated curated disease information based on SwissVar; some annotations were obtained from the source databases. Although the disease information is up-to-date, the limitations of the different databases make it hard to acquire all the information on known diseases. For instance, for inherited-disease-related phosphorylation, “congenital, hereditary, and neonatal diseases and abnormalities” is the most frequently associated disease category based on the SwissVar analysis of exactly matched inherited-disease-related nsSNVs, followed by “skin and connective tissue diseases” and “nervous system diseases.” However, “neoplasms” account for most of the known diseases in ubiquitylation and acetylation.
To acquire more information on related diseases, we performed disease enrichment analysis using IPA (Figures 3(a) and 3(b)). We performed both inherited-disease and cancer enrichment analysis with the IPA web tool based on the proteins that carried damaged PTMs caused by nsSNVs on or around the modification sites. Through enrichment analysis, we could see that among the exactly matched phosphorylation-related inherited diseases, “autosomal dominant disease” (, corrected P value = ,) ranked first with 50 proteins. For example, PSN1_HUMAN, TNR1A_HUMAN, VHL_HUMAN, and PSN1_HUMAN were well studied and associated with “autosomal dominant early-onset Alzheimer’s disease” in human. The most significant cancer for the exactly matched phosphorylation is “Adenocarcinoma” (, corrected P value = ), which ranked top with 1074 proteins; RASK_HUMAN, P53_HUMAN, EGFR_HUMAN, and so forth were the representative ones. RASK_HUMAN is associated with adenocarcinoma in human large intestine, lung, and other tissues. P53_HUMAN is well known for its associations with human colon, rectal, and other cancers [37, 38]; for instance, mutation on Ser376 results in the loss of phosphorylation sites, which creates a consensus binding site for 14-3-3 proteins and increases the affinity of p53 for sequence-specific binding sites on DNA. As to ubiquitylation, “Skin abnormality” was the most significant inherited disease (, corrected P value = ), and two proteins were closely related to it: TSC2_HUMAN and TSC1_HUMAN. They were reported to be associated with tuberous sclerosis syndrome in human. Non-small-cell lung cancer was found to be significant (, corrected P value = ) in Ubiquitylation. For acetylation and glycosylation, we also examined both associated inherited diseases and cancers. As to acetylation, besides cancers, we observed disorders of cellular development and of cellular growth and proliferation that were caused by mutations on P53_HUMAN.
With regard to glycosylation, the diseases were closely related to lipid metabolism and molecular transport.
We then expanded our search range for nsSNVs that could affect the PTMs: ±2 and ±7 around phosphorylation sites and ±2 for the remaining types of PTMs. First, we chose the ±2 range for all eight types of PTMs to analyze the associated diseases. For inherited diseases, “autosomal dominant disease” and “autosomal recessive disease” ranked in the top three in Phosphorylation, Ubiquitylation, Acetylation, Glycosylation, Methylation, Hydroxylation, and Sulfation. This was clearly different from the exactly matched results. Both autosomal diseases and X-linked hereditary diseases became significant when more nsSNVs were accumulated around PTM sites. The comparison between exactly matched and ±2 range-matched results indicates that (a) mutations on PTM sites themselves are rare, and only certain kinds of inherited diseases were indicated to be caused by them, while more kinds of diseases were indicated to be caused by nsSNVs surrounding PTM sites; and (b) human inherited diseases are closely associated with disturbances on and around PTM sites.
Next, we analyzed the ±2 range-matched sites for cancers; the results did not change much compared with the exactly matched results. We also compared the data between the ±2 and ±7 ranges around phosphorylation sites; their difference was not significant. The differences between human inherited diseases and cancers could be related to how nsSNVs damage PTM sites and to phenotype: cancers are mostly caused by somatic mutations and present in the current generation, whereas nsSNV damage to PTM sites is not easily passed on to the next generation, so the numbers and types of inherited diseases are smaller than those of damaged-PTM-related cancers.
3.3. Functional and Structural Analysis
3.3.1. Enrichment Analysis of Keywords, GO, and Domains
We performed functional enrichment analysis using DAVID. First, we performed keyword and GO association analysis (FDR < 0.01). We again divided the data into two parts: exact match and ±2 amino acids (AA) match. “Disease mutation” was the most significant keyword for the inherited-disease-related nsSNVs and appeared in all four of the following types of PTMs: Phosphorylation, Ubiquitylation, Acetylation, and Glycosylation. The enrichment analyses showed that the proteins we chose were more likely to be related to diseases when they carried mutations. GO enrichment analysis was also performed for the four types of PTMs mentioned above. The differences in function among the PTM categories are obvious (see Table S3). For example, proteins with phosphorylation are mainly involved in cell activities such as cell death, apoptosis, and signal transduction. Coagulation and wound healing were the GO tags for glycosylation. Through these analyses, we found that the diseases caused by damaged PTMs were closely associated with the roles these proteins play in the regulation of normal cellular processes, which indicates that damaged PTMs can have serious consequences.
When we moved to cancer-related nsSNVs on PTMs, the keywords carried less information about mutations and pointed instead to the functions of the proteins. Ubiquitylation interested us the most: its keywords said little about ubiquitylation itself and instead named other modifications on the same proteins. This indicates that ubiquitylation is likely to coexist with other types of PTMs. We then examined the GO terms for cancers; besides the functions the proteins perform, their chemical characteristics also showed up. For phosphorylation, for example, the most significant GO term was “protein amino acid phosphorylation” for both exact match and ±2 range match. For the remaining types of PTMs, GO terms revealed more about protein roles in different processes; for example, “modification-dependent protein catabolic process” ranked in the top two under both range criteria for ubiquitylation.
We then examined the domains associated with damaged PTMs, based on data from Pfam, to analyze the impact of damaged PTMs on protein structures. For damaged phosphorylation, “protein tyrosine kinase” (, corrected P value = ) and “protein kinase domain” (, corrected P value = ) ranked first in human inherited diseases and cancers, respectively. Damaged phosphorylation on kinases could in turn damage other phosphorylation events, so an nsSNV does not necessarily affect only one phosphorylation site. In terms of ubiquitylation, “P53 DNA-binding domain” (, corrected P value = ) and “Histone” (, corrected P value = ) were the most significant domains. On P53_HUMAN, many phosphorylation and ubiquitylation sites coexisted and some of them affected the same domains, such as “P53 DNA-binding domain.” “Connexin” (, corrected P value = ) and “HMG14 and HMG17” (, corrected P value = ) were the domains in which damaged acetylation was enriched. Glycosylation was involved in wound healing, cell adhesion, and cellular proliferation, and we found that “immunoglobulin domain” (, corrected P value = 0.042) and “class I histocompatibility antigen, domains alpha 1 and 2” (, corrected P value = ) were enriched among glycosylation domains. For Hydroxylation, “collagen triple helix repeat (20 copies)” (, corrected P value = ) was found in the cancer-related dataset. For the other types of PTMs, the domains were scattered compared with those of the PTMs mentioned above. From the associated-domain data, we found that the domains associated with damaged PTMs were closely related to molecular binding and protein-protein interactions, which is a major function of PTMs.
3.3.2. Pathway Analysis
To investigate the function of damaged PTMs on a proteome-wide scale, we performed pathway analysis with IPA (details available in Table S4). In the IPA analysis of inherited-disease-associated damaged PTMs in the exactly matched data, some pathways are significant: “ovarian cancer signaling” in Phosphorylation (corrected P value = , ratio = 0.131), Ubiquitylation (corrected P value = , ratio = 0.046), and Acetylation (corrected P value = , ratio = 0.031); “hereditary breast cancer signaling” in Phosphorylation (corrected P value = , ratio = 0.116), Ubiquitylation (corrected P value = , ratio = 0.062), Acetylation (corrected P value = , ratio = 0.036), and Methylation (corrected P value = , ratio = 0.027); “Role of BRCA1 in DNA damage response” in Phosphorylation (corrected P value = , ratio = 0.18), Ubiquitylation (corrected P value = , ratio = 0.066), Acetylation (corrected P value = , ratio = 0.049), and Methylation (corrected P value = , ratio = 0.033). Some significant pathways are associated with the function of the PTM itself, such as “Coagulation system” (corrected P value = , ratio = 0.171) in glycosylation. For cancers, we examined each PTM category and found that the pathways were more closely associated with the functions of the proteins, for instance, “protein kinase A signaling” (corrected P value = , ratio = 0.269) in Phosphorylation and “protein ubiquitylation pathway” (corrected P value = , ratio = 0.134) in Ubiquitylation. We found that more cancer-related damaged PTMs were associated with signaling pathways, which indicates that somatic mutations could affect normal cellular processes more often and may thus result in human cancers.
3.3.3. Protein-Protein Interaction Analysis
On the proteome-wide scale, the associations among these proteins were close, and we illustrated the interactions using protein-protein interaction networks from STRING (Figure 5). For a total of 159 proteins that carried identified damaged PTM sites with SwissVar annotations, we manually divided the associated proteins of different types of PTMs into six major parts; Sulfation and SUMOylation are not shown because of the limited amount of data. Some proteins, such as KRAS and MRE11A, did not carry just one kind of PTM; phosphorylation, ubiquitylation, and acetylation coexisted on them. From this network, we found that, except for phosphorylation, interactions within a single kind of PTM were fewer than interactions with phosphorylation. This result shows that phosphorylation, the hub of signal transduction with a strong relationship to other types of PTMs, plays a key role in the association between damaged PTMs and human inherited diseases and cancers. For example, PTPN11, which was found to carry damaged acetylation caused by T2I and associated with “Noonan syndrome 1,” is involved in downstream effectors of cytoplasmic protein tyrosine kinases.
3.3.4. Cross talk Analysis
Cross talk between paired PTMs of different types, such as phosphorylation and ubiquitylation or ubiquitylation and acetylation, has become a research theme in proteomics [15, 16]. It shows that the extensive use of PTMs to generate multiple distinct protein states from a single gene product could compensate for the relative paucity of genes in vertebrate genomes. In this work, we investigated the impact of nsSNVs on cross talks between pairs of PTMs. Cross talks of PTMs can be defined as positive or negative; in both cases one PTM has an impact on the other PTM. In this study, we mined the information on cross talks from PTMcode. Most PTM sites have cross talks with other PTM sites, supported by evidence such as coevolution and physical distance. Here, we took PTN11_HUMAN as an example of cross talk within one protein; it carries 23 PTMs in total, with 55 functional associations. In our inherited-disease-related dataset, 4 nsSNVs occurred on phosphorylation sites (T2I, Y62D, Y63C, and Y279C) and 1 on an acetylation site of PTN11_HUMAN (Y279S) (Figure 4). The mutations on Y279 are associated with “human LEOPARD syndrome 1,” and the mutations on the remaining sites are associated with “human Noonan syndrome 1” [41, 43]; also, within this protein, T2 is associated with both Y62 and Y63, which are all found changed in “Noonan syndrome 1.” Thus, the association among damaged PTMs could play a key role in the development of human inherited diseases.
On the proteome-wide scale, the associations were more prevalent. We took P53_HUMAN and TOP1_HUMAN as examples of cross talk between PTM sites on distinct proteins. On P53_HUMAN, we found 21 phosphorylation sites, 14 ubiquitylation sites, and 9 acetylation sites; associations among them were prevalent within the protein, and the damaged PTMs mostly resulted in a deficiency in the role the protein plays in significant cellular functions. K326R on TOP1_HUMAN is related to human breast cancer, and the protein-protein interaction between the two proteins is among those of the 159 proteins (Figure 5, boxed in brown). We found that the ubiquitylation on K326 was associated with 33 PTMs in protein P53 (Figure 6); 18 phosphorylation sites were in our inherited-disease-related dataset. From the cross talks among these PTMs, we could infer that an nsSNV on one PTM site affects not only that site but possibly other associated sites as well. For instance, O-GlcNAcylation of S149 in p53 reduces phosphorylation of T155. Not only human inherited diseases but also cancers are related to these damaged PTMs.
In negative cross talk, more than one kind of PTM can occur on the same residue; the competing modifications may occur at different stages of cellular processes or at different positions. We chose three pairs of PTMs for the analysis: phosphorylation and ubiquitylation, phosphorylation and acetylation, and ubiquitylation and acetylation. For the first and second pairs, the exactly matched sites did not overlap, but when we matched damaged ubiquitylation and acetylation sites against ±7 sites around phosphorylation sites, we obtained 12 overlapping sites for ubiquitylation and 10 for acetylation, of which 7 and 5 sites, respectively, were on P53_HUMAN. For example, K320 on TP53 could be ubiquitylated or acetylated (Figure 6). We then examined the ubiquitylation and acetylation pair; we matched their exact sites and obtained 13 overlapping sites. For example, both ubiquitylation and acetylation were detected on K97; nsSNVs on this site could result in “cardiomyopathy, dilated 1a.” Positive cross talk, in which one PTM promotes or prevents another PTM directly on the same site or indirectly on other sites, extends the impact of nsSNVs on PTMs, thus increasing the chance of development of human inherited diseases and cancers in wider ranges. In negative cross talk, where distinct PTMs compete for the same site, an nsSNV at that site can disrupt the normal function of all of these PTMs and thereby damage the related protein functions.
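The pairwise overlap search described above can be sketched as a comparison of two site lists, with a window of 0 for exact overlap and 7 for the ±7 match around phosphorylation sites. This is a minimal illustration on toy records; the function name, accessions, and positions are hypothetical and do not reproduce the counts reported above.

```python
def shared_sites(sites_a, sites_b, window=0):
    """Return sites from `sites_b` lying within `window` residues of any
    site in `sites_a` on the same protein.

    Both inputs are iterables of (accession, position). With window=0 the
    result is the set of residues carrying both PTM types.
    """
    positions_a = {}
    for accession, pos in sites_a:
        positions_a.setdefault(accession, []).append(pos)
    return sorted(
        (accession, pos)
        for accession, pos in set(sites_b)
        if any(abs(pos - a) <= window for a in positions_a.get(accession, ()))
    )


# Toy damaged-site lists (hypothetical, for illustration only).
toy_phospho = [("PROT_A", 315)]
toy_ubiq = [("PROT_A", 320), ("PROT_A", 330), ("PROT_B", 97)]
toy_acetyl = [("PROT_A", 320), ("PROT_B", 97), ("PROT_B", 5)]

exact_ub_ac = shared_sites(toy_ubiq, toy_acetyl)            # same residue
near_phospho = shared_sites(toy_phospho, toy_ubiq, window=7)  # within ±7
```

In the toy data, two residues carry both ubiquitylation and acetylation, while only one ubiquitylation site falls within ±7 residues of the phosphorylation site and none coincides with it exactly.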
3.4. Potential of Damaged PTMs as Biomarkers in Inherited Diseases and Cancers
Damaged PTMs may cause protein functions in canonical pathways to go out of control. For research and medical use, some of them might be good biomarker candidates that could also serve as drug targets for intervention. Using information from IPA, we identified proteins with damaged PTMs in canonical pathways that are the most likely biomarker candidates. For the exactly matched phosphorylation sites with nsSNVs, we filtered 481 genes/proteins; several of them are already targets of existing drugs, but many remain to be explored as targets of new drugs (more details available in Table S5). We further identified 169 filtered proteins for ubiquitylation and 90 for acetylation (Table S5). Proteins carrying damaged PTMs are usually associated with many critical signaling pathways during the development of diseases. One example is VHL (von Hippel-Lindau tumor suppressor, E3 ubiquitin protein ligase), which is involved in cardiovascular disease, hematological disease, and other diseases. Some of the candidate biomarkers are functionally similar to known proteins in clinical use. MRP1_HUMAN, encoded by ABCC1, has been recognized as a biomarker in breast cancer and other cellular disorders, with drugs such as sulfinpyrazone. For each PTM, we provide the most likely biomarker candidates (Table S5).
In summary, we investigated the associations between PTMs affected by nsSNVs and human inherited diseases and cancers from diverse perspectives, including functions, pathways, and cross talk. These analyses provide a proteome-wide view of how proteins that carry both modifications and nsSNVs contribute to the development of diseases and cancers. PTMs play key roles in almost every important cellular process, and their dysfunction could result in human diseases. We provide a practical protocol for analyzing disease-related proteins that carry damaged PTMs, and we list some valuable proteins as candidate biomarkers for potential research and clinical use. However, almost half of the damaged PTMs showed no association with human health in our current analysis, and their functions remain to be revealed. Future work should identify the causative relationships between damaged PTMs and human diseases by discovering key nsSNVs on protein modifications.
Abbreviations
PTM: Protein posttranslational modification
nsSNVs: Nonsynonymous single-nucleotide variations
TCGA: The Cancer Genome Atlas
Conflict of Interests
The authors confirm that this paper’s content has no conflict of interests.
This work was funded by the National Hi-Tech Program (2012AA020201), the Key Infectious Disease Project (2012ZX10002012-014), and the National Key Basic Research Program (2010CB912702, 2011CB910204).
Supplementary Table S1: The proteins and their sites of exact matched nsSNVs on each type of PTM. The proteins were shown with their UniProt Accession number.
Supplementary Table S2: Diseases associated with damaged PTM sites, as annotated by SwissVar. For each type of PTM, the associated diseases are given for both the exact matched and the around matched PTM sites. Alterations between the modified amino acids of one type of PTM are marked in red.
Supplementary Table S3: Enrichment results of keywords and GO. Both exact matched and ±2 matched results were shown. Inherited diseases were marked in yellow and cancers were marked in green.
Supplementary Table S4: Pathway analysis based on IPA. The results are boxed for each type of PTM, and the analyses were performed for both inherited diseases and cancers. The P value and the ratio calculated by IPA are shown.
Supplementary Table S5: Summary of biomarker candidates. The biomarker candidates were chosen based on information from IPA and details about them were shown.
- J. G. Tooley and C. E. Schaner Tooley, “New roles for old modifications: emerging roles of N-terminal post-translational modifications in development and disease,” Protein Science, vol. 23, no. 12, pp. 1641–1649, 2014.
- J. Seo and K.-J. Lee, “Post-translational modifications and their biological functions: proteomic analysis and systematic approaches,” Journal of Biochemistry and Molecular Biology, vol. 37, no. 1, pp. 35–44, 2004.
- K. Haglund and I. Dikic, “Ubiquitylation and cell signaling,” The EMBO Journal, vol. 24, no. 19, pp. 3353–3359, 2005.
- J. Nakayama, J. C. Rice, B. D. Strahl, C. D. Allis, and S. I. S. Grewal, “Role of histone H3 lysine 9 methylation in epigenetic control of heterochromatin assembly,” Science, vol. 292, no. 5514, pp. 110–113, 2001.
- S. T. Sherry, M. Ward, and K. Sirotkin, “dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation,” Genome Research, vol. 9, no. 8, pp. 677–679, 1999.
- K. Karagiannis, V. Simonyan, and R. Mazumder, “SNVDis: a proteome-wide analysis service for evaluating nsSNVs in protein functional sites and pathways,” Genomics, Proteomics and Bioinformatics, vol. 11, no. 2, pp. 122–126, 2013.
- M. J. Landrum, J. M. Lee, G. R. Riley et al., “ClinVar: public archive of relationships among sequence variation and human phenotype,” Nucleic Acids Research, vol. 42, no. 1, pp. D980–D985, 2014.
- S. Bamford, E. Dawson, S. Forbes et al., “The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website,” British Journal of Cancer, vol. 91, no. 2, pp. 355–358, 2004.
- A. Mottaz, F. P. A. David, A.-L. Veuthey, and Y. L. Yip, “Easy retrieval of single amino-acid polymorphisms and phenotype information using SwissVar,” Bioinformatics, vol. 26, no. 6, pp. 851–852, 2010.
- C. Greenman, P. Stephens, R. Smith et al., “Patterns of somatic mutation in human cancer genomes,” Nature, vol. 446, no. 7132, pp. 153–158, 2007.
- C. Cole, K. Krampis, K. Karagiannis et al., “Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data,” BMC Bioinformatics, vol. 15, no. 1, article 28, 2014.
- P. Radivojac, P. H. Baenziger, M. G. Kann, M. E. Mort, M. W. Hahn, and S. D. Mooney, “Gain and loss of phosphorylation sites in human cancer,” Bioinformatics, vol. 24, no. 16, pp. i241–i247, 2008.
- E. Manié, A. Vincent-Salomon, J. Lehmann-Che et al., “High frequency of TP53 mutation in BRCA1 and sporadic basal-like carcinomas but not in BRCA1 luminal breast tumors,” Cancer Research, vol. 69, no. 2, pp. 663–671, 2009.
- S. Benzeno, F. Lu, M. Guo et al., “Identification of mutations that disrupt phosphorylation-dependent nuclear export of cyclin D1,” Oncogene, vol. 25, no. 47, pp. 6291–6303, 2006.
- T. Hunter, “The age of crosstalk: phosphorylation, ubiquitination, and beyond,” Molecular Cell, vol. 28, no. 5, pp. 730–738, 2007.
- J.-S. Lee, E. Smith, and A. Shilatifard, “The language of histone crosstalk,” Cell, vol. 142, no. 5, pp. 682–685, 2010.
- J. Li, J. Jia, H. Li et al., “SysPTM 2.0: an updated systematic resource for post-translational modification,” Database, vol. 2014, p. bau025, 2014.
- C. T. Lu, K. Y. Huang, M. G. Su et al., “DbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications,” Nucleic Acids Research, vol. 41, no. 1, pp. D295–D305, 2013.
- M. Magrane and U. P. Consortium, “UniProt Knowledgebase: a hub of integrated protein data,” Database, vol. 2011, Article ID bar009, 2011.
- I. A. Adzhubei, S. Schmidt, L. Peshkin et al., “A method and server for predicting damaging missense mutations,” Nature Methods, vol. 7, no. 4, pp. 248–249, 2010.
- J. Reimand, O. Wagih, and G. D. Bader, “The mutational landscape of phosphorylation signaling in cancer,” Scientific Reports, vol. 3, article 2651, 2013.
- J. D. Graves and E. G. Krebs, “Protein phosphorylation and signal transduction,” Pharmacology and Therapeutics, vol. 82, no. 2-3, pp. 111–121, 1999.
- P. Beltrao, P. Bork, N. J. Krogan, and V. van Noort, “Evolution and functional cross-talk of protein post-translational modifications,” Molecular Systems Biology, vol. 9, article 714, 2013.
- M. M. Chen, A. I. Bartlett, P. S. Nerenberg et al., “Perturbing the folding energy landscape of the bacterial immunity protein Im7 by site-specific N-linked glycosylation,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 52, pp. 22528–22533, 2010.
- L. Verdone, E. Agricola, M. Caserta, and E. di Mauro, “Histone acetylation in gene regulation,” Briefings in Functional Genomics & Proteomics, vol. 5, no. 3, pp. 209–221, 2006.
- J. Amberger, C. Bocchini, and A. Hamosh, “A new face and new challenges for Online Mendelian Inheritance in Man (OMIM),” Human Mutation, vol. 32, no. 5, pp. 564–567, 2011.
- X. Jiao, B. T. Sherman, D. W. Huang et al., “DAVID-WS: a stateful web service to facilitate gene/protein list analysis,” Bioinformatics, vol. 28, no. 13, pp. 1805–1806, 2012.
- G. Duan and D. Walther, “The roles of post-translational modifications in the context of protein interaction networks,” PLoS Computational Biology, vol. 11, no. 2, Article ID e1004049, 2015.
- P. Minguez, I. Letunic, L. Parca, and P. Bork, “PTMcode: a database of known and predicted functional associations between post-translational modifications in proteins,” Nucleic Acids Research, vol. 41, no. 1, pp. D306–D311, 2013.
- D. Szklarczyk, A. Franceschini, S. Wyder et al., “STRING v10: protein-protein interaction networks, integrated over the tree of life,” Nucleic Acids Research, vol. 43, pp. D447–D452, 2015.
- The Cancer Genome Atlas Research Network, J. N. Weinstein, E. A. Collisson et al., “The Cancer Genome Atlas Pan-Cancer analysis project,” Nature Genetics, vol. 45, no. 10, pp. 1113–1120, 2013.
- P. Radivojac, P. H. Baenziger, M. G. Kann, M. E. Mort, M. W. Hahn, and S. D. Mooney, “Gain and loss of phosphorylation sites in human cancer,” Bioinformatics, vol. 24, no. 16, pp. i241–i247, 2008.
- G. A. Khoury, R. C. Baliban, and C. A. Floudas, “Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database,” Scientific Reports, vol. 1, article 90, 2011.
- P. Chen, L.-J. Xie, G.-Y. Huang, X.-Q. Zhao, and C. Chang, “Mutations of connexin43 in fetuses with congenital heart malformations,” Chinese Medical Journal, vol. 118, no. 12, pp. 971–976, 2005.
- G. Richard, T. W. White, L. E. Smith et al., “Functional defects of Cx26 resulting from a heterozygous missense mutation in a family with dominant deaf-mutism and palmoplantar keratoderma,” Human Genetics, vol. 103, no. 4, pp. 393–399, 1998.
- D. Campion, C. Dumanchin, D. Hannequin et al., “Early-onset autosomal dominant Alzheimer disease: prevalence, genetic heterogeneity, and mutation spectrum,” The American Journal of Human Genetics, vol. 65, no. 3, pp. 664–670, 1999.
- B. Dix, P. Robbins, S. Carrello, A. House, and B. Iacopetta, “Comparison of p53 gene mutation and protein overexpression in colorectal carcinomas,” British Journal of Cancer, vol. 70, no. 4, pp. 585–590, 1994.
- The Cancer Genome Atlas Network, “Comprehensive molecular characterization of human colon and rectal cancer,” Nature, vol. 487, no. 7407, pp. 330–337, 2012.
- M. F. Lavin and N. Gueven, “The complexity of p53 stabilization and activation,” Cell Death and Differentiation, vol. 13, no. 6, pp. 941–950, 2006.
- M. van Slegtenhorst, R. de Hoogt, C. Hermans et al., “Identification of the tuberous sclerosis gene TSC1 on chromosome 9q34,” Science, vol. 277, no. 5327, pp. 805–808, 1997.
- A. Sarkozy, E. Conti, D. Seripa et al., “Correlation between PTPN11 gene mutations and congenital heart defects in Noonan and LEOPARD syndromes,” Journal of Medical Genetics, vol. 40, no. 9, pp. 704–708, 2003.
- B. Keren, A. Hadchouel, S. Saba et al., “PTPN11 mutations in patients with LEOPARD syndrome: a French multicentric experience,” Journal of Medical Genetics, vol. 41, no. 11, article e117, 2004.
- M. Tartaglia, K. Kalidas, A. Shaw et al., “PTPN11 mutations in noonan syndrome: molecular spectrum, genotype-phenotype correlation, and phenotypic heterogeneity,” The American Journal of Human Genetics, vol. 70, no. 6, pp. 1555–1563, 2002.
- J. Rutherford, C. E. Chu, P. M. Duddy et al., “Investigations on a clinically and functionally unusual and novel germline p53 mutation,” British Journal of Cancer, vol. 86, no. 10, pp. 1592–1596, 2002.
- T. Sjöblom, S. Jones, L. D. Wood et al., “The consensus coding sequences of human breast and colorectal cancers,” Science, vol. 314, no. 5797, pp. 268–274, 2006.
- E. Arbustini, A. Pilotto, A. Repetto et al., “Autosomal dominant dilated cardiomyopathy with atrioventricular block: a lamin A/C defect-related disease,” Journal of the American College of Cardiology, vol. 39, no. 6, pp. 981–990, 2002.
- J. V. Olsen, B. Blagoev, F. Gnad et al., “Global, in vivo, and site-specific phosphorylation dynamics in signaling networks,” Cell, vol. 127, no. 3, pp. 635–648, 2006.
- N. Rifai, M. A. Gillette, and S. A. Carr, “Protein biomarker discovery and validation: the long and uncertain path to clinical utility,” Nature Biotechnology, vol. 24, no. 8, pp. 971–983, 2006.
- J. Zhang, M. J. Guy, H. S. Norman et al., “Top-down quantitative proteomics identified phosphorylation of cardiac troponin I as a candidate biomarker for chronic heart failure,” Journal of Proteome Research, vol. 10, no. 9, pp. 4054–4065, 2011.
Copyright © 2015 Bo Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.