Special Issue: Mathematical Modelling of IoT-Based Health Monitoring System
The Prediction of a 3-Protein-Based Model on the Prognosis of Head and Neck Squamous Cell Carcinoma
Background. Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignant tumors. Using high-throughput genomic methods, RNA-based diagnostic and prognostic models for HNSCC with potential clinical value have been developed. However, the clinical utility and reproducibility of these models are uncertain. Because of the complex regulatory processes that occur after mRNA is transcribed, the abundance of proteins in a cell cannot be fully predicted or explained by the expression of the corresponding mRNA. We aimed to develop and validate a novel protein signature for predicting the prognosis of HNSCC patients. Methods. The functional proteomic data of 332 HNSCC cases were collected from The Cancer Proteome Atlas (TCPA), and the related follow-up and clinical data were acquired from The Cancer Genome Atlas (TCGA). Univariate and multivariate Cox regression analyses, the Akaike Information Criterion, receiver operating characteristic (ROC) analysis, and the Kaplan-Meier method were used. Results. The clinical features of patients in the training and test sets were comparable. The area under the ROC curve (AUC) for the 3-protein signature (X4EBP1_pT37T46, HER3_pY1289, and NF2) was 0.655 in the test set and 0.699 in the combined cohort (all 332 patients). In addition, the 3-protein signature exhibited better predictive value for the survival of HNSCC patients than conventional clinical factors such as age, gender, TNM stage, and smoking history. Conclusion. The 3-protein signature developed in this study exhibits good performance in predicting the overall survival of HNSCC patients and has better predictive value for survival than conventional clinical factors.
1. Introduction
Head and neck squamous cell carcinoma (HNSCC) is a malignancy originating from the oropharynx, hypopharynx, oral cavity, and larynx. More than 550,000 persons worldwide are diagnosed with HNSCC annually, resulting in 380,000 deaths. Epidemiological studies have indicated that the incidence of HNSCC is growing, and the 5-year survival rate is <50% despite advances in treatments such as surgery, radiation therapy, and chemotherapy [2–5]. Survival is <1 year in patients with locally advanced HNSCC who develop metastases or relapse. Alcohol consumption, human papillomavirus (HPV) infection, and smoking are related to the occurrence, progression, and prognosis of HNSCC. However, the reliability of these risk factors is unclear. HNSCCs associated with tobacco use and with HPV have been shown to have different molecular signatures, which complicates the use of molecular techniques to predict survival and develop targeted treatments. Because of the molecular heterogeneity and etiological complexity of HNSCC, it is difficult to identify novel biomarkers that can help predict prognosis and guide therapy [8, 10]. Using high-throughput genomic methods, RNA-based models with potential clinical value have been developed for the prognosis and diagnosis of HNSCC [11–14]. However, the clinical utility and reproducibility of these models are uncertain. The modified proteome represents the final result of different molecular pathways and has potential for the therapeutic targeting of malignancies. However, because of the complex regulatory processes that occur after mRNA is transcribed, the abundance of proteins in a cell cannot be fully predicted or explained by the expression of the corresponding mRNA. As such, proteomic analysis of tumors can provide researchers with large amounts of bioinformatics data that differ from those obtained by RNA or DNA sequencing.
In recent years, protein-based prognostic signature models have been developed to predict cancer survival. For example, Xie et al. developed a 3-protein predictive risk score model for progression-free survival (PFS) and overall survival (OS) in high-grade serous ovarian cancer. Han et al. identified 4 protein biomarkers that are prognostic for kidney renal clear cell carcinoma. Patil and Mahalingam successfully predicted the survival of patients with lower-grade glioma using a 4-protein prognostic signature.
The Cancer Proteome Atlas (TCPA) is an open-access bioinformatics resource that belongs to The Cancer Genome Atlas (TCGA) Project [20, 21]. It contains protein expression data of many tumor cell lines generated by reverse-phase protein arrays (RPPAs) [20, 21]. In this paper, a novel protein signature for determining the prognosis of HNSCC patients was constructed and validated using the functional proteomic data collected from TCPA.
2. Material and Methods
2.1. Patients and Proteomic Data
The functional proteomic data of 347 HNSCC patients were obtained from the TCPA online database (http://tcpaportal.org), and the corresponding clinical and follow-up data were obtained from TCGA (https://cancergenome.nih.gov/). After cases with incomplete clinical follow-up records were excluded, the data of 332 cases were included in this study.
The 332 patients were randomly assigned to a training set (n = 168) and a test set (n = 164), with the aim of keeping the clinical variables comparable between the 2 sets. The prognostic model was developed using the training set and validated using the test set.
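As an illustration of the random assignment described above, the following Python sketch shuffles a list of patient identifiers and cuts it into a training set of 168 and a test set of 164. The identifiers, the function name `split_cohort`, and the fixed seed are assumptions made for this example; the study itself was carried out in R, and its exact randomization procedure is not described.

```python
import random


def split_cohort(patient_ids, n_train, seed=0):
    """Randomly partition patient IDs into a training list and a test list."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers; the real ones would come from TCGA/TCPA.
patients = [f"case_{i:03d}" for i in range(332)]
train, test = split_cohort(patients, n_train=168)
print(len(train), len(test))  # 168 164
```

After such a split, the clinical variables of the two sets would still need to be compared to confirm that they are balanced.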
2.2. Survival Analysis Based on the Functional Proteomic Data in TCPA Database
Candidate proteins were selected from the functional proteomic data using univariate Cox proportional hazards regression analysis and the Kaplan-Meier method, as implemented in the survival package for R (version 3.6.3). First, univariate Cox regression analysis was performed to assess the association between the expression of each protein and patient OS. Next, for each protein, the 168 patients in the training set were classified into a low-expression group (expression < median) or a high-expression group (expression > median), so that the 2 groups contained the same number of patients. Survival differences between the low- and high-expression groups were examined with the 2-sided log-rank test. Only proteins that reached statistical significance in these analyses were considered candidate proteins.
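The median split and the 2-sided log-rank comparison can be sketched in plain Python as follows. This is a minimal illustration, not the survival package routine used in the study: the function names are hypothetical, the input values are made up, and the P value relies on the usual chi-square approximation with 1 degree of freedom.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def logrank_test(times, events, groups):
    """Two-sided log-rank test for two groups labelled 'high' and 'low'.

    times  : follow-up time of each patient
    events : 1 if death was observed, 0 if the patient was censored
    Returns (chi-square statistic, P value) on 1 degree of freedom.
    """
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [(g, tt, e) for g, tt, e in zip(groups, times, events) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for g, _, _ in at_risk if g == "high")
        d = sum(1 for _, tt, e in at_risk if tt == t and e == 1)
        d_high = sum(1 for g, tt, e in at_risk if g == "high" and tt == t and e == 1)
        # Observed minus expected deaths in the 'high' group at time t.
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:  # hypergeometric variance of d_high
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) tail probability


# Made-up example: 8 patients, all deaths observed.
expression = [9.1, 8.7, 8.2, 7.9, 3.0, 2.5, 2.1, 1.4]
groups = split_by_median(expression)
chi2, p_value = logrank_test([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, groups)
print(groups, chi2, p_value)
```

In practice the same comparison would be repeated for every protein in the panel, which is why a multiple-testing correction may be worth considering.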
2.3. Definition of Protein-Related Prognostic Model and Risk Score
Using the above method, 7 proteins were chosen as candidates and entered into a multivariate Cox regression analysis, and the preferred model was identified with the Akaike Information Criterion (AIC). The model with the lowest AIC was taken to offer the best trade-off between goodness of fit and model complexity. After the multivariate Cox regression analysis, the risk score of each patient was calculated as $\text{Risk score} = \sum_{i=1}^{n} \beta_i x_i$, where $n$ is the number of prognostic proteins, $\beta_i$ is the coefficient of protein $i$ in the multivariate Cox regression analysis, and $x_i$ is the expression value of protein $i$. Proteins with a hazard ratio > 1 were considered to have a high-risk signature, and proteins with a hazard ratio < 1 were considered to have a low-risk signature. All functional proteomic data were analyzed using R software (version 3.6.3).
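The risk score is a weighted sum, which the short Python sketch below makes concrete. The protein names and coefficient values are placeholders invented for this example (they are not the fitted coefficients of the study), and `risk_score` is a hypothetical helper name.

```python
def risk_score(expression, coefficients):
    """Sum of protein expression values weighted by their Cox coefficients.

    expression   : dict mapping protein name -> expression value for one patient
    coefficients : dict mapping protein name -> multivariate Cox coefficient
    """
    return sum(coefficients[name] * expression[name] for name in coefficients)


# Placeholder coefficients, NOT taken from the study.
example_coefficients = {"protein_A": -0.5, "protein_B": 0.25}
example_patient = {"protein_A": 2.0, "protein_B": 4.0}
print(risk_score(example_patient, example_coefficients))
```

A negative coefficient lowers the score as expression rises, and a positive coefficient raises it, so the score orders patients by their estimated hazard under the fitted model.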
2.4. Risk Stratification and Survival Curve
Based on the calculated risk score, the 168 patients in the training set were classified into a low-risk group (score < median) and a high-risk group (score > median). OS curves were generated for the 2 groups with the Kaplan-Meier method in R, and the difference in survival time between the groups was compared with the log-rank test.
We also plotted survival curves of the low- and high-expression groups for each of the 3 proteins included in the final predictive model. Finally, risk curves, survival maps, and heat maps were plotted to show the distribution of the risk score and of the expression of each protein among the training set patients.
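For readers unfamiliar with the Kaplan-Meier method, the following Python sketch shows the median stratification and the product-limit estimate of the survival curve. It is an illustrative reimplementation with made-up inputs and hypothetical function names, not the R code used in the study.

```python
from statistics import median


def stratify_by_median(scores):
    """Assign 'high' to scores above the median and 'low' otherwise."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival probability) at each death time.

    times  : follow-up time of each patient
    events : 1 if death was observed, 0 if the patient was censored
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for tt in times if tt >= t)
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Made-up follow-up data: the second patient is censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Computing one such curve for the high-risk group and one for the low-risk group gives the pair of curves that the log-rank test then compares.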
2.5. Independent Analysis of Prognosis and Comparison of Receiver Operating Characteristic (ROC) Curves
To assess the prognostic ability of the risk score and of clinical factors (age, gender, disease stage, and smoking history), univariate and multivariate Cox regression analyses were conducted with survival status and survival time as the dependent variables. Factors that remained statistically significant in the multivariate analysis were considered to have independent prognostic value.
In addition, ROC curve analysis was used to evaluate the performance of the prognostic model and the clinical parameters; the survivalROC R package was used to draw and analyze the ROC curves. The areas under the ROC curves (AUCs) were calculated to compare the prognostic value of the clinical factors with that of the prognostic model.
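The AUC can be read as the probability that a patient who died receives a higher risk score than a patient who survived. The Python sketch below computes this rank-based AUC for a binary outcome. It is a simplification: the study used time-dependent ROC analysis that accounts for censoring, which this example does not attempt, and the function name and inputs are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a positive case outscores a negative one.

    scores : predicted risk score of each patient
    labels : 1 for the event (e.g., death by a fixed time point), 0 otherwise
    Ties between a positive and a negative case count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which gives a scale for comparing the signature with individual clinical factors.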
2.6. Validation in the Testing Set and in Combined Cohorts
Using the model obtained from the training set, we calculated the risk scores of the 164 patients in the test set. The patients were divided into low- and high-risk groups according to the median score. The same procedure was also applied to all 332 patients combined (combined cohort). Kaplan-Meier survival curves were plotted for the testing set and for the combined cohort, and survival differences between the low- and high-risk groups were compared with the log-rank test. The prognostic value of the model was estimated by the AUC of the ROC curve in each cohort.
2.7. Protein Coexpression Analysis and the Sankey Diagram
To identify proteins potentially correlated with the 3 proteins in the model, the other proteins in the functional proteomic data were screened with 2-sided Pearson’s correlation analysis and the associated significance test. Proteins with an absolute Pearson’s correlation coefficient > 0.4 and a P value < 0.001 were considered to be positively or negatively correlated with the 3 proteins in the prognostic model. A Sankey diagram was plotted using the “ggalluvial” R package to illustrate the potential correlations among the proteins.
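The correlation screen can be sketched in Python as below. The example applies only the absolute-correlation threshold of 0.4; the accompanying P value filter is omitted because it requires a t distribution that the standard library does not provide. Function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def coexpressed(target, candidates, threshold=0.4):
    """Return {name: r} for candidates whose |r| with the target exceeds the threshold."""
    selected = {}
    for name, values in candidates.items():
        r = pearson_r(target, values)
        if abs(r) > threshold:
            selected[name] = r
    return selected


# Made-up expression vectors for one model protein and three other proteins.
model_protein = [1, 2, 3, 4]
others = {"a": [2, 4, 6, 8], "b": [1, -1, -1, 1], "c": [8, 6, 4, 2]}
print(coexpressed(model_protein, others))
```

Both strongly positive and strongly negative partners pass the filter, which matches the use of the absolute value of the coefficient in the text.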
3. Results
3.1. Patient Characteristics
Table 1 presents the clinical data of the 332 HNSCC cases in the training (n = 168) and testing (n = 164) sets. Of these cases, 200 had Stage IV, 61 Stage III, 57 Stage II, and 14 Stage I disease. The clinical variables (e.g., age, gender, TNM stage, survival time, and survival status) were comparable between the two sets (Table 1).
3.2. 7 Proteins Were Selected as Candidate Proteins to Construct a Prognostic Model
A total of 237 proteins were screened from the functional proteomic data of HNSCC in the TCPA dataset. As shown in Figure 1 (volcano plot), 8 proteins were defined as low risk and 16 proteins as high risk. The prognostic value of these 24 proteins was determined by univariate Cox regression analysis. Kaplan-Meier analysis was then conducted, and 7 proteins were selected as candidate proteins to build a prognostic model (Table 2).
3.3. The 3-Protein Signature Constructed from X4EBP1_pT37T46, HER3_pY1289, and NF2 Was Established by the Multivariate Cox Regression Analysis
A 3-protein prognostic model was established from 3 of the 7 candidate proteins by stepwise multivariate Cox regression analysis. The 3 proteins selected were X4EBP1_pT37T46, HER3_pY1289, and NF2. The predictive model was based on the sum of the expression values of the 3 proteins, each weighted by its coefficient. The coefficients were calculated using multivariate Cox regression and represent the degree of risk associated with each protein (Table 3). The results of the multivariate survival analysis of the 3 proteins are shown in Figure 2.
The survival risk score of each patient was calculated as the sum of the expression values of the 3 proteins, each multiplied by its multivariate Cox regression coefficient (Table 3). Of the 3 proteins, X4EBP1_pT37T46 had a negative coefficient in the Cox regression analysis, indicating that it is protective: higher expression was associated with longer OS. Conversely, the coefficients of the other 2 proteins (HER3_pY1289 and NF2) were positive; these proteins were therefore considered risk factors, because higher expression was associated with shorter OS.
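The link between the sign of a coefficient and its interpretation follows from the form of the Cox model. As a sketch, writing $\beta_i$ for the multivariate Cox coefficient of protein $i$ and $x_i$ for its expression value (these symbol names are assumptions chosen for illustration; the fitted values are those reported in Table 3), the risk score and the hazard ratio per unit increase in expression are related as follows:

```latex
\begin{align*}
\text{Risk score} &= \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3}, \\
\mathrm{HR}_i &= e^{\beta_i}, \\
\beta_i < 0 &\iff \mathrm{HR}_i < 1 \quad \text{(protective)}, \\
\beta_i > 0 &\iff \mathrm{HR}_i > 1 \quad \text{(risk factor)}.
\end{align*}
```

Under this convention a protein with a negative coefficient, such as X4EBP1_pT37T46 here, lowers the risk score as its expression rises, whereas proteins with positive coefficients raise it.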
3.4. The 3-Protein Signature Can Predict the Survival of HNSCC Patients
First, survival curves of the high- and low-expression groups were plotted for each of the 3 proteins in the predictive model (Figures 3(a)–3(c)). For each protein, the Kaplan-Meier survival curves of the 2 groups differed significantly (log-rank test).
Next, with the median risk score as the cutoff, the patients in the training set were divided into a high-risk group and a low-risk group. Survival analysis indicated a marked difference in survival time between the high- and low-risk groups, further supporting the prognostic value of the 3-protein signature (Figure 3(d)). The risk curve, survival map, and heat map of the 3-protein signature are shown in Figure 4. As shown in Figures 4(a)–4(b), deaths were clearly more frequent in the high-risk group than in the low-risk group. As shown in Figure 4(c), the expression patterns of the 3 proteins were correlated with the risk scores.
3.5. The 3-Protein Signature Has Better Prognostic Value than Clinical Factors
Figures 5(a)–5(b) show the results of the univariate and multivariate Cox regression analyses of the 3-protein signature and the clinical factors. OS was significantly associated with the risk score and N stage, and both the 3-protein signature risk score and N stage were independent predictors of survival. To compare the prognostic power of the 3-protein signature risk score with that of the clinical factors, ROC curves were plotted for each variable, and the AUCs were calculated (Figure 6). In the training set, the AUC of the 3-protein signature (0.750) was greater than that of N stage (0.624), indicating that the 3-protein signature exhibited better sensitivity and specificity in predicting survival. Taken together, these results indicate that the 3-protein signature (95% confidence interval (CI) of the hazard ratio: 1.255-1.726; Figure 5(b)) has better predictive value for the survival of HNSCC cases than conventional clinical factors such as age, sex, smoking history, and TNM stage.
3.6. The 3-Protein Prognostic Signature Model Exhibits Good Performance in the Testing Set
The Kaplan-Meier survival curves and ROC curves of the 3-protein signature in the testing set and the combined cohort are shown in Figures 7(a)–7(d). Consistent with the results in the training set, OS differed significantly between the low- and high-risk groups in both the testing set and the combined cohort. The AUC for the 3-protein signature was 0.655 in the testing set (Figure 7(b)) and 0.699 in the combined cohort (Figure 7(d)), suggesting good performance of the 3-protein signature for predicting OS.
3.7. Other Proteins Potentially Correlated with the Survival of HNSCC
Applying Pearson’s correlation analysis and the associated significance test to the 3 proteins in the model and the other proteins identified showed that 12 proteins were each coexpressed with 1 of the 3 proteins in the model (absolute correlation coefficient > 0.4 and P value < 0.001). The correlations among the proteins are shown in the Sankey diagram in Figure 8. Thus, these 12 proteins may be related to the prognosis of HNSCC.
4. Discussion
In this study, we identified 3 proteins (X4EBP1_pT37T46, HER3_pY1289, and NF2) related to the survival of HNSCC patients and developed a model that uses these 3 proteins to predict OS. The model was developed with a training set and validated with a testing set. The AUC for the 3-protein signature was 0.655 in the testing set and 0.699 in the combined cohort, indicating good performance of the 3-protein signature in predicting the OS of HNSCC patients. In addition, the 3-protein signature exhibited better predictive value for the survival of HNSCC patients than conventional clinical factors (age, sex, smoking history, and TNM stage).
HNSCC is a relatively common malignancy and is very common in certain parts of the world. Although there have been many advances in the understanding of the molecular biology of HNSCC [1, 4, 7–9], as well as in treatment options, the mortality of patients with HNSCC remains high. As such, novel markers are needed to predict prognosis and help guide treatment.
Bioinformatics studies have screened molecular biomarkers such as mRNA, miRNA, and lncRNA to predict the prognosis of HNSCC patients [11, 12, 23]. Advances in high-throughput proteomics techniques allow the quantitative assessment of large numbers of proteins in multiple specimens. The reverse-phase protein array (RPPA), an antibody-based protein microarray dot-blot platform, allows protein expression levels to be measured quantitatively in a large number of biological samples simultaneously, provided that high-quality antibodies are available [24–26]. Many studies have used the RPPA technique to study protein biomarkers relevant to cancer progression, treatment selection, and prognostic prediction [19, 27].
With major advances in bioinformatics, proteomics, and gene analysis techniques, many researchers have devoted themselves to developing signatures for predicting the prognosis of patients with head and neck cancer using different methods. Prognostic signatures have been developed using miRNA [28, 29], alternative splicing, immune function molecules, and m6A RNA methylation regulators.
In a study similar to ours, Zhao et al. reported a 5-protein signature for predicting HNSCC prognosis. Notably, OS was much worse in patients with high-risk scores than in those with low-risk scores in the subgroups defined by male sex, tumor grade 1-2, one of the two age groups, and disease Stages III-IV, whereas OS differences were not significant in the subgroups defined by female sex, the other age group, tumor grade 3-4, and disease Stages I-II. In other notable research, Jin et al. reported that p53-targeted lncRNA-p21 serves as a tumor suppressor by suppressing the JAK2/STAT3 signaling pathway in HNSCC. Zhang et al. developed a 5-gene signature (KLHDC7B, MMP1, DPY19L2P1, HOXB9, and EMP1) for predicting the prognosis of patients with laryngeal cancer; ROC curve analysis suggested that the signature predicted laryngeal cancer prognosis well. Guo et al. reported a 6-mRNA (ZNF324B, YIPF4, TMC8, PDGFA, PCMT1, and FRMD5) signature model for determining HNSCC prognosis, with an AUC of 0.745 for predicting OS. Wang et al. recently reported that 3 microRNAs (hsa-miR-1911, hsa-miR-499a, and hsa-miR-99a) were independent risk factors significantly related to the survival of patients with head and neck cancer. In addition, GO and KEGG analyses indicated an association of cancer prognosis with the JAK-STAT signaling pathway and certain metabolic pathways. In a unique study, You et al. used cDNA microarrays and bioinformatics methods to study radioresistance in head and neck carcinoma and identified 4 key functional pathways and molecular markers that strongly promoted radioresistance. A recent report by Ribeiro et al. studied tumor specimens from 40 patients with HNSCC undergoing tumor resection and tumor-adjacent tissues from 32 of these patients. The authors identified a proteomic signature based on 3 proteins (DHB12, HMGB3, and COBA1) and developed a model that included the 3 proteins and tumor stage and that exhibited >80% predictive accuracy for the development of metastasis and recurrence.
The main limitation of this study is that the analysis was based on information contained in large databases. Although this approach provides important information and allowed us to develop a protein signature predictive of the OS of patients with HNSCC, clinical validation was not part of the research design and was therefore not performed. The results are promising, but they need to be verified in clinical studies of HNSCC patients.
5. Conclusions
In this report, we developed a 3-protein signature to predict the survival of HNSCC patients. The AUC for the 3-protein signature was 0.655 in the testing set and 0.699 in the combined cohort, indicating that the signature is useful for predicting the OS of HNSCC patients. In addition, the 3-protein signature exhibits better predictive value for the survival of HNSCC patients than conventional clinical factors such as gender, smoking history, age, and TNM stage. These results add relevant information to the medical literature to help guide the management of patients with HNSCC.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there is no conflict of interest.
Authors’ Contributions
Xiaoting Chen and Kaiyi Wong contributed equally to this work.
Acknowledgments
This study was supported by the National Science Foundation of China, Molecular Mechanism of SOX1 Regulating Maintenance and Self-Renewal of Nasopharyngeal Carcinoma Stem Cells (Grant No. 81572648).
References
C. Fitzmaurice, C. Allen, R. M. Barber et al., “Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 32 cancer groups, 1990 to 2015: a systematic analysis for the global burden of disease study,” JAMA Oncology, vol. 3, no. 4, pp. 524–548, 2017.
F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: a Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018.
K. Yang, S. Zhang, D. Zhang et al., “Identification of SERPINE1, PLAU and ACTA1 as biomarkers of head and neck squamous cell carcinoma based on integrated bioinformatics analysis,” International Journal of Clinical Oncology, vol. 24, no. 9, pp. 1030–1041, 2019.
G. Zhang, E. Fan, G. Yue et al., “Five genes as a novel signature for predicting the prognosis of patients with laryngeal cancer,” Journal of Cellular Biochemistry, 2019.
X. Zhao and L. Cui, “Development and validation of a m6A RNA methylation regulators-based signature for predicting the prognosis of head and neck squamous cell carcinoma,” American Journal of Cancer Research, vol. 9, no. 10, pp. 2156–2169, 2019.
X. Wang, Z. Yin, Y. Zhao, M. He, C. Dong, and M. Zhong, “Identifying potential prognostic biomarkers in head and neck cancer based on the analysis of microRNA expression profiles in TCGA database,” Molecular Medicine Reports, vol. 21, no. 3, pp. 1647–1657, 2020.