Objective. Currently, clinical classification of osteosarcoma cannot accurately predict the survival outcomes and responses to chemo- or immunotherapies. Our goal was to classify osteosarcoma patients into clinical/biological subtypes based on EMT molecules. Methods. This study retrospectively curated the RNA expression profiling of osteosarcoma patients from the TARGET and GSE21257 cohorts. Consensus clustering analyses were conducted in accordance with the expression profiling of prognostic EMT genes derived from univariate analyses. Immunological features were evaluated through immune cell infiltration, immune checkpoint expression, and activity of cancer immunity cycle. Drug sensitivity was estimated with the GDSC database. WGCNA approach was adopted to determine the EMT-derived genes. Following univariate analyses, a multivariate cox regression model was developed and externally verified. Predictive independency was evaluated with uni- and multivariate analyses. GSEA was presented to uncover relevant molecular mechanisms. Results. Prognostic EMT genes across osteosarcoma patients were stratified into distinct subtypes, namely, subtypes A and B. Patients in subtype B presented desirable prognosis, high immune activation, and enhanced sensitivity to cisplatin. Meanwhile, patients in subtype A were more sensitive to gemcitabine. In total, 86 EMT-derived hub genes were determined, and an EMT score was conducted for osteosarcoma prognosis. Following external verification, this EMT score was reliably and independently predictive of patients’ survival outcomes. Additionally, it was positively linked to steroid biosynthesis. Conclusion. Overall, our findings proposed EMT-relevant molecular subtypes and signatures for predicting prognosis and therapeutic responses, contributing to personalized treatment and clinical implication for osteosarcoma.

1. Introduction

Osteosarcoma represents the most prevalent primary bone sarcomas, which originates from mesenchymal stem cell populations, with an annual incidence of 3-5 cases per million individuals [13]. It most often affects children, adolescents, and young adults, with aberrant bone growth areas [4]. The difficulty in biology research is in relation to the complexity of the osteosarcoma genome as well as remarkable biological discrepancy among osteosarcoma subtypes [5]. Hence, an in-depth understanding of osteosarcoma biology highlights the heterogeneity as well as uncovers the molecular abnormality that defines patient subpopulations. The standard treatment of osteosarcoma comprises surgical resection and chemotherapy [6]. Patients are usually resistant to conventional chemotherapy; meanwhile, high-dose chemotherapeutic agents cause severe side effects [7]. Hence, to improve the survival duration of osteosarcoma patients has been proven to be challenging.

Epithelial-mesenchymal transition (EMT) is a key process by which epithelial cells acquire mesenchymal characteristics [8]. Growing evidences demonstrate that EMT exerts a crucial role in stemness [9], metabolic reprogramming [10], immune evasion [11], and therapeutic resistance [12] for cancer cells. Osteosarcoma cells that have experienced EMT process will lose the cellular polarity and acquire aggressive and metastatic features. This process has been regarded as a crucial event for osteosarcoma metastases [13]. Previously, Yiqi et al. proposed an EMT-relevant model of osteosarcoma as a prognostic indicator through integrated multicohorts [14], indicative of the crucial prognostic implication of EMT during osteosarcoma progression. Limited evidences demonstrate that chemotherapy (cisplatin, etc.) resistance contributes to EMT activation in osteosarcoma cells [15, 16]. Recent research has presented that EMT genes are remarkably linked to immunity in osteosarcoma [17]. Tumor-associated macrophages trigger EMT in osteosarcoma through activation of COX-2/STAT3 signaling [18]. Thus, extensive research of EMT on clinical outcomes and therapeutic responses of osteosarcoma is urgently required. In our research, integrated genomic analyses were conducted for characterizing distinct EMT-relevant subtypes with prognostic and prediction implications in osteosarcoma.

2. Materials and Methods

2.1. Acquisition of Gene Expression Profiling

This study acquired the TARGET data matrix (https://ocg.cancer.gov/programs/target/data-matrix) comprising gene expression data and clinical data of 84 osteosarcoma patients, as the discovery cohort. Genome-wide gene expression profiling and prognostic information of 53 osteosarcoma patients were curated from the GSE21257 cohort in the Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/gds/) [19], as the testing cohort. This cohort was on the basis of the platform of GPL10295 Illumina human-6 v2.0 expression beadchip. The gene expression series matrix files were directly retrieved, and the probe IDs were mapped to the gene symbols in accordance with the matched annotation files. Thereafter, expression values of multiple probes mapping to the same gene were averaged while probes mapping to multiple genes were eliminated. The gene set of EMT was curated from the Molecular Signatures Database (MSigDB), which was listed in Supplementary Table 1 [20].

2.2. Identification of Molecular Subtypes

Univariate analyses were presented for assessment of the association of EMT genes with osteosarcoma prognosis. Genes with were determined for the subsequent analyses. -means-based consensus clustering was carried out utilizing the ConsensusClusterPlus package on the basis of the transcriptome files of prognostic EMT genes [21]. Cumulative distribution function (CDF), delta area, and consensus matrix diagrams were conducted in accordance with default parameters. The number of clusters was determined in accordance with the following criteria: (i) the consistency within the clusters was relatively high; (ii) the coefficients of variation were relatively low; (iii) the area under the CDF curve was not remarkably elevated. The area under the CDF curves was utilized for defining the clustering number. Principal component analyses (PCA) were conducted for verifying the accuracy of this classification.

2.3. Gene Set Variation Analyses (GSVA)

Fifty hallmark pathways were acquired from the MSigDB, and the activity of above pathways was estimated with GSVA package in osteosarcoma specimens [22]. Limma package was adopted to compare the discrepancy in activity of hallmark pathways between subtypes [23].

2.4. Analyses of Immunological Features

The infiltration of tumor-infiltrating immune cells was estimated across osteosarcoma tissues utilizing the single-sample gene set enrichment analyses (ssGSEA) [24] and TIMER2 database [25]. The RNA expression of immune checkpoint molecules and HLA molecules was determined across osteosarcoma specimens. Cancer immunity cycle comprises release of cancer cell antigens (step 1), cancer antigen presentation (step 2), priming and activation (step 3), trafficking of immune cells to tumors (step 4), infiltration of immune cells into tumors (step 5), recognition of cancer cells by T cells (step 6), and killing of cancer cells (step 7) [26]. The activity of all steps was determined with ssGSEA approach in accordance with the transcriptome profiling [27]. The gene set of each step within the cancer immunity cycle was listed in Supplementary Table 2.

2.5. Estimation of Drug Sensitivity

Through adopting the pRRophetic algorithm [28], the half-maximal inhibitory concentration (IC50) values of cisplatin and gemcitabine were determined in osteosarcoma in accordance with the Genomics of Drug Sensitivity in Cancer (GDSC; http://www.cancerrxgene.org/) [29].

2.6. Weighted Gene Coexpression Network Analyses (WGNCA)

The data matrix of mRNA expression profiling in the TARGET cohort was analyzed utilizing the WGCNA package [30]. The first 25% genes with the largest variance were determined. For selecting a standard scale-free network, the sample hierarchical clustering approach was utilized for detecting and removing outlier specimens, followed by selection of the appropriate soft thresholding analyses. Thereafter, the adjacency matrix and topological overlap matrix (TOM) were established as well as the matched dissimilarity (1-TOM) was determined. Dynamic tree cutting methods were utilized for completing the gene tree and module clustering. Thereafter, the module characteristic genes were clustered as well as the highly similar modules were merged. The dissimilarity of the module eigengenes (ME) was determined with the moduleEigengenes function. The interactions of ME with EMT subtypes were analyzed with Pearson’s correlation. Module membership (MM) represents the relevance of the expression profile to ME, while gene significance (GS) represents the correlation between the expression profiles and clinical traits. In this study, MM was calculated with Pearson’s correlation of the expression profiling and ME. Meanwhile, GS was measured for evaluating the genes with EMT subtypes. Thereafter, the correlation between MM and gene signature was evaluated, and EMT-derived genes were determined.

2.7. Analysis of Hub Genes

Protein–protein interactions (PPIs) of genes in the lightcyan module were assessed through the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) online tool (http://string-db.org/) [31]. Hub genes in this PPI network were determined utilizing the Molecular Complex Detection (MCODE), a plug-in of Cytoscape software [32].

2.8. Functional Enrichment Analyses

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were presented utilizing the clusterProfiler package [33]. The parameter utilized in this package was default as well as the threshold for identifying the GO functions and KEGG pathways was set as value < 0.05.

2.9. Gene Set Enrichment Analyses (GSEA)

For uncovering the difference in functional phenotypes between high- and low-risk subpopulations, GSEA was carried out. The “c5.all.v7.0.symbols.gmt” was utilized as a reference gene set. GSEA was implemented utilizing GSEA software. A nominal was regarded as a significant enrichment.

2.10. Statistical Analyses

Statistical analyses were implemented utilizing R software (version 4.0.2). Kaplan-Meier survival curves were drawn among osteosarcoma cases, and survival difference was determined with log-rank test. Receiver operator characteristic (ROC) curves were presented with timeROC package. Comparison between two subgroups was conducted with student’s or Wilcoxon test. Two-sided value < 0.05 was indicative of statistical significance.

3. Results

3.1. Characterization of Two EMT Molecular Subtypes in Osteosarcoma

Figure 1 illustrates the flowchart of this study. We curated 200 EMT genes from the MSigDB and analyzed their prognostic implication in osteosarcoma through univariate cox regression analyses. As a result, 40 EMT genes were remarkably linked to osteosarcoma prognosis (Figure 2(a); Table 1), in which 28 served as protective factors of osteosarcoma prognosis while the others served as risk factors. The consensus clustering of prognostic EMT genes was conducted to determine distinct EMT subtypes for osteosarcoma. For two categories, the area under the CDF curves began to stabilize (Figures 2(b) and 2(c)). Meanwhile, the consensus matrix was conducted for identifying the optimal number of subtypes. In Figure 2(d), the consensus matrix displayed a well-defined block structure when . Overall, two EMT molecular subtypes were conducted, namely, subtypes A and B. PCA results also fit into two clusters (Figure 2(e)). Survival analyses showed that subtype A presented poorer prognosis in comparison to subtype B (Figure 2(f)), indicative of the discrepancy in survival outcomes between subtypes.

3.2. Immunological Features and Drug Sensitivity of Two EMT Molecular Subtypes

The activity of the 50 hallmark pathways was estimated across osteosarcoma tissues utilizing the GSVA algorithm. We noted that the remarkable activation of stromal pathways (EMT, Wnt β-catenin signaling, etc.), immune pathways (IL6 JAK STAT3 signaling, etc.), and metabolism pathways (bile acid metabolism, heme metabolism, xenobiotic metabolism, etc.) in EMT subtype B than subtype A (Figure 3(a) and Supplementary Table 3). In Figure 3(b), ssGSEA approach showed that EMT subtype B presented remarkably enhanced immune cell infiltrations containing central memory CD4 and CD8 T cells, effector memory CD4 T cells, memory B cells, regulatory T cells, type 1 and 2 helper cells, CD55bright natural killer cells, macrophages, MDSCs, natural killer cells, and natural killer T cells than subtype A. Meanwhile, TIMER2 approach showed the lower infiltrations of B cells and the higher infiltration of CD4 T cells compared with subtype A (Supplementary Figure 1). Additionally, we noted that immune checkpoint molecules ADORA2A, CD44, PDCD1LG2, TNFRSF14, and TNFRSF8 were prominently activated in subtype B than A (Figure 3(c)). Nevertheless, we did not investigate the significant discrepancy in HLA molecule expression between subtypes (Figure 3(d)). The activity of cancer immunity cycle was also estimated in each osteosarcoma specimen. Compared with subtype A, release of cancer cell antigen, B cell/monocyte/Th17 cell recruiting, recognition of tumor cells through T cells, and killer of tumor cells presented remarkably enhanced activity in subtype B (Figure 3(e)). Overall, EMT subtype B possessed the features of immune activation. Also, subtype B displayed increased sensitivity to cisplatin while subtype A was more sensitive to gemcitabine (Figure 3(f)).

3.3. Establishment of EMT Subtype-Relevant Coexpression Modules

Considering the sensitivity of WGCNA to batch effects, this study firstly preprocessed the data of osteosarcoma individuals in EMT molecular subtypes A and B from the TARGET cohort. The first 25% genes with the largest variance were determined for subsequent analyses. Thereafter, the hclust function was adopted for confirming the batch effect removal as well as investigating whether there was any outlier. In Figure 4(a), no outlier sample was found. Because of the premise of WGCNA approach required to hypothesize that the network met the scale-free criteria, this study further screened the optimal soft thresholding value for making a coexpression network was more subject to a scale-free network. Through calculation of the scale-free topology fitting index, value was up to 0.85 (Figure 4(b)), which validated the feasibility of WGCNA. A hierarchical clustering tree was retrieved through adopting hierarchical clustering analyses. Thereafter, in accordance with the dynamic tree cut approach, 29 distinct coexpression modules were assigned (Figure 4(c)). Figure 4(d) demonstrated that the lightcyan module presented the strongest correlation to EMT molecular subtypes. The genes in this module deserved more exploration.

3.4. Exploration of EMT-Derived Genes and Their Relevant Biological Implication

Our further analyses uncovered that the genes in the lightcyan module were remarkably linked to two EMT molecular subtypes (Figures 5(a) and 5(b)), indicative of the genes in the lightcyan module as EMT-derived genes. Through MCODE analyses, we determined 86 EMT-derived hub genes, as depicted in Figure 5(c). Their relevant biological implication was explored in-depth. In Figure 5(d), the EMT-derived hub genes were remarkably linked to endoplasmic reticulum. Additionally, they were significantly enriched in ribosome, oxidative phosphorylation, chemical carcinogenesis-reactive oxygen species, thermogenesis, and HIF-1 signaling pathway (Figure 5(e)).

3.5. Development of an EMT-Derived Prognostic Model for Osteosarcoma

Our univariate cox regression analyses were indicative that 33 EMT-derived genes displayed significant interactions with osteosarcoma prognosis (Table 2). Above genes were input the multivariate cox regression model. In accordance with the expression and coefficient of candidate genes (Table 3), we developed an EMT-derived prognostic model for osteosarcoma, comprising RPS9, RPS23, EIF4A1, RPL12, RPL36, RPL37A, RPL34, EEF1B2, RPS8, RPS28, RPL10, RPS24, RPL35A, RPL11, RPL21, RPS27A, RPS12, and RPL13A. We noted the remarkable discrepancy in their expressions in high- than low-risk subpopulations (Figure 6(a)). Figure 6(b) displayed the EMT score distribution across osteosarcoma patients. As EMT score elevated, dead cases were gradually increased (Figure 6(c)). Survival analyses demonstrated that high-risk cases were indicative of more unfavorable clinical outcomes in comparison to low-risk cases (Figure 6(d)). ROC curves were presented to assess the predictive performance of EMT score in osteosarcoma prognosis. In Figure 6(e), area under the curves (AUCs) at one-, three-, and five-year survival were separately 0.723, 0.858, and 0.860, proving that EMT score was capable of prediction of osteosarcoma prognosis. Compared with previously published two EMT models from Yiqi et al. [14] and Peng et al. [17], our EMT-derived prognostic model had higher C-index (0.828), indicating the prediction superiority of this model (Figure 6(f)).

3.6. External Verification of the Prognostic Value of EMT Score

The prognostic value of EMT score was further verified in the GSE21257 cohort. Figure 7(a) depicted the expression patterns of each gene from EMT score across osteosarcoma specimens. We also visualized the distribution of EMT score in the GSE21257 cohort (Figure 7(b)). In accordance with the median value of EMT score, we stratified osteosarcoma individuals into high- as well as low-risk subpopulations. Additionally, we noted that there were relatively increased dead cases in high-risk subgroup (Figure 7(c)). As expected, high EMT score was remarkably linked to more unfavorable survival outcomes (Figure 7(d)). The AUC at five-year survival was 0.818, proving the excellent predictive performance of EMT score (Figure 7(e)).

3.7. Genes in EMT Score as Prognostic Indicators of Osteosarcoma

Further analyses were conducted for assessing the survival implication of each gene in EMT score in the TARGET cohort. As a result, high expression of EEF1B2, EIF4A1, RPL10, RPL11, RPL12, RPL13A, RPL21, RPL34, RPL35A, RPL36, RPL37A, RPS8, RPS9, RPS12, RPS23, RPS24, RPS27A, and RPS28 was prominently linked to more undesirable survival outcomes in comparison to their low expression (Figures 8(a)8(r)).

3.8. EMT Score Is Independent of Clinicopathological Indicators for Osteosarcoma Prognosis

In the TARGET cohort, our univariate cox regression analyses displayed that this EMT score was prominently linked to osteosarcoma survival outcomes (Figure 9(a)). Further multivariate analyses demonstrated that the EMT score was independently predictive of patients’ prognosis (Figure 9(b)). For uncovering the underlying biological phenotypes between subpopulations, this research carried out GSEA. As a result, steroid biosynthesis was remarkably activated in high-risk subpopulation (Figure 9(c)), and hypertrophic cardiomyopathy displayed prominent activation in low-risk subpopulation (Figure 9(d)).

4. Discussion

Osteosarcoma presents diverse clinical courses and biological heterogeneity [34]. Hence, to reliably predict prognosis and therapeutic responses, it is critical for comprehensively investigating the molecular mechanisms. Our study determined forty prognostic EMT genes in osteosarcoma. Through consensus clustering approach, two EMT molecular subtypes were conducted. Especially, EMT subtype A presented poorer prognosis in comparison to subtype B, indicating that there was a remarkable discrepancy in survival outcomes between subtypes. Recently, immunotherapy is a promising treatment option against osteosarcoma, and it is crucial to improve the comprehending about the immune responses [3]. We evaluated the immunological features from multiple perspectives. GSVA demonstrated the activation of immune pathways (like IL6 JAK STAT3 signaling) in subtype B. Our ssGSEA revealed that subtype B displayed remarkably enhanced immune cell infiltrations containing central memory CD4 and CD8 T cells, effector memory CD4 T cells, memory B cells, regulatory T cells, type 1 and 2 helper cells, CD55bright natural killer cells, macrophages, MDSCs, natural killer cells, and natural killer T cells. In addition, subtype B had the features of increased expression of immune checkpoint molecules ADORA2A, CD44, PDCD1LG2, TNFRSF14, and TNFRSF8. Among steps within cancer immunity cycle, release of cancer cell antigen, B cell/monocyte/Th17 cell recruiting, recognition of tumor cells through T cells, and killer of tumor cells presented remarkably enhanced activity in subtype B. Hence, EMT subtype B was characterized by immune activation. Drug resistance severely hinders the improvement of survival rate of osteosarcoma patients [35]. Cisplatin, an alkylating drug, can form irreversible covalent bonds with DNA, which causes DNA strands to cross-link and break and missense mutation [36]. It is widely applied for osteosarcoma chemotherapy as well as cisplatin resistance is frequent across osteosarcoma individuals [37]. Our data demonstrated that EMT subtype B presented enhanced sensitivity to cisplatin. Gemcitabine represents the second cytidine analog developed following cytosine arabinoside [38]. Intriguingly, subtype A was more sensitive to gemcitabine.

Through WGCNA approach, we constructed 29 EMT subtype-relevant coexpression modules in the TARGET cohort. Especially, lightcyan module presented the strongest interactions with EMT molecular subtypes. Further, MCODE analyses determined 86 EMT-derived hub genes. Functional enrichment analyses unraveled the remarkable interactions of these EMT-derived hub genes with endoplasmic reticulum (ER), ribosome, oxidative phosphorylation, chemical carcinogenesis-reactive oxygen species, thermogenesis, and HIF-1 signaling pathway. For instance, the ER represents the central intracellular organelle of diverse cellular functions and endoplasmic reticulum stress response participates in osteosarcoma pathogenesis [39]. SENP1/HIF-1alpha regulates hypoxia-mediated EMT in osteosarcoma cells [40].

This study conducted an EMT-derived prognostic model for osteosarcoma in the TARGET cohort, comprising EEF1B2, EIF4A1, RPL10, RPL11, RPL12, RPL13A, RPL21, RPL34, RPL35A, RPL36, RPL37A, RPS8, RPS9, RPS12, RPS23, RPS24, RPS27A, and RPS28. ROC curves proved that the EMT score was capable of accurately predicting osteosarcoma individuals’ clinical outcomes. Additionally, external verification proved the feasibility of this model in the GSE21257 cohort. Each gene in the EMT score was linked to undesirable prognosis of osteosarcoma individuals. Previously, mTOR inhibitor blunts the p53 response to nucleolar stress through modulating RPL11 and MDM2 expressions [41]. RPL34 upregulation is indicative of undesirable clinical outcomes in osteosarcoma as well as silencing RPL34 weakens osteosarcoma proliferation [42]. Suppressing RPS9 blunts osteosarcoma cell growth via inactivating MAPK signaling [43]. Nevertheless, the functions of most genes in osteosarcoma progression are lack of experimental evidences. Multivariate analyses suggested the independency of the EMT score in prediction of osteosarcoma outcomes. Further, GSEA results presented that steroid biosynthesis was remarkably activated in high-risk osteosarcoma individuals, indicating that steroid biosynthesis might affect osteosarcoma progression.

A few limitations of our study should be taken into consideration. First, further analyses are required for elucidating the molecular mechanisms underlying the impact of genes in the EMT-derived model on osteosarcoma through in-depth experiments. Second, more osteosarcoma patients can produce more convincing and accurate findings. Hence, abundant specimens will be included for improving our conclusions as well as more reliably illustrating the underlying mechanisms by which genes in the EMT-derived model affect osteosarcoma progression.

5. Conclusion

Collectively, this study adopted consensus clustering analyses for determining two EMT molecular subtypes in the TARGET cohort as well as uncovered the discrepancy in survival outcomes, immunological features, and drug sensitivity between subtypes. Through system biology-based WGCNA approach, we determined EMT-derived genes and conducted an EMT score for predicting osteosarcoma prognosis. To our knowledge, we firstly characterized the EMT-relevant subtypes for osteosarcoma. Additionally, the EMT score we proposed was externally verified in the external cohort. Overall, our findings might contribute to personalized treatment and be of much clinical implication for osteosarcoma.


EMT:Epithelial-mesenchymal transition
GEO:Gene Expression Omnibus
MSigDB:Molecular Signatures Database
CDF:Cumulative distribution function
PCA:Principal component analyses
GSVA:Gene set variation analyses
ssGSEA:Single-sample gene set enrichment analysis
GDSC:Genomics of drug sensitivity in cancer
WGNCA:Weighted gene coexpression network analyses
TOM:Topological overlap matrix
ME:Module eigengenes
MM:Module membership
GS:Gene significance
PPI:Protein–protein interaction
STRING:Search Tool for the Retrieval of Interacting Genes/Proteins
MCODE:Molecular Complex Detection
GO:Gene Ontology
KEGG:Kyoto Encyclopedia of Genes and Genomes
GSEA:Gene set enrichment analyses
ROC:Receiver operator characteristic
AUCs:Area under the curves.

Data Availability

The data used to support the findings of this study are included within the supplementary information files.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Yang Zhou and Gai Li contributed equally to this work.


This work was funded by National Natural Science Foundation of China (81760393 and 81401790) and Natural Science Foundation of Jiangxi Province (Grant no. 20202BABL206034).

Supplementary Materials

Supplementary 1. Supplementary Figure 1: the differences in immune cell infiltrations between two EMT subtypes using TIMER2 approach.

Supplementary 2. Supplementary Table 1: the information of EMT genes from the Molecular Signatures Database.

Supplementary 3. Supplementary Table 2: the gene set of each step within the cancer immunity cycle.

Supplementary 4. Supplementary Table 3: comparison of the activity of hallmark pathways in two EMT molecular subtypes.