Abstract

Purpose. The molecular mechanism underlying the tumorigenesis and progression of lung adenocarcinoma (LUAD) in nonsmoking patients remains unclear. This study was conducted to select crucial therapeutic and prognostic biomarkers for nonsmoking patients with LUAD. Methods. Microarray datasets from the Gene Expression Omnibus (GSE32863 and GSE75037) were analyzed for differentially expressed genes (DEGs). Gene Ontology (GO) enrichment analysis of DEGs was performed, and a protein-protein interaction network was then constructed using the Search Tool for the Retrieval of Interacting Genes and Cytoscape. Hub genes were then identified by degree rank. Overall survival (OS) analyses of hub genes were performed among nonsmoking patients with LUAD using the Kaplan-Meier plotter. The Cancer Genome Atlas (TCGA) and The Human Protein Atlas (THPA) databases were used to verify hub genes. In addition, we performed Gene Set Enrichment Analysis (GSEA) of hub genes. Results. We identified 1283 DEGs, including 743 downregulated and 540 upregulated genes. GO enrichment analyses showed that DEGs were significantly enriched in collagen-containing extracellular matrix and extracellular matrix organization. Moreover, 19 hub genes were identified, and 12 hub genes were closely associated with OS. Although no obvious difference was detected in ITGB1, the downregulation of UBB and upregulation of RAC1 were observed in LUAD tissues of nonsmoking patients. Immunohistochemistry in THPA database confirmed that UBB and ITGB1 were downregulated, while RAC1 was upregulated in LUAD. GSEA suggested that ribosome, B cell receptor signaling pathway, and cell cycle were associated with UBB, RAC1, and ITGB1 expression, respectively. Conclusions. Our study provides insights into the underlying molecular mechanisms of the carcinogenesis and progression of LUAD in nonsmoking patients and identifies UBB, RAC1, and ITGB1 as potential therapeutic and prognostic indicators for nonsmoking LUAD.
This is the first study to report the crucial role of UBB in nonsmoking LUAD.

1. Introduction

Lung cancer (LC) is the most prevalent cancer type, with approximately 228,820 cancer cases and 135,720 deaths in 2020 worldwide, thus causing considerable socioeconomic burdens [1]. Patients with non-small-cell lung cancer (NSCLC) constitute approximately 85% of the total LC cases. Lung adenocarcinoma (LUAD) is the most common histological type of NSCLC [2]. It is widely known that tobacco smoking is a crucial risk factor for LC; however, approximately 20% of LUAD cases occur among nonsmoking patients [3]. Patients with smoking-related LUAD carry alterations in genes such as KRAS and STK11. Moreover, these known cancer-related genes may be altered simultaneously, leading to a larger tumor mutational burden (TMB) [4, 5]. However, the molecular mechanism underlying the carcinogenesis and progression of nonsmoking-related LUAD remains unclear. In addition, nonsmoking patients are easily overlooked because they have no smoking exposure [6], leading to a high rate of missed diagnosis at early stages. Hence, it is necessary to identify key indicators in the carcinogenesis and development of nonsmoking-related LUAD.

In the past two decades, gene chip technologies and bioinformatics analyses have made great progress in screening genetic alterations at the genome level [7]. These technologies can be used to find differentially expressed genes (DEGs) that play crucial roles in the occurrence and adverse progression of nonsmoking-related LUAD. However, the false-positive rates of independent microarray studies may weaken the reliability of their outcomes [8]. Thus, we selected two microarray datasets on the same platform from the Gene Expression Omnibus (GEO) to acquire DEGs between LUAD tissues of nonsmoking patients and matched adjacent lung tissue samples. Gene Ontology (GO) enrichment analyses of DEGs were then performed, and a protein-protein interaction (PPI) network was constructed for a better understanding of the molecular mechanism underlying tumorigenesis and invasion of nonsmoking-related LUAD. Hub genes, which are candidate therapeutic and prognostic biomarkers for nonsmoking-related LUAD, were then identified from the PPI network. Subsequently, the overall survival (OS) analysis of hub genes was performed. Finally, we validated the findings using The Cancer Genome Atlas (TCGA), The Human Protein Atlas (THPA) database, and Gene Set Enrichment Analysis (GSEA).

2. Materials

2.1. Microarray Data

GEO (http://www.ncbi.nlm.nih.gov/geo) [9] is a public functional genomics repository of high-throughput gene expression, chip, and microarray data. Microarray datasets were selected only if they met the following criteria: (1) the topic was nonsmoking-related LUAD and matched adjacent lung tissues; (2) the platform was GPL6884 Illumina HumanWG-6 v3.0 expression BeadChip; (3) the organism was Homo sapiens; (4) the number of matched adjacent lung tissue samples was greater than 3; and (5) the last update time was in 2019. In this study, two gene expression profiles from GEO (GSE32863 [10] and GSE75037 [11]) met these criteria and were selected. GSE32863 included 29 nonsmoking LUAD samples and 30 matched adjacent lung samples, whereas GSE75037 contained 30 nonsmoking LUAD samples and 30 matched adjacent lung tissue samples.

2.2. Identification of Differentially Expressed Genes

The DEGs between nonsmoking LUAD and adjacent lung tissues were obtained using GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r) [12]. As an interactive web tool, GEO2R allows users to compare two or more groups of samples in one GEO series to select DEGs across experimental conditions. Adjusted P values (adj. P), computed with the Benjamini-Hochberg false discovery rate method, were adopted to balance the discovery of statistically significant genes against the limitation of false positives. Probe sets with no corresponding gene symbols were deleted, whereas genes with more than one probe set were averaged. The cutoff criteria were set on adj. P and the fold change (FC).
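The adjustment and filtering step described above can be sketched as follows. This is a minimal illustration in Python, not the GEO2R implementation: the function names, the toy data, and in particular the cutoff values (adj. P < 0.05 and |log FC| >= 1) are assumptions made for the example, since the thresholds are not stated in the text here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the tests sorted by ascending raw P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, p_values, log_fc, adj_p_cutoff=0.05, log_fc_cutoff=1.0):
    """Keep genes passing both cutoffs (cutoff values are illustrative assumptions)."""
    adj = benjamini_hochberg(p_values)
    return [
        gene
        for gene, q, fc in zip(genes, adj, log_fc)
        if q < adj_p_cutoff and abs(fc) >= log_fc_cutoff
    ]


# Toy data invented for illustration; these are not values from the study.
genes = ["G1", "G2", "G3", "G4"]
p_values = [0.001, 0.01, 0.02, 0.20]
log_fc = [2.1, -1.5, 0.3, 3.0]

adj_p = benjamini_hochberg(p_values)
degs = select_degs(genes, p_values, log_fc)
print(degs)
```

In this toy run, G3 fails the fold-change cutoff and G4 fails the adjusted P cutoff, so only G1 and G2 are retained.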

2.3. Gene Ontology Enrichment Analysis of Differentially Expressed Genes

GO is an important bioinformatics tool for annotating genes and analyzing the biological process of genes [13]. GO enrichment analyses covered three categories: biological processes (BP), cellular component (CC), and molecular function (MF). All analyses were performed using the clusterProfiler and GOplot packages in the R software to analyze the functions and signaling pathways of DEGs [14]. A P value of < 0.05 was regarded as statistically significant.
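Over-representation analysis of this kind is commonly based on a hypergeometric test: given a background of annotated genes, it asks how likely it is to see at least the observed number of DEGs in a GO term by chance. The following is a minimal sketch under that assumption; the function name and the toy counts are hypothetical and do not reproduce the clusterProfiler code or the study's data.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, deg_count, overlap):
    """P(X >= overlap) when deg_count genes are drawn from the universe
    and term_size genes of the universe are annotated to the GO term."""
    max_overlap = min(term_size, deg_count)
    total = comb(universe_size, deg_count)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy counts invented for illustration: 20 background genes, 5 in the term,
# 4 DEGs, of which 3 fall in the term.
p_value = hypergeom_enrichment_p(universe_size=20, term_size=5, deg_count=4, overlap=3)
print(round(p_value, 4))
```

In practice the raw P values of all tested terms would then be corrected for multiple testing before applying the significance cutoff.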

2.4. Protein-Protein Interaction Network and Significant Module Construction

The PPI network of DEGs was constructed using the Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org, version 11.0) [15], and interactions with a combined score of >0.90 were considered significant. Analysis of the PPI network provides insights into the mechanism of disease development. Cytoscape (version 3.7.2), a bioinformatics software platform, was adopted to construct visual networks of molecular interactions [16]. The Cytoscape plug-in Molecular Complex Detection (MCODE) (version 1.4.2) was adopted to find closely connected modules in the PPI network [17]. Genes in significant modules were displayed graphically through the MCODE plug-in. The selection criteria were set as follows: , , , , and .
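The confidence filter on interactions can be illustrated with a short sketch. It assumes the interactions are available as (gene, gene, combined score) triples with scores on a 0 to 1 scale; the function name, the placeholder gene names, and the scores are hypothetical and are not taken from STRING.

```python
def filter_interactions(edges, min_score=0.90):
    """Keep undirected edges whose combined score exceeds min_score,
    merging duplicate pairs reported in both directions."""
    kept = {}
    for gene_a, gene_b, score in edges:
        # Skip self-interactions and low-confidence edges.
        if gene_a == gene_b or score <= min_score:
            continue
        key = tuple(sorted((gene_a, gene_b)))
        kept[key] = max(score, kept.get(key, 0.0))
    return kept


# Placeholder genes and invented scores, for illustration only.
toy_edges = [
    ("GENE_A", "GENE_B", 0.95),
    ("GENE_B", "GENE_A", 0.95),  # same pair, opposite direction
    ("GENE_B", "GENE_C", 0.99),
    ("GENE_C", "GENE_D", 0.70),  # below the cutoff
]

ppi = filter_interactions(toy_edges)
nodes = sorted({gene for pair in ppi for gene in pair})
print(len(nodes), "nodes,", len(ppi), "edges")
```

The resulting node and edge counts correspond to the kind of summary reported for the network as a whole.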

2.5. Hub Gene Selection and Analysis

The cytoHubba plug-in of Cytoscape was used to rank genes by degree, and genes with degrees greater than 20 were selected as hub genes. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a crucial database for understanding high-level functions and biological systems from large-scale molecular datasets generated by high-throughput experimental technology [18]. To explore their biological function, the BP and KEGG pathways of these hub genes were analyzed and visualized through the ClueGO plug-in [19]. Subsequently, a heat map of hub genes was created and visualized using TCGA database (https://portal.gdc.cancer.gov/). Then, the Kaplan-Meier plotter (http://kmplot.com/analysis/) was used to perform the survival analysis of these hub genes to understand their prognostic roles [20]. Moreover, TCGA database, an external and relatively authoritative database, was used to verify the difference in the expression levels of hub genes between LUAD and normal lung tissues.
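Degree-based hub selection amounts to counting the distinct neighbours of each node and keeping the nodes above a threshold. The sketch below illustrates this on a toy undirected edge list; the function names and the data are hypothetical, and the toy threshold of 2 stands in for the study's threshold of 20.

```python
def degree_rank(edges):
    """Return (gene, degree) pairs sorted by descending degree.
    Edges are undirected; duplicates and self-loops are ignored."""
    neighbours = {}
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue
        neighbours.setdefault(gene_a, set()).add(gene_b)
        neighbours.setdefault(gene_b, set()).add(gene_a)
    degrees = {gene: len(adjacent) for gene, adjacent in neighbours.items()}
    # Sort by degree (high to low), then by name for a stable order.
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))


def select_hubs(edges, min_degree):
    """Genes whose degree is strictly greater than min_degree."""
    return [gene for gene, degree in degree_rank(edges) if degree > min_degree]


# Toy network invented for illustration.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]

ranking = degree_rank(toy_edges)
hubs = select_hubs(toy_edges, min_degree=2)
print(ranking, hubs)
```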

2.6. Gene Set Enrichment Analysis

As a computational approach, GSEA can determine whether predefined gene sets show statistically significant, concordant differences between two biological states [21]. Nonsmoking-related LUAD samples were categorized into two groups (high and low expression) by the median expression level of each hub gene. The association of hub gene expression with many gene sets was then explored to find related KEGG pathways using the Molecular Signatures Database (MSigDB) (c2.cp.kegg.all.v7.1.symbols.gmt) [22]. The number of permutations was set to 1000 in every analysis. , NOM P value < 0.05, and FDR q value < 0.25 were considered statistically significant.
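Two steps of this procedure lend themselves to a short sketch: the median split of samples and the running-sum enrichment score. The code below is an illustration only. It uses the unweighted (Kolmogorov-Smirnov-like) form of the score rather than the weighted statistic of the GSEA software, omits the permutation test, and assigns samples equal to the median to the low group, which is an assumption. All names and data are hypothetical.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into high and low groups by the median expression value.
    Samples equal to the median go to the low group (an assumption)."""
    cutoff = median(expression.values())
    high = sorted(s for s, value in expression.items() if value > cutoff)
    low = sorted(s for s, value in expression.items() if value <= cutoff)
    return cutoff, high, low


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy data invented for illustration.
expression = {"S1": 2.0, "S2": 5.0, "S3": 3.0, "S4": 8.0}
cutoff, high, low = split_by_median(expression)

ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # ordered by high-vs-low difference
es = enrichment_score(ranked, {"g1", "g2", "g5"})
print(cutoff, high, low, round(es, 3))
```

A positive score indicates that the members of the gene set cluster toward the top of the ranked list, that is, toward the high-expression group in this setup.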

3. Results

3.1. Identification of Differentially Expressed Genes in Nonsmoking Lung Adenocarcinoma

The volcano plots illustrated the selection process for DEGs in GSE32863 (Figure 1(a)) and GSE75037 (Figure 1(b)). After normalization of microarray outcomes, we identified DEGs in nonsmoking LUAD and adjacent lung tissues. In addition, the Venn diagram in Figure 1(c) shows that the overlap between both datasets included 1283 DEGs, including 743 downregulated and 540 upregulated genes.
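The overlap shown in the Venn diagram can be expressed as a set intersection. The sketch below additionally requires that a shared gene changes in the same direction in both datasets; this consistency requirement is an assumption of the example and is not stated in the text. Gene names and fold changes are invented.

```python
def overlapping_degs(dataset_a, dataset_b):
    """Return (upregulated, downregulated) genes shared by both datasets.
    Each dataset maps a DEG symbol to its log fold change."""
    shared = dataset_a.keys() & dataset_b.keys()
    up = sorted(g for g in shared if dataset_a[g] > 0 and dataset_b[g] > 0)
    down = sorted(g for g in shared if dataset_a[g] < 0 and dataset_b[g] < 0)
    return up, down


# Toy DEG lists invented for illustration.
first = {"G1": 1.5, "G2": -2.0, "G3": 1.2, "G4": -1.1}
second = {"G1": 2.0, "G2": -1.3, "G3": -1.4, "G5": 1.9}

up, down = overlapping_degs(first, second)
print(up, down)
```

Here G3 is shared but discordant in direction, so it is excluded from both lists.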

3.2. Gene Ontology Enrichment Analysis of Differentially Expressed Genes

GO enrichment analyses of downregulated DEGs revealed that extracellular structure organization was considerably enriched in BP, collagen-containing extracellular matrix (ECM) in CC, and ECM structural constituent in MF (Figure 2(a)). Moreover, GO enrichment analyses of upregulated DEGs suggested that ECM organization was primarily enriched in BP, apical plasma membrane in CC, and cell adhesion molecule binding in MF (Figure 2(b)).

3.3. Protein-Protein Interaction Network and Significant Module Construction

The PPI network of these DEGs is illustrated in Figure 3(a), which consisted of 607 nodes and 1841 edges. The most significant module was then detected through MCODE plug-in. Moreover, this module consisted of 48 nodes and 247 edges, as shown in Figure 3(b), wherein upregulated genes are marked in red and downregulated genes in blue.

3.4. Hub Gene Selection and Analysis

A total of 19 DEGs with degrees greater than 20 were selected as hub genes. Table 1 shows the gene symbol, degree, full name, and function of these hub genes. The BP (Figure 4(a)) and KEGG pathway (Figure 4(b)) analyses of these hub genes are shown in the figures. Moreover, a heat map demonstrated the upregulation or downregulation of the 19 hub genes in nonsmoking LUAD samples using TCGA dataset (Figure 4(c)). In addition, Figure 5 reveals that UBB, RAC1, ITGB1, CDC20, EGFR, UBE2C, TIMP1, P4HB, and MMP9 are negatively correlated with OS in nonsmoking-related LUAD, whereas CXCL12, GAS6, and FPR1 are positively correlated with OS. Table S1 displays the hazard ratio (HR), 95% confidence interval (CI), and log-rank P value of hub genes.
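The survival curves behind such comparisons are Kaplan-Meier estimates. The following minimal sketch shows only the estimator itself, with hypothetical names and invented follow-up data; it does not compute the hazard ratio or the log-rank test, and it is not the code of the Kaplan-Meier plotter.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.
    times: follow-up durations; events: 1 if death was observed, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Toy follow-up data (months) invented for illustration.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 2))
```

Computing one such curve for the high-expression group and one for the low-expression group of a hub gene gives the pair of curves that a log-rank test would then compare.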

UBB, RAC1, and ITGB1 had the highest degrees among these hub genes, indicating their pivotal roles in the occurrence and progression of nonsmoking-related LUAD. The overexpression of UBB, RAC1, and ITGB1 was associated with worse OS (, , and , respectively), suggesting their potential prognostic implications. Moreover, according to data from TCGA, we found that UBB was significantly downregulated (Figure 6(a)), and RAC1 was upregulated (Figure 6(b)) in nonsmoking-related LUAD. In contrast, ITGB1 expression did not differ significantly between nonsmoking LUAD and normal lung tissues (Figure 6(c)); however, further studies on this are warranted. Immunohistochemistry (IHC) in THPA database verified that UBB (Figure 6(d)) and ITGB1 (Figure 6(e)) were downregulated, and RAC1 (Figure 6(f)) was upregulated in LUAD.

3.5. Gene Set Enrichment Analysis

GSEA showed that ribosome was associated with UBB expression (Figure 7(a)). Furthermore, it suggested that ECM receptor interaction, B cell receptor signaling pathway, T cell receptor signaling pathway, toll-like receptor signaling pathway, and focal adhesion were correlated with RAC1 expression (Figures 7(b)–7(f)). In addition, cell cycle, spliceosome, DNA replication, RNA degradation, mismatch repair, and pyrimidine metabolism were associated with ITGB1 expression (Figures 7(g)–7(m)). The detailed outcomes of the analysis are shown in Table 2.

4. Discussion

LC commonly has a high mortality rate and results in great socioeconomic pressure for patients, families, and countries. Certainly, smoking contributes to the occurrence and development of LC. However, one in five LUAD cases occurs among patients who do not smoke [3]. The alterations of some genes, including EGFR, ERBB2, ALK, ROS1, and RET, are evidently associated with the occurrence and progression of LUAD [23]. However, the underlying molecular mechanism of nonsmoking-related LUAD remains unclear. Therefore, it is essential to identify crucial biomarkers for understanding the molecular mechanism of nonsmoking-related LUAD. Microarray technology is available for finding new biomarkers, which will be the basis of future studies on the potential mechanism of nonsmoking-related LUAD.

We analyzed two microarray datasets to obtain DEGs between nonsmoking LUAD and matched adjacent lung tissues. A total of 1283 DEGs were selected, comprising 743 downregulated and 540 upregulated genes. Functional enrichment analyses showed that DEGs were mainly enriched in collagen-containing ECM, ECM organization, and apical plasma membrane. Moreover, 19 hub genes with degrees greater than 20 were selected from the PPI network: UBB, RAC1, ITGB1, SRC, C3, IL6, CDC20, EGFR, UBE2C, TIMP1, GNG11, CXCL12, GAS6, P4HB, CXCR4, FPR1, ADRB2, LYZ, and MMP9. BP enrichment analysis of these hub genes suggested that apoptotic cell clearance, negative regulation of cysteine-type endopeptidase activity involved in apoptotic process, and leukocyte adhesion to vascular endothelial cell were primarily enriched, and the most enriched KEGG pathways were leukocyte transendothelial migration, bladder cancer, and intestinal immune network for IgA production. Among the 19 hub genes, 12 were closely associated with OS in nonsmoking patients with LUAD. Based on the results of GO enrichment analysis, KEGG pathway analysis, survival analysis, and degree rank, UBB, RAC1, and ITGB1 were considered the core genes in the occurrence and development of nonsmoking-related LUAD at the molecular level. In addition, IHC outcomes in THPA database confirmed that UBB and ITGB1 were downregulated, and RAC1 was upregulated in LUAD.

This study was the first one to report the key role of UBB in nonsmoking-related LUAD. Ubiquitin B (UBB) is a crucial member of gene families encoding ubiquitin. Ubiquitin is involved in several cellular processes, and aberrant events in ubiquitin-mediated processes promote the carcinogenesis and progression of NSCLC [24]. UBB is strongly suppressed in some cancers, including endometrial carcinoma and ovarian cancer [25]. Tang et al. revealed that ubiquitin was highly expressed in NSCLC tissues; however, increased ubiquitin was attributed to the increased transcripts of ubiquitin C (UBC) rather than UBB. Moreover, they showed that no significant difference was observed in the UBB mRNA level between NSCLC and normal lung tissues () [26]. This finding contradicted our results, and there can be two probable reasons for this difference. First, Tang et al. compared NSCLC and normal lung tissues, whereas we compared nonsmoking-related LUAD and normal lung tissues, thus suggesting higher accuracy of our study. Second, this study compared nonsmoking-related LUAD and matched adjacent lung tissues with similar features instead of unmatched normal lung tissues, revealing higher reliability of our study. However, future studies on the expression level of UBB are warranted to confirm findings. Additionally, according to the survival analysis, UBB may play a crucial prognostic role in nonsmoking-related LUAD.

As a member of the RAS superfamily of small GTP-binding proteins, Rac family small GTPase 1 (RAC1) can interact with effector proteins, thereby activating downstream kinases that regulate multiple cellular processes [27]. In addition, RAC1 enhances nuclear factor kappa B activity to regulate cell proliferation and migration in NSCLC [28]. RAC1 is upregulated in various cancers, such as LUAD, breast cancer, and kidney cancer [29], and the overexpression of RAC1 is frequently reported to be associated with worse prognosis [30]. One study suggested that high expression of RAC1 plays a key role in the epithelial-mesenchymal transition and malignant progression in LUAD [31]. In addition, KIF18B promotes cell proliferation and invasion through activating RAC1 and mediating the AKT/mTOR signaling pathway in LUAD, indicating the crucial role of RAC1 in the cell proliferation and adverse progression of LUAD [32]. Similarly, Li et al. showed that intracellular mature interleukin 37 can inhibit tumor metastasis through inhibiting RAC1 activation [33], suggesting a crucial role of RAC1 in tumor metastasis. In addition, our survival analysis suggests that RAC1 may predict the prognosis of nonsmoking patients with LUAD.

Integrin subunit beta 1 (ITGB1) encodes the beta subunit of integrins, which are heterodimeric cell surface receptors, and participates in the carcinogenesis, migration, and invasion of LUAD [34]. ITGB1 is abnormally expressed in several cancers, including LUAD and breast cancer [35]. MicroRNA-134 (miR-134) inhibits migration and metastasis by targeting ITGB1 in NSCLC [36]. Similarly, high expression of miR-493–5p is correlated with better survival of NSCLC via targeting ITGB1 [37], indicating the crucial role of ITGB1 in the carcinogenesis and worse prognosis of NSCLC. Zheng et al. observed that ITGB1 is a predictive biomarker of NSCLC after matching clinical factors (, 95% CI: 1.10–1.55) [38]. Our survival analysis also confirmed the prognostic value of ITGB1. Furthermore, GSEA revealed that ribosome was correlated with UBB expression; ECM receptor interaction, B cell receptor signaling pathway, and T cell receptor signaling pathway with RAC1 expression; and cell cycle, spliceosome, and DNA replication with ITGB1 expression. From these findings, we concluded that UBB, RAC1, and ITGB1 are potential therapeutic and prognostic biomarkers of nonsmoking-related LUAD.

Undeniably, this study was not the first to select pivotal genes and pathways in nonsmoking-related LUAD using bioinformatics methods. In fact, three similar analyses were previously published [39–41]. However, previous studies had several differences and/or disadvantages compared with this study. First, two analyses [40, 41] compared NSCLC/LC in nonsmoking females and normal lung tissues rather than nonsmoking LUAD, suggesting lower accuracy of those studies. Second, LUAD/NSCLC/LC in nonsmoking females and normal lung tissues were compared in the previous studies [39–41], whereas male patients with LUAD were excluded. In contrast, this study has several advantages: (1) this study included both male and female patients with nonsmoking LUAD, which may have potential benefits for nonsmoking male patients with LUAD; (2) this study compared nonsmoking-related LUAD and matched adjacent lung tissues with similar features, thus having higher reliability; (3) this study revealed some novel findings regarding UBB, UBE2C, GAS6, and P4HB; and (4) some advanced analyses such as GSEA were also performed in this study.

This study also has some limitations. (1) Only two datasets were included. Several similar microarray datasets exist, but they did not meet the selection criteria of our study. Therefore, to decrease the bias, these similar datasets had to be excluded from this study. (2) The included datasets did not provide detailed information about survival time; therefore, survival analysis of hub genes had to be performed using the Kaplan-Meier plotter. (3) These findings were not verified experimentally, which is warranted in future studies.

5. Conclusion

Our study was conducted to select DEGs that may correlate with the carcinogenesis and malignant invasion of nonsmoking-related LUAD. A total of 19 hub genes were selected from the PPI network, and 12 hub genes correlated with the prognosis of nonsmoking-related LUAD. Furthermore, UBB, RAC1, and ITGB1 were potential therapeutic and prognostic indicators of nonsmoking-related LUAD. Moreover, this study is the first to report the key role of UBB in nonsmoking-related LUAD. This study provided evidence for future genomic-based individualized treatments of LUAD in nonsmoking patients. However, future studies are warranted to explore further the biological relationships among these DEGs in nonsmoking-related LUAD.

Abbreviations

LC: Lung cancer
NSCLC: Non-small-cell lung cancer
LUAD: Lung adenocarcinoma
TMB: Tumor mutational burden
DEGs: Differentially expressed genes
GEO: Gene Expression Omnibus
TCGA: The Cancer Genome Atlas
THPA: The Human Protein Atlas
GO: Gene Ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
PPI: Protein-protein interaction
adj. P: Adjusted P values
FC: Fold change
STRING: Search Tool for the Retrieval of Interacting Genes
MCODE: Molecular Complex Detection
KM: Kaplan-Meier plotter
BP: Biological processes
CC: Cellular component
MF: Molecular function
OS: Overall survival
HR: Hazard ratio
CI: Confidence interval
GSEA: Gene Set Enrichment Analysis
miRNA: MicroRNA
OR: Odds ratio.

Data Availability

The datasets downloaded and analyzed during the current study are available on the GEO database: GSE32863, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32863; GSE75037, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75037.

Disclosure

The funder had no role in the design or performance of the study; the collection, management, analysis, and interpretation of the data; the preparation, review, and approval of the manuscript; or the decision to submit the manuscript for publication.

Conflicts of Interest

The authors declare that they have no competing interests.

Authors’ Contributions

Huan Deng had full access to all of the data in the manuscript and takes responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design were done by all authors. Acquisition, analysis, and interpretation of data were done by all authors. Huan Deng and Ming Chen contributed to the drafting of the manuscript. Critical revision of the manuscript for important intellectual content was done by all authors. Huan Deng, Yichao Huang, and Li Wang contributed to the statistical analysis. Supervision was done by Ming Chen and Yichao Huang. Revision was done by Huan Deng, Yichao Huang, and Ming Chen. Huan Deng and Yichao Huang are co-first authors of this study, and they contributed to this study equally.

Acknowledgments

The authors gratefully acknowledge the contributions from the GEO, TCGA, and THPA databases. This study is supported by the National Natural Science Foundation of China (grant No. 81672972) and Major Program of Provincial and Ministerial Co-construction, Ministry of Health Science Foundation (grant No. WKJ-ZJ-1701), without the involvement of commercial entities.

Supplementary Materials

Table S1: overall survival analysis of hub genes using K-M plotter.