Abstract

Background. Lung adenocarcinoma (LUAD) comprises around 40% of all lung cancers, and in about 70% of patients, it has already spread locally or systemically when first detected, leading to a worse prognosis. Methods. We screened differentially expressed genes (DEGs) from the microarray expression data in the Gene Expression Omnibus database and then verified and further analyzed the screened DEGs using a combined bioinformatics approach. Results. Expression of 11,143 genes in 693 nontumor lung tissues and LUAD cases from 8 independent laboratories was analyzed; 188 mRNAs were identified as DEGs. A protein-protein interaction (PPI) network constructed with the 188 DEGs yielded 8 hub DEGs (CDH5, PECAM1, VWF, CLDN5, COL1A1, MMP9, SPP1, and IL6) that were highly interconnected with other nodes. The expression levels of the 8 hub genes in LUAD and control tissues were assessed in the Oncomine database, and the results were consistent. Survival curves showed that the expression of the hub genes, except for IL6, was significantly related to the prognosis of lung cancer and LUAD patients. Since the expression of IL6 is nonspecific and highly sensitive, we chose the other 7 verified hub genes for the next analysis. Mutual exclusivity and cooccurrence analysis of the 7 hub genes identified a tendency towards cooccurrence between CDH5, PECAM1, and VWF in LUAD. The coexpression profiles of CDH5 in LUAD were identified, and we found that PECAM1 and VWF were coexpressed with CDH5. Immunohistochemistry and RT-PCR analysis showed that CDH5, PECAM1, and VWF were expressed at higher levels in normal lung tissues but at low or undetectable levels in LUAD tissues. Conclusions. Taken together, we speculate that CDH5, PECAM1, and VWF play an important role in LUAD.

1. Background

Lung cancer (LC) accounted for 13% of all cancer cases in the United States in 2019, and it caused the greatest number of cancer deaths in both men (24%) and women (23%). One-quarter of all cancer deaths are due to LC, which makes it the leading cause of cancer-related mortality [1]. LC is mainly divided into two subtypes: non-small-cell lung carcinoma (NSCLC) and small-cell lung carcinoma (SCLC), accounting for 85% and 15% of all cases, respectively [2–4]. NSCLC can be classified into the major subtypes adenocarcinoma (AD) and squamous cell carcinoma (SCC). Lung adenocarcinoma (LUAD) is the most frequent histologic type of LC, comprising around 40% of all LC [5–7].

Patients whose LC has spread locally or systemically when first detected, who constitute 70% of all patients, usually receive chemotherapy and/or radiation therapy instead of surgery [8–10]. Hence, local extension and metastases are also primary causes of death in LUAD patients. Even more ominously, the recurrence rates in resected stage I NSCLC still range between 22% and 38% [11, 12], and nearly half of LUAD patients suffer a relapse and die as a result of disease recurrence [13].

Risk assessment and treatment planning for LUAD patients usually depend on traditional risk factors including tumor size, stage, and lymph node status. However, these existing clinical methods for prognosis evaluation are invasive, unsystematic, and subjective; they offer little help for effective targeted therapy and do not even clearly distinguish between high-risk and low-risk patients [14]. Therefore, a more accurate method is needed to manage this high-mortality disease, and accurate indicators of the genesis and development of LUAD are urgently required. We hope that the results of our data analysis shed light on potential diagnostic and therapeutic targets in LUAD.

Although there are many studies on the mechanism of LUAD, its definite molecular cause is still unclear. Revealing the pathogenesis and underlying molecular mechanisms of LUAD is vital because it would benefit early diagnosis, prevention, and targeted therapy. In the present study, we aim to find one or several molecular biomarkers that may eventually be applied to the effective diagnosis and therapy of LUAD.

The microarray is a high-throughput platform that can measure gene expression on a global scale. It has been widely used to search for possible genetic or epigenetic alterations and to identify molecular biomarkers, for example in carcinomas [15, 16]. The extensive use of microarrays has produced huge amounts of microarray data, most of which are stored and shared in public databases [17, 18]. However, because some of these studies were limited by small study populations, single-center cohorts, and model overfitting, different researchers sometimes reached different conclusions. To obtain a more accurate picture of the onset and progression of LUAD, we integrated, reanalyzed, and verified the data stored in public databases. Some studies have sought differentially expressed genes (DEGs) in LUAD through gene expression profiling microarrays [19–21]. However, these independent studies involved heterogeneous tissues or samples, and their results were obtained from single cohorts, so their conclusions were limited or inconsistent. Consequently, key genes and pathways were difficult to confirm across studies. By integrating, reanalyzing, and verifying the relevant expression profiling microarray datasets uploaded to the Gene Expression Omnibus (GEO) database by different laboratories, our study overcomes the one-sidedness of individual studies and increases statistical power; therefore, the screening results should be more precise and reliable.

In the present study, we downloaded 8 original microarray datasets from the GEO database (https://www.ncbi.nlm.nih.gov/geo): GSE32863 (58 nontumor lung tissues, 58 LUAD tissues), GSE7670 (28 nontumor lung tissues, 28 LUAD tissues), GSE40791 (100 nontumor lung tissues, 94 LUAD tissues), GSE63459 (32 nontumor lung tissues, 33 LUAD tissues), GSE75037 (83 nontumor lung tissues, 83 LUAD tissues), GSE85841 (8 nontumor lung tissues, 8 LUAD tissues), GSE116959 (11 nontumor lung tissues, 57 LUAD tissues), and GSE118370 (6 nontumor lung tissues, 6 LUAD tissues). In total, 326 nontumor lung tissues and 367 LUAD tissues were available. Subsequently, the DEGs were screened using R language, and 188 DEGs were filtered out from 11,143 genes based on the 8 independent datasets, which contained 693 cases. To better clarify the pathological mechanisms of LUAD, we performed cluster analysis, functional analysis, and biological pathway and process enrichment analysis for the 188 screened DEGs. To determine hub genes with a significant expression difference between normal lung and LUAD, we constructed a protein-protein interaction (PPI) network for the 188 DEGs screened with the threshold of , and 8 hub genes were screened out: CDH5, PECAM1, VWF, CLDN5, COL1A1, MMP9, SPP1, and IL6. To verify our screening results, the expression signatures of the hub DEGs in clinical cancer tissue were assessed in several databases. Their expression in normal lung and LUAD tissues was analyzed in the Oncomine database. The association between high or low DEG expression and the survival time of patients was examined with the KM Plotter database. The coexpression analysis of the hub DEGs, conducted with cBioPortal, revealed cooccurrence or mutual exclusivity relationships and provided information on the possible underlying mechanism.
All in all, we hope to gain further insight into LUAD at the molecular level and to explore potential candidate biomarkers for diagnosis, prognosis, and drug targets.

2. Materials and Methods

2.1. Microarray Data Selection

In the current study, the gene expression profiling datasets (ID: GSE32863, GSE7670, GSE40791, GSE63459, GSE75037, GSE85841, GSE116959, and GSE118370) were obtained from the Gene Expression Omnibus database of the National Center for Biotechnology Information (NCBI). “Lung adenocarcinoma,” “Homo sapiens [organism],” and “expression profiling by array [dataset type]” were used as search keywords, which returned 260 results. We selected the microarray datasets according to the following rules: the samples must contain LUAD tissues and normal lung tissues, no special treatment on patients, , and . Under these conditions, we obtained 8 datasets for further analysis. We extracted the expression data of all measured genes from the original studies by 8 independent research groups. The following information was extracted from each screened study: GEO accession number, sample type, platform, number of normal and LUAD tissues, and gene expression data. The information on the selected GEO series is listed in Table 1. We downloaded the raw data of 693 specimens from the 8 independent GEO series, comprising 326 nontumor tissues and 367 LUAD specimens. The process of data selection is shown in Figure 1.
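As a quick consistency check on the cohort size, the per-series sample counts reported for the 8 GEO series can be tallied. The sketch below uses only the counts given in this paper; the function name `tally` is illustrative.

```python
# Sample counts per GEO series as reported in the text: (nontumor, LUAD).
SERIES = {
    "GSE32863": (58, 58),
    "GSE7670": (28, 28),
    "GSE40791": (100, 94),
    "GSE63459": (32, 33),
    "GSE75037": (83, 83),
    "GSE85841": (8, 8),
    "GSE116959": (11, 57),
    "GSE118370": (6, 6),
}


def tally(series):
    """Return (nontumor_total, luad_total, grand_total) over all series."""
    nontumor = sum(n for n, _ in series.values())
    luad = sum(t for _, t in series.values())
    return nontumor, luad, nontumor + luad


print(tally(SERIES))  # (326, 367, 693)
```

The totals agree with the 326 nontumor and 367 LUAD specimens (693 in all) stated above.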

2.2. Data Preprocessing before Difference Analysis

We used the robust multiarray average algorithm of the Affy package in R language to convert the raw data to expression data. According to the platform annotation files, the expression levels of the probe sets were converted into gene expression levels by the Bioconductor annotation function of R. Expression values of multiple probes for a given gene were averaged. In this way, we obtained 8 tables containing the expression values of the tested genes, one for each of the 8 GEO series. Then, we used the sameGene package in R to merge the gene expression data of the 693 patients from datasets GSE32863, GSE7670, GSE40791, GSE63459, GSE75037, GSE85841, GSE116959, and GSE118370 into one output table according to shared gene names. The samples in the output table were then assigned to 2 groups: the normal lung group and the LUAD group. Batch normalization was conducted on all expression profiling data using the ComBat algorithm in the Surrogate Variable Analysis package of R language. This normalization can eliminate the systematic variations among different studies.
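The two table-level steps described above (averaging multiple probes per gene, then merging series on shared gene names) can be illustrated with a minimal sketch. It is written in plain Python rather than the R packages used in the study, and the function names, probe identifiers, and toy values are hypothetical; the batch correction by ComBat is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene, sample by sample.

    probe_values:  {probe_id: [value for each sample]}
    probe_to_gene: {probe_id: gene_symbol}
    """
    rows_by_gene = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # probes without an annotation are skipped
            rows_by_gene[gene].append(values)
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


def merge_on_common_genes(tables):
    """Keep genes present in every table and concatenate their samples."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return {g: [v for t in tables for v in t[g]] for g in sorted(common)}


# Toy data (hypothetical): two probes for gene "A", one probe for gene "B".
series_1 = collapse_probes(
    {"p1": [2.0, 4.0], "p2": [4.0, 6.0], "p3": [1.0, 1.0]},
    {"p1": "A", "p2": "A", "p3": "B"},
)
series_2 = {"A": [7.0], "C": [9.0]}
merged = merge_on_common_genes([series_1, series_2])
```

Restricting the merged table to genes shared by all series is one plausible reading of how the 11,143 common genes were obtained; the text does not spell out this detail.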

2.3. Differentially Expressed Gene (DEG) Screening

The DEGs were selected from the normalized data of normal lung and LUAD tissues using the linear models for microarray data (Limma) package in Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/limma.html). The filter criteria were , also known as and adjusted P value < 0.05.
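The filtering logic (a fold-change cutoff combined with an adjusted P value cutoff) can be sketched as follows. This is an illustration under stated assumptions, not the Limma implementation: the Benjamini-Hochberg procedure is assumed for the P value adjustment, the fold-change cutoff `lfc_cutoff` is left as a caller-supplied parameter because its value is not recoverable from the text, and all function names and toy numbers are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, lfc_cutoff, alpha=0.05):
    """Genes with |log2FC| >= lfc_cutoff and adjusted P value < alpha."""
    adjusted = bh_adjust(pvalues)
    return [
        gene
        for gene, fc, q in zip(genes, log2fc, adjusted)
        if abs(fc) >= lfc_cutoff and q < alpha
    ]


# Toy example; the cutoff of 1.0 is purely illustrative.
degs = select_degs(
    ["g1", "g2", "g3", "g4"],
    [2.0, -0.5, -1.5, 0.2],
    [0.01, 0.04, 0.03, 0.005],
    lfc_cutoff=1.0,
)
```

Genes with a positive log2FC passing both filters would be reported as upregulated in LUAD and those with a negative log2FC as downregulated.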

A volcano plot, representing the distribution of the fold change and P value of all genes, was drawn. A heat map based on hierarchical clustering of the expression of the 188 DEGs was drawn to investigate probable discrepancies between normal lung and LUAD tissues.

2.4. Functional and Pathway Enrichment Analysis for All DEGs

To explore the main molecular functions and pathways that involve the DEGs, we performed functional enrichment analysis using FunRich, a standalone functional enrichment and network analysis tool. It was used to perform cellular component, functional (molecular function and biological process), and pathway (biological pathway) enrichment analysis for the obtained DEGs, with P value < 0.05 as a strict cutoff.
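Enrichment P values of this kind are commonly obtained from a hypergeometric test on the overlap between the DEG list and an annotated gene set. The sketch below shows that calculation under this assumption; whether FunRich's internal computation matches it exactly is not stated in the text, and the function name and toy numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(background, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes carrying the annotation of interest
    selected:   number of genes in the query list (e.g. the DEGs)
    overlap:    query genes that carry the annotation
    """
    total = comb(background, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 20 background genes, 5 annotated, 4 selected, 2 overlapping.
p = enrichment_pvalue(20, 5, 4, 2)
```

A term would be called enriched when this probability falls below the chosen cutoff (P value < 0.05 in this study).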

2.5. Protein-Protein Interaction (PPI) Network Construction and Hub Gene Identification

Functional protein-protein interaction (PPI) analysis is essential for interpreting the molecular mechanisms of key cellular activities in carcinogenesis. The PPI network was constructed on the basis of the Search Tool for the Retrieval of Interacting Genes (STRING) database [22]. We constructed a PPI network for all DEGs and visualized the interaction network with the cutoff criterion of .

Hub genes were selected with , which finally yielded 8 hub genes that were highly interconnected with other nodes.
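Selecting hub genes by connectivity amounts to counting, for each node, its distinct interaction partners in the PPI network and keeping the nodes above a cutoff. A minimal sketch follows; the edge list, node names, and the `min_degree` value are hypothetical placeholders, since the cutoff used in the study is not recoverable from the text.

```python
from collections import defaultdict


def node_degrees(edges):
    """Number of distinct interaction partners for each node."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hub_nodes(edges, min_degree):
    """Nodes with at least min_degree partners, most connected first."""
    degrees = node_degrees(edges)
    hubs = [node for node, d in degrees.items() if d >= min_degree]
    return sorted(hubs, key=lambda node: (-degrees[node], node))


# Toy network with placeholder node names; ("G2", "G1") duplicates an edge.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G2", "G1")]
hubs = hub_nodes(toy_edges, min_degree=2)
```

Using sets for the neighbours ensures that an interaction listed in both directions is counted once.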

2.6. Oncomine Database Analysis and Kaplan-Meier Plotter Analysis for DEGs

Oncomine is a cancer transcriptomic database and web-based discovery platform with genome-wide expression analyses of various cancers [23, 24]. The expression levels of 8 screened hub DEGs were analyzed using the Oncomine Cancer Profiling Database (https://www.oncomine.org). We analyzed and compared the expression of 8 screened hub genes between LUAD tissues and normal lung tissues in the Oncomine database.

The Kaplan-Meier Plotter is a database that can be used to assess the effect of 54,675 genes on patient survival using 10,461 cancer samples (breast, ovarian, lung, and gastric cancer) [25]. For survival analyses, we analyzed the prognostic value of the 8 screened hub DEGs in lung cancer and LUAD patients using the Kaplan-Meier Plotter (http://kmplot.com/analysis/) and tested for significance using logrank tests. The analysis was performed according to the instructions provided with the tool. All hub DEGs except IL6 had a significant correlation with the overall survival of LC and LUAD patients.
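For readers unfamiliar with the survival curves produced by the Kaplan-Meier Plotter, the underlying product-limit estimator can be sketched in a few lines. This is a generic illustration, not the code of the web tool; the function name and toy follow-up data are hypothetical, and the logrank test used to compare high- and low-expression groups is not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Toy cohort: the patient followed to time 2 is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In a hub gene analysis, one such curve would be estimated for the high-expression group and one for the low-expression group before the two are compared.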

Coexpression analysis in Oncomine was used to identify sets of genes with synchronous expression patterns. The coexpression profiles of CDH5 in LUAD were identified and presented as a heat map.

2.7. Genetic Alteration and Coexpression Analysis of Screened Hub DEGs

The cBioPortal (http://www.cbioportal.org) is an open-access resource for interactive exploration of multidimensional cancer genomics datasets [26]. We studied alterations (amplification, deep deletion, missense mutation, inframe mutation, truncating mutation, mRNA upregulation, and mRNA downregulation) in the VWF, CLDN5, CDH5, COL1A1, MMP9, PECAM1, and SPP1 genes in the LUAD (TCGA, provisional) case set using cBioPortal. The cBioPortal was also used for cooccurrence or mutual exclusivity analysis and for customizable correlation analysis.
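The idea behind a cooccurrence or mutual exclusivity call for a gene pair can be sketched with a 2 x 2 contingency table of altered and unaltered samples. The sketch below computes an odds ratio and a one-sided Fisher exact P value for cooccurrence; it is an illustration of the general approach, assumed rather than taken from the cBioPortal source, and the function name and sample identifiers are hypothetical.

```python
from math import comb


def cooccurrence(altered_a, altered_b, all_samples):
    """Odds ratio and one-sided Fisher P value for cooccurring alterations.

    altered_a, altered_b: samples in which gene A / gene B is altered
                          (both assumed to be subsets of all_samples)
    all_samples:          every profiled sample
    """
    a, b, samples = set(altered_a), set(altered_b), set(all_samples)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(samples - a - b)
    if only_a * only_b:
        odds_ratio = (both * neither) / (only_a * only_b)
    else:
        odds_ratio = float("inf")
    # P(overlap >= observed) under independence (hypergeometric tail).
    n = len(samples)
    p_value = sum(
        comb(len(a), k) * comb(n - len(a), len(b) - k)
        for k in range(both, min(len(a), len(b)) + 1)
    ) / comb(n, len(b))
    return odds_ratio, p_value


# Toy cohort of 10 samples, identified by index.
odds, p = cooccurrence({0, 1, 2, 3}, {0, 1, 2, 4}, range(10))
```

An odds ratio above 1 points towards cooccurrence and one below 1 towards mutual exclusivity; the P value indicates whether the tendency is stronger than expected by chance.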

3. Quantitative Real-Time PCR Analysis and Immunohistochemistry

Total RNA from 2 paired lung and LUAD tissues was extracted using TRIzol (Invitrogen). The cDNA was reverse-transcribed from 1 μg of total RNA using a reverse transcriptase kit (Toyobo). Q-PCR was performed using a 7500 Real-Time PCR System (Applied Biosystems) and SYBR Green PCR Master Mix (BioRad). GAPDH was used for normalization. Primer sequences are listed in Table 2.
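Normalization to a reference gene such as GAPDH is often expressed through the 2^(-ΔΔCt) calculation. The text states only that GAPDH was used for normalization, so the formula below is an assumed, conventional choice rather than a confirmed detail of this study; the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) convention.

    Each Ct of the target gene is first normalized to the reference gene
    (dCt), then the sample dCt is compared with the control dCt (ddCt).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical Ct values: target gene in tumor vs paired normal tissue.
fold = relative_expression(28.0, 18.0, 25.0, 18.0)
print(fold)  # 0.125
```

A value below 1 indicates lower expression in the tumor sample than in the paired normal tissue.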

4. Results

4.1. Normalization of Gene Expression Data

Expression data of 11,143 genes from 693 samples (326 nontumor lung tissues and 367 LUAD specimens) were normalized with the median method following batch normalization. The expression values of all specimens before and after normalization are shown in the top and bottom box figures in Figure 2. The horizontal axis stands for different samples.

The vertical axis stands for gene expression value. The black horizontal line represents the median of expression value of the sample, which is almost on a straight line after batch normalization, suggesting that normalized data were qualified.

4.2. Selection of DEGs and Expression Hierarchical Clustering Analysis

We used the R Limma package to identify genes that were aberrantly expressed in the comparison, with the threshold of and . The DEGs were identified using the t test statistical algorithm. The lists of significant genes were selected according to the fold change of gene expression values.

In total, 188 DEGs (44 upregulated and 144 downregulated) were obtained based on the gene expression data of 693 patients (326 normal lung and 367 LUAD specimens from 8 GEO series). We list the top 40 DEGs according to the fold change of the gene expression value in Table 3. The volcano plot (Figure 3) shows the distribution of all DEGs, with the fold change (log2FoldChange) and the P value (-log10 (P value)) on its two axes. In Figure 4, the fold change patterns of all DEGs are displayed in a heat map to evaluate and compare differences in gene expression between normal lung and LUAD.

4.3. Function and Pathway Enrichment Analysis of all DEGs

Cellular component enrichment analysis of all DEGs described their distribution and structure (Figure 5(a)). Regarding molecular function, the DEGs were significantly enriched in cell adhesion molecule activity, extracellular matrix structural constituent, metallopeptidase activity, calcium ion binding, and receptor activity (Figure 5(b)). To better clarify the pathological mechanisms, we performed biological pathway enrichment analysis. According to the result of the pathway enrichment analysis, the DEGs were mainly enriched in epithelial-to-mesenchymal transition (EMT), cell surface interactions at the vascular wall, mesenchymal-to-epithelial transition (MET), platelet adhesion to exposed collagen, and so on (Figure 5(c)).

To further investigate the biological effects of aberrantly expressed DEGs in LUAD, the biological process enrichment analysis of 188 screened DEGs was carried out. The top 9 enriched biological processes are shown in Figure 5(d). The functions in the biological process category were enriched in cell communication, signal transduction, cell growth and/or maintenance, aldehyde metabolism, and so on.

4.4. PPI Network Construction and Hub Gene Selection

Based on the information in the STRING protein query from public databases, we constructed the PPI network for the 188 DEGs using as the screening index (Figure 6); 8 hub genes were selected with . They are shown in the innermost circle. The top 8 hub genes were CDH5, PECAM1, VWF, CLDN5, COL1A1, MMP9, SPP1, and IL6.

4.5. Validation of the Expression of Obtained Hub DEGs in Oncomine Database

To determine whether the expression of the hub DEGs in LUAD patients was consistent with our analysis of the GEO data, we examined previous clinical results in the Oncomine cancer microarray database. The expression of the 8 hub DEGs was verified and is shown in Figure 7. There are 5 downregulated DEGs and 3 upregulated DEGs in LUAD. The expression trend of the 8 DEGs is in accordance with our results obtained from the GEO data. The differences were statistically significant for the upregulated DEGs () but not for the downregulated DEGs, although their expression showed a trend of downregulation in LUAD.

4.6. Survival Analysis for Obtained Hub DEGs with Kaplan-Meier Plotter

According to our previous bioinformatics analyses and validation, the expression of the hub genes in LUAD patients in the Oncomine database is consistent with our results from the GEO series. To explore the association of the expression of the 8 hub genes with the prognosis of LUAD patients, survival curves were drawn using the Kaplan-Meier Plotter database. As shown in Figure 8, low expression of CDH5, PECAM1, VWF, and CLDN5 was associated with worse prognosis (), and high expression of COL1A1, MMP9, and SPP1 was associated with worse prognosis (). The differences were statistically significant. In other words, LUAD patients with low expression of the screened upregulated hub genes had a better prognosis, and those with low expression of the screened downregulated hub genes had a worse prognosis, except for the IL6 gene. As IL6 is a characteristic cytokine expressed in plasma and associated with inflammation, its expression is nonspecific and sensitive, so we chose the other 7 verified hub genes for the next analysis.

4.7. Coexpression Analysis and Genetic Alterations of Obtained Hub DEGs in LUAD

The OncoPrint from cBioPortal is a concise and compact graphical summary of genomic alterations in multiple genes across a set of tumor samples. It summarizes distinct genomic alterations including mutations, CNAs (amplifications and homozygous deletions), and changes in gene expression or protein abundance. Based on the previous results of difference analysis, expression validation, PPI network construction, and survival analysis, VWF, CLDN5, CDH5, COL1A1, MMP9, PECAM1, and SPP1 were hub genes highly interconnected with other DEGs. The expression of these hub genes was compared in the GEO and Oncomine databases, and their expression differences between normal lung and LUAD tissues were clear. The relationship between their expression and overall survival was also validated: there is a significant correlation between the expression of the hub genes and the survival time of LUAD patients. We analyzed genomic alterations of the screened hub DEGs using cBioPortal and visualized gene alterations across a set of LUAD cases (Figure 9(a)). OncoPrint can also help identify trends such as mutual exclusivity or cooccurrence between genes. Because such trends can be exploited to identify previously unknown mechanisms that contribute to oncogenesis and cancer progression, we used cBioPortal to explore the potential relationships between the 7 hub genes. As Table 4 shows, there was a tendency towards cooccurrence between CDH5 and PECAM1 or VWF in LUAD ().

Coexpression analysis in Oncomine is a tool which can be used to identify sets of genes with synchronous expression patterns. The coexpression profiles of CDH5 in LUAD were identified and presented as the pattern of the heat map. We identified the coexpression profiles for CDH5 with a strong cluster of the top 20 genes across a panel of 107 LUAD tissues. The result showed that, as DEGs that we screened out from LUAD and control tissues based on the GEO database, PECAM1 and VWF coexpressed with CDH5 (Figure 9(b)).

4.8. Immunohistochemistry and RT-PCR Analysis of CDH5, PECAM1, and VWF

According to our results, CDH5, PECAM1, and VWF were DEGs screened with the threshold of and , and they were hub genes in the PPI network. Their expression in normal lung and LUAD was verified in the Oncomine database, and it was significantly related to the prognosis of lung cancer and LUAD patients. Moreover, there is a tendency towards cooccurrence between CDH5, PECAM1, and VWF in LUAD. We further verified the expression of CDH5, PECAM1, and VWF through immunohistochemistry and RT-PCR analysis. Immunohistochemistry (IHC) data from the Human Protein Atlas (http://www.proteinatlas.org) indicated strong expression of CDH5, PECAM1, and VWF protein in lung tissues, but not in LUAD tissues (Figure 10(a)). The mRNA levels of CDH5, PECAM1, and VWF were noticeably decreased in LUAD tissues compared to paired lung tissues (Figure 10(b)).

5. Conclusion

Our study analyzed expression profiling results from different laboratories and screened out DEGs from 8 original microarray datasets, generated on 5 different platforms and containing 693 cases. There were 44 upregulated DEGs and 144 downregulated DEGs in LUAD with the threshold of and . Biological process analysis, biological pathway analysis, and PPI network analyses provided a set of related genes and pathways to help elucidate the molecular mechanisms of LUAD. Validation verified that the expression levels of the DEGs in the Oncomine database are consistent with their expression levels in the GEO series. The survival curves of the hub genes showed that their expression, except for that of IL6, was significantly related to the prognosis of LUAD patients (). At this point, we believe CDH5, PECAM1, VWF, CLDN5, COL1A1, MMP9, and SPP1 play a vital role in LUAD. Mutual exclusivity or cooccurrence analysis of the 7 screened hub genes showed that there was a tendency towards cooccurrence between CDH5, PECAM1, and VWF in LUAD (). Then, the coexpression profiles for CDH5 obtained from Oncomine showed that PECAM1 and VWF were coexpressed with CDH5 in LUAD, and they were also DEGs that were screened out from LUAD based on our previous results. Immunohistochemistry and RT-PCR analysis showed that CDH5, PECAM1, and VWF were expressed at higher levels in normal lung tissues but at low or undetectable levels in LUAD tissues. From all the above results, we speculate that CDH5, PECAM1, and VWF play an important role in LUAD. Because we analyzed all selected GSE series in the GEO database that compared normal lung and LUAD tissues, the prediction is more accurate and the bias of individual studies can be overcome. Our study provides information for researchers to identify possible candidate genes and pathways which may be involved in LUAD for further studies.

6. Discussion

In 2018, approximately 2,093,800 patients worldwide were diagnosed with lung cancer, and 1,761,000 were expected to succumb to the disease. Statistically, in both sexes combined, lung cancer is the most commonly diagnosed cancer (11.6% of the total cases) and the leading cause of cancer death (18.4% of the total cancer deaths) [27, 28]. LC was the most frequent cancer and the leading cause of cancer death among men and women in the United States in 2019 [1]. There are 2 main forms of LC: NSCLC (85% of patients) and small-cell lung cancer (SCLC) (15%). Adenocarcinoma is the most common type of NSCLC and accounts for approximately 40% of lung cancers [29–31]. The most common diagnostic means for LC is fiberoptic bronchoscopy; it can help to diagnose NSCLC, but quite often, the amount of obtained material is not sufficient to subclassify NSCLC in more detail or for targeted therapies [32]. The vast majority of LC patients are not diagnosed until the disease is at an advanced stage, so they have a worse prognosis and a high risk of distant recurrence and death [33]. We know little about targets for the early detection of LUAD. Consequently, there is an urgent need for diagnostic molecular features or biomarkers that can be associated with survival and disease recurrence in LUAD.

Transcriptomic microarray analysis, a high-throughput technique that profiles the whole transcriptome and identifies changes in mRNA expression, has recently contributed significantly to improving diagnostics, classification, and prognostics, and it is now being used to gain a more detailed understanding of the molecular mechanism of LUAD [34, 35]. By analyzing whole-transcriptome results from different laboratories, statistical power is increased and prediction is more accurate; moreover, the bias of individual studies can be overcome. In the current study, we focused on the aberrantly expressed mRNAs in LUAD based on GEO microarray data and listed the common DEGs screened from the datasets of different researchers, which contained 693 samples. There were 44 upregulated DEGs and 144 downregulated DEGs in LUAD with the threshold of and .

Biological pathway analysis of all DEGs showed that the DEGs were mainly involved in epithelial-to-mesenchymal transition (EMT), cell surface interactions at the vascular wall, mesenchymal-to-epithelial transition (MET), platelet adhesion to exposed collagen, and the glypican pathway. In the past decades, an increasing number of studies have shown that EMT is associated with poor prognosis in different tumor types including NSCLC [36, 37]. EMT, as well as its reverse process, MET, is thought to be involved in the pathogenesis of numerous lung diseases ranging from developmental disorders and fibrotic tissue remodelling to lung cancer [38]. Kakolyris et al. have previously shown in NSCLC an association of high mitogenic/angiogenic factor expression with high angiogenesis and poor prognosis [39]. Glypican-3 (GPC3) is a membrane-bound proteoglycan belonging to the glypican-related integral membrane proteoglycan family, which includes six members (GPC1–GPC6). It has been identified as a potential biomarker candidate in lung carcinoma, severe pneumonia, and acute respiratory distress syndrome (ARDS) [40]. Glypican-5 (GPC5) is a novel tumor metastasis suppressor in LUAD that acts by suppressing EMT [41]. Function analysis can help us better understand the mechanism of LUAD and provide guidance for LUAD prevention and treatment; however, further laboratory and clinical research is required.

The PPI network of the 188 DEGs, which were screened from 693 LUAD and control tissues using as the screening index, helped us find the 8 hub DEGs with the most functional connections: CDH5, PECAM1, VWF, CLDN5, COL1A1, MMP9, SPP1, and IL6. Each of the 8 hub genes interacts with more than 15 proteins. To verify our previous results, we assessed the expression levels of the 8 hub DEGs in the Oncomine database. The expression trend of the 8 DEGs is in accordance with our results obtained from the GEO data. To verify our results further, we analyzed the relationship between the expression of the hub genes and prognosis. The hub genes that we screened as upregulated in LUAD were correlated with poor prognosis, and the hub genes that we screened as downregulated in LUAD were associated with favorable prognosis, except for IL6. Considering that candidate biomarkers should be relatively stable, further analysis was performed with the remaining 7 hub genes, excluding IL6.

We drew survival curves of the screened hub genes and found that the prognosis of LC/LUAD patients was significantly associated with the expression of the hub genes (). OncoPrint helped us identify trends such as mutual exclusivity or cooccurrence of the screened hub genes. We found that there was a tendency towards cooccurrence between CDH5, PECAM1, and VWF in LUAD (). Then, coexpression analysis with the Oncomine database for CDH5 found that CDH5 was coexpressed with PECAM1 and VWF in LUAD, and they were also DEGs that were screened out from LUAD based on our previous results. Our results seem to show that CDH5, PECAM1, and VWF play a vital role in LUAD. CDH5 encodes Cadherin-5, which is localized at intercellular junctions of endothelial cells, plays an important role in the control of vascular integrity and permeability, and contributes to endothelial cell assembly into tubular structures [42]. Many studies have reported that CDH5 expression is associated with multiple tumors, such as gastric cancer and breast cancer [43, 44], but the relationship between CDH5 and LUAD is still to be determined. PECAM1 is a multifunctional cell adhesion molecule involved in numerous physiologic processes within the vasculature; Abraham et al. found that the activity of PECAM1 appears to be associated with the tumor microenvironment and tumor cell proliferation [45]; Kuang et al. demonstrated that PECAM1 could be a potential prognostic factor and therapeutic target in NSCLC [46]. The von Willebrand factor (VWF) is a multimeric glycoprotein and plays an essential role in mediating platelet-tumor cell interactions [47]. The relationship between VWF and LUAD is still under investigation. From all the above results, we speculate that CDH5, PECAM1, and VWF play an important role in LUAD.

This study had several limitations. First, the expression differences of the screened downregulated hub genes in LUAD patients in the Oncomine database were not statistically significant (), but based on the figure, the trends of hub gene expression were consistent with the GEO database; the lack of statistical significance may be due to insufficient samples. Second, even though we performed preliminary validation of the results, more in-depth studies are needed in the future. Therefore, we hope that these results can be integrated into future experiments and facilitate further understanding of the molecular mechanisms of LUAD.

Despite these limitations, we believe that this analysis represents a valuable resource and can be considered a preliminary study for future studies of LUAD. Our study provides information for researchers to identify possible candidate genes and pathways which may be involved in LUAD for further studies. We gained further insight into LUAD carcinogenesis at the molecular level and explored potential candidate biomarkers for diagnosis, prognosis, and drug targets.

Abbreviations

CDH5: Cadherin-5
DEGs: Differentially expressed genes
EMT: Epithelial-to-mesenchymal transition
GEO: Gene Expression Omnibus
KM Plotter: Kaplan-Meier Plotter
LC: Lung cancer
LUAD: Lung adenocarcinoma
Log2FC: Log2 fold change/logarithm of fold change
MET: Mesenchymal-to-epithelial transition
NSCLC: Non-small-cell lung carcinoma
PECAM1: Platelet endothelial cell adhesion molecule 1
PPI: Protein-protein interaction
SCLC: Small-cell lung carcinoma
VWF: von Willebrand factor

Data Availability

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article. The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

Conflicts of Interest

The authors have no conflicts of interest to declare.

Authors’ Contributions

HF conceived of the study and drafted the manuscript. HF and SC performed the experiments. HF participated in the design of the study and performed the statistical analysis. CX supervised all the work and revised the manuscript. All the authors have read and approved the manuscript. Hongjun Fei and Songchang Chen contributed equally to this work and are co-first authors.

Acknowledgments

The study was supported by the Nosocomial Scientific Research Fund Projects from the International Peace Maternity and Child Health Hospital of Shanghai Jiao Tong University School of Medicine (No. GFY5801), clinical research special projects from the Shanghai Municipal Health Commission (No. 20204Y0230), and Shanghai Sailing Program from the Shanghai Science and Technology Committee (No. 19YF1452200).