Gastric cancer (GC) is usually diagnosed in an advanced stage at the first visit due to the atypical clinical symptoms. The low surgical resection rate and chemotherapy sensitivity result in dismal survival. Therefore, it is urgent to develop novel biomarkers with high sensitivity and specificity to accurately assess the prognosis of GC patients. In the present study, 3385 differentially expressed genes (DEGs) were obtained from the single-cell RNA sequencing data of GC specimens. Using the unsupervised dimensionality reduction, we further found 3 subsets of cells including gastric cells, plasmacytoid dendritic cells, and memory T cells. Based on the cell clustering, we explored the key regulatory genes for GC progression by pseudo-time analysis and functional enrichment analysis. According to the results, the significant differentially expressed fatty acid-binding protein 1 (FABP1) verified by pseudo-time analysis was identified as the hub gene of GC progression. FABP1 was shown to be closely related to the long-term survival and the age at diagnosis of patients with GC in analysis based on the TCGA (The Cancer Genome Atlas) database. To further verify the role of FABP1 in GC, we performed immunohistochemical (IHC) analysis using the GC tissue microarray and found that the expression level of FABP1 was higher in GC tissues than in the adjacent tissues. Moreover, GC patients with higher expression of FABP1 had a worse clinical outcome. In summary, our study revealed that FABP1 is a potential effective biomarker for the prognosis of GC, and high expression of FABP1 predicts unsatisfactory survival.

1. Introduction

Gastric cancer (GC) is one of the most common gastrointestinal cancers in the world, with the second-highest mortality rate [1]. Global GC incidence varies widely, with the highest incidence in East Asia [1]. The current treatment strategies for GC are mainly surgery, chemotherapy, radio-chemotherapy, and targeted therapy. However, chemotherapy resistance and high postoperative cancer recurrence rate usually lead to cancer-related death [2, 3]. Therefore, it is critical to study the molecular mechanisms of gastric cancer progress and to identify novel biomarkers for early gastric cancer.

Recently, single-cell RNA sequencing (scRNA-seq) technology has made significant progress and is widely used in the study of various cancers [4]. This approach is crucial for the discovery, identification, and validation of new biomarkers. Individualized clinical treatment will also be the goal of scRNA-seq technology [5].

Fatty acid-binding proteins (FABPs) are a group of highly conserved cytoplasmic proteins with a low molecular weight [6]. As lipid chaperones, FABPs mainly solubilize fatty acids, which are involved in intracellular metabolism and other signaling processes [7]. At present, more than nine kinds of FABPs have been identified. The human FABP1 gene was first found in the liver, located on chromosome 2, encoding a protein of 127 amino acids (aa). FABP1 protein has a classical β structural fold and two short alpha-helices [810]. FABP1 has multiple functions, such as a protective agent against oxidative stress, and is involved in the regulation of adipogenesis and lipid metabolism [11]. In addition, FABP1 is associated with various diseases such as liver fibrosis, nonalcoholic steatohepatitis, acute kidney injury, renal ischemia/reperfusion injury, type I diabetes, and type II diabetes [12]. Studies have shown that the expression level of FABP1 is closely related to the occurrence and progression of various tumors. FABP1 is found highly expressed in Barrett’s esophagus [13]. The expression of FABP1 is also positively correlated with the incidence of pancreatic cancer, especially the diabetes-related pancreatic cancer [14]. FABP1 has also been found to be downregulated in cancers. Low expression of FABP1 was found in the early-stage colorectal cancer and 93% of microsatellite unstable colorectal cancers [15, 16].

In this study, we analyzed the single-cell sequencing data of gastric cancer and found that FABP1 was one of the most significant differentially expressed genes (DEGs) in GC tissues. FABP1 was identified as the hub gene in GC progression. The TCGA analysis also showed that FABP1 was closely related to the prognosis of GC patients. The IHC analysis based on GC tissue microarray further verified that FABP1 was highly expressed in GC tissues, confirming that GC patients with higher expression of FABP1 have a lower long-term survival rate. Therefore, we proposed that FABP1 is a promising biomarker for GC.

2. Methods

2.1. Data Acquisition and Preprocessing

After searching through the Gene Expression Omnibus (GEO, http://ncbi.nlm.nih.gov/geo/) database [17], we downloaded the reads per kilo base per million mapped reads (RPKM) scRNA-seq data from the GSE134520 [18]. The data were constructed by thirteen gastric antral mucosa biopsies with the pathologic diagnosis including nonatrophic gastritis (NAG), chronic atrophic gastritis (CAG), intestinal metaplasia [17], or early gastric cancer (EGC). These data also match the platform annotation of GPL20795 HiSeq X Ten (Homo sapiens) [18]. The study flow chart is shown in Figure 1.

During data preprocessing, we read the original expression values by the Seurat function, and the number of genes and the Unique Molecular Identifiers (UMIs), representing the non-normalized expressional values within a cell, were automatically calculated [19, 20]. The sum of the percentages for mitochondrial was calculated with the criterion of filtration of 5%. Here, cells with expressed genes <100 and genes expressed in <3 cells were removed from the dataset. The LogNormalize algorithm was used to normalize the original data, and the FindVariableGenes algorithm was used to find the variable features [19, 20].

In addition, based on the orthogonal transformation algorithm, the principal component analysis (PCA) analysis was applied to the dimension reduction process of scRNA-seq data to highlight data features in lower dimensions [21]. After selecting the key components, the important dedicator contributing to data differentiation, the tSNE method was selected to detect the cell subtypes with the data resolution of 1.0(3, 4). For the patterns of the gene expression matrix of a cell corresponding to different cell subtypes, we selected the SingleR and scCATCH methods to identify the cell subtypes [19, 20, 22].

2.2. Differential Gene Analysis

The FindAllMarkers function is a frequently used method to detect the differentially expressed genes (DEGs), based on the Wilcox analysis, with the criterion of log |fold changes (FC)| over 0.25 and after FDR correction [19].

2.3. Analysis of Biological Functions of Differential Genes

Using the clusterProfiler package and MetaScape database (http://metascape.org/gp/index.html#/main/step1), we further analyzed the Gene Ontology (GO) [23] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, respectively [24]. The MetaScape database is an international authoritative functional database of gene annotation, visualization, and integrated exploration, facilitating the integration of genetic pathways and functional enrichment. A P-value was considered statistically significant in our GO function and KEGG pathway analysis.

2.4. Pseudo-Time Analysis and Pseudo-Time DEG Analysis

Many of the cellular states of the various cell fate processes are not perfectly synchronized, some cells are at the beginning of a particular process, while others are already at the completion of that process, which is also known as “asynchronous.” Based on the Monocle reverse embedding graph algorithm, we quantified the transformation under different cell states and the transcription state of the corresponding gene set and form a trajectory by sorting these cells according to this transcriptional process, thus tracking the process change function that accompanies the trajectory, called pseudo-time analysis. Pseudo-time is an abstract unit of differentiation, which is only a distance from the cell to the start of the trajectory, measured along the shortest path. By estimating SizeFactors algorithm, we evaluated the gene expression range of each cell and used it for subsequent normalization and calculation of gene variance. Differential GeneTest algorithm is used to calculate the differential core regulatory genes in pseudo-time analysis, and B-H correction value <0.01 is considered to be pseudo-time DEGs.

2.5. Analysis of Protein Interaction Networks Based on DEGs

After deriving the DEGs of the hub pathway, we performed the core expression analysis in a volcano map based on the average Log |FC| and Negative Log10 (adjusted value). Subsequently, the hub tagged target proteins were calculated based on the COMPPI database (http://comppi.linkgroup.hu/) [25]. The COMMPI platform was used to integrate the subcellular localizations, protein-protein interactions, and scores of localizations and interactions, with the locations of cytosol, mitochondrion, nucleus, extracellular secretory pathway, and membrane.

2.6. Functional Enrichment and Prognosis Analysis of Hub Genes

We used the ToppGene database (https://toppgene.cchmc.org/) to annotate and visualize the hub marker’s functional enrichment based on Gene Ontology [23] and pathways [26]. The terms with were significantly enriched. To gain further insight into the hub gene expression and its association with prognosis, we applied a friendly online web tool Wanderer (http://maplab.imppc.org/wanderer/) to identify the important clinical features correlated with transcriptional expression. Additionally, the SurvExpress (http://bioinformatica.mty.itesm.mx:8080/Biomatec/SurvivaX.jsp) online database was applied to validate the prognostic relationship among the FABP1 expressions [23]. Besides, the statistical difference was considered with the value <0.05.

2.7. Immunohistochemical Analysis and Statistical Analysis

Gastric cancer tissue microarray (TMA, HStmA180Su19) was obtained from Shanghai Outdo Biotech, including 94 cases of gastric cancer tissues and 86 cases of adjacent tissues; with complete case data and follow-up information, more detailed sample information and clinical features of colorectal cancer are shown in Table 1. The IHC assays and IHC scores were performed with a previously described protocol [27], and the antibodies against FABP1 were purchased from Abcam (MA, US, ab171739). The statistical analysis was conducted by the GraphPad Prism software 8.0 (GraphPad Software, Inc., San Diego, CA, USA). Data are represented as means ± standard deviations. The expression level of FABP1 in gastric cancer tissues and adjacent tissues was analyzed by Student's t-test, the Chi-square test was used for the analysis of clinicopathological features, and the Kaplan–Meier method and the log-rank test were used for survival analysis. was considered statistically significant.

3. Results

3.1. Differential Gene Expression Analysis

Using the analytical methods described in Methods, we filtered out cells with unique feature counts over 4000 or less than 200. The sum of 3385 variable features expressed in 4110 early gastric cancer (EGC) cells was subjected to scRNA-seq background correction, normalization, and differentially expressed (DE) analysis (Figures 2(a) and 2(b)). A total of 15 PCs were found to simultaneously meet the selection criteria of the contribution degree model and PCA analysis (Figures 2(c) and 2(d)).

3.2. tSNE Cell Subtype Detection

After annotation and identification, we detected the gastric cells, plasmacytoid dendritic cells, and memory T cells. The corresponding top DEGs expression profile is presented in Figure 3(b). The expression levels of the top 8 hub genes in the corresponding cell clusters are shown in Figure 3(c).

3.3. Biological Function Analysis of Differential Genes

After selecting the appropriate cellular principal components (Figure 4(a)), we performed a pseudo-time analysis based on the monocle algorithm with the data reduction by the DDtree method (Figures 4(b) and 4(c)). In addition, we further analyzed the most significant GO functions and KEGG pathways of DEGs by the pseudo-time analysis. The significant GO functions mainly involved three main aspects: biological processes (BP), cellular components (CC), and molecular functions (MF). In terms of BP-related functions, the DEGs were mainly associated with protein targeting (enriched genes = 68, value = 3.85E-36), the establishment of protein localization to the membrane (enriched genes = 62, value = 2.78E-26), and neutrophil degranulation (enriched genes = 58, value = 4.01E-22). For CC functions, the DEGs are closely related to an adherent junction (enriched genes = 48, value = 1.05E-10), cell-substrate junction (enriched genes = 43, value = 1.85E-10), and focal adhesion (enriched genes = 38, value = 2.23E-10). In terms of MF, DEGs are mainly associated with cell adhesion molecule binding (enriched genes = 51, value = 5.12E-06), structural constituent of ribosome (enriched genes = 42, value = 4.87E-06), and enzyme inhibitor activity (enriched genes = 28, value = 2.91E-05). Visualization of the GO functions occupied by DEGs is shown in Figure 4(d). In terms of their KEGG pathways, the DEGs were significantly correlated with oxidative phosphorylation (enriched genes = 45, value = 3.21E-12), protein processing in the endoplasmic reticulum (enriched genes = 38, value = 2.75E-10), and TNF signaling pathway (enriched genes = 31, value = 1.85E-09). The KEGG pathway enrichment results are shown in Figure 4(e).

3.4. Hub Marker Detection and COMPPI Network Analysis

The volcano map in Figure 4(f) shows the average fold changes in the expression of the marker and adjusted P value (the fold change value is mainly based on ComPPI to analyze the interaction regulation of target proteins at the subcellular level to explore the possible regulation or interaction proteins at the levels of the nucleus, cytoplasm, mitochondria, and cell membrane). Using the COMPPI database and Cytoscape software, network diagrams were generated and FABP1 targeted genes in the network were screened based on the differential cellular locations of correlated connectivity. In this part, we targeted the FABP1, and thus, 7 correlated proteins, located in the cytosol, mitochondrion, nucleus, extracellular secretory pathway, and membrane, were intercepted to construct the PPI network in Figure 4(g).

In Figure 5(a), results indicated that the biological functions of FABP1 were significantly correlated with hormone-sensitive lipase (HSL)-mediated triacylglycerol hydrolysis (P value = 0.0032), fat digestion and absorption ( value = 0.0047), and mechanism of gene regulation by peroxisome proliferators via PPAR-alpha ( value = 0.0051).

3.5. Survival Analysis

Figure 5(b) shows that the expression of FABP1 had a significant influence on the overall survival (OS) of STAD patients ( value = 0.046) in the SurvExpress database. Besides, the expression level of FABP1 was also correlated with age at initial pathologic diagnosis ( and R = 0.31) and OS ( and R = 0.44) in the Wanderer database (Figure 5(c)).

3.6. Upregulation of FABP1 in GC and Its Correlation with Poor Prognosis

To further investigate the expression of FABP1 in GC tissues and its relationship with the prognosis of GC patients, we analyzed the expression of FABP1 in GC tissue microarray by IHC. The results showed that the expression level of FABP1 in GC tissues was higher than that in the noncancer tissues (Figures 6(a) and 6(b)), and the high expression rate of FABP1 in gastric cancer tissues was higher than that in noncancer tissues (Figure 6(c)). However, the expression of FABP1 was not significantly correlated with clinical-pathological features such as age, gender, tumor size, histopathological type, lymph node positive, TNM stage, and HER2 positive (Figure 6(d) and Table 1). More importantly, Kaplan–Meier analysis indicated that upregulation of FABP1 was consistently correlated with a worse prognosis (Figure 6(d)), suggesting a tumor promotion role and prognostic value of FABP1 in GC.

4. Discussion

Studies have shown that about 70% of GC patients have already developed liver and peritoneal metastasis at their first visit [28]. As a result, improving the early diagnosis rate of GC is critical to promoting the survival rate of GC patients. Most GC patients are failed to accept medical examinations in time due to the atypical symptoms. The popularization and promotion of GC screening methods are particularly important. Currently, the diagnostic methods of GC are mainly endoscopy, imaging examination, and serum markers. However, many patients cannot accept the endoscopy because it is an invasive examination. In addition, the lack of skills and experience of endoscopists and pathologists lead to the early missed diagnosis. It is also difficult to detect small lesions by imaging examination, and the disease may be already in an advanced stage when positive symptoms are detected. Nowadays, the serum markers used for the diagnosis of GC are mainly CEA, CA 19-9, and CA72-4. However, these serum markers are not or less expressed in some GC, leading to false negatives and early misdiagnosis of GC. Therefore, it is important to explore and develop new biomarkers with high sensitivity and specificity for the early diagnosis of GC.

Recently, as an emerging sequencing technology, single-cell RNA sequencing can further explore the heterogeneity of malignant tumors, tumor evolution, clinical diagnosis, and treatment at different omics levels of single cells [29]. The key regulatory factors of various tumor cells have been identified using single-cell RNA sequencing technology, including factors in the immune microenvironment, drug resistance, and metastasis. Some researchers conducted single-cell RNA sequencing analysis of human liver cancer T cells and found that there are many dysfunctional CD8+ T cells and regulatory T cells in tumor tissue. By analyzing the DEGs of the two types of cells, they found that the gene Layilin can inhibit the killing function of CD8+ T cells and may become a potential target for liver cancer immunotherapy [30]. In esophageal cancer cell lines resistant to paclitaxel, single-cell RNA sequencing results showed that proteasome genes and HIF-1 signaling genes were associated with acquired paclitaxel resistance in esophageal cancer cells [31]. In a colorectal cancer study, single-cell RNA sequencing analysis was performed, respectively, on the primary, metastatic, and circulating tumor cells of the metastatic colorectal cancer. Results showed that circulating tumor cells have not only the same mutated driver genes (such as APC, KRAS, or PIK3CA) with the primary and metastatic lesions but also new variant genes [32].

In this study, we analyzed the single-cell RNA sequencing data of GC. Firstly, we analyzed 3385 variable cell features expressed by 4110 EGC cells. After annotation and identification, we detected gastric cells, plasmacytoid dendritic cells, and memory T cells. Expression levels of the top 8 (OLFM4, TFF3, TTR, CHGA, SRGN, CCL5, KRT7, and FABP1) hub genes of the corresponding cell clusters were also identified. In addition, using the pseudo-time analysis, we further analyzed the most significant GO functions and KEGG pathways of DEGs, revealing the regulatory effects of DEGs on the biological function of GC according to biological processes, cellular components, and molecular functions, as well as the closed relationship between the DEGs and oxidative phosphorylation, endoplasmic reticulum protein processing, and TNF signaling pathways. Moreover, we identified the significant DEG-FABP1 and found that FABP1 may regulate the PPAR signaling pathway, hormone-sensitive lipase (HSL)-mediated triacylglycerol hydrolysis, fat digestion, and absorption in gastric cancer progression. Survival analysis showed that higher FABP1 expression predicts a lower survival rate in GC patients. The expression of FABP1 is also correlated with the age of patients at initial pathological diagnosis. Single-cell RNA sequencing can obtain genomic and transcriptome information of cancer center cells, pericancerous cells, and distant metastasis cancer cells, so as to find effective therapeutic targets for cancer. For our analysis, we found the abnormal expression of FABP1 in early gastric cancer tissues by analyzing the data of single-cell RNA sequencing, which plays a very important role in the treatment of gastric cancer. In recent years, a number of studies have reported the use of single-cell RNA sequencing to find treatment for gastric cancer. Immune cells and stromal cells were found to exhibit cellular heterogeneity in tissues with distant metastases from gastric cancer, and genes regulating CD8+ cell depletion were screened [33]. Both inflammatory cancer-associated fibroblasts and extracellular matrix cancer-associated fibroblasts can mobilize surrounding immune cells to build a microenvironment conducive to the growth of gastric cancer cells [34]. These results obtained by single-cell RNA sequencing undoubtedly can provide new ideas for the treatment of gastric cancer.

FABP1 is a low-molecular-weight protein composed of 127 amino acids. As a lipid chaperone, each FABP1 molecule can bind to two long-chain fatty acid molecules. FABP1 can also bind to other hydrophobic ligands to regulate various biological processes such as cell growth, differentiation, and apoptosis [35]. Studies have shown that FABP1 is detected in about 38% of GC patients, mainly in gastric papillary adenocarcinoma, female cases, and patients with age less than 50. FABP1 is highly expressed in gastric intestinal metaplasia and gastric adenocarcinoma tissues, but not or less expressed in gastric tissues [36]. We performed IHC by GC tissue microarray and found that the expression level of FABP1 was significantly higher in the GC tissues than in the adjacent tissues. Furthermore, GC patients with higher expression of FABP1 had a worse prognosis. Previously, researchers had reported that FABP1 is expressed in early-stage GC, with a specificity of 95% and a sensitivity of 67% for the diagnosis of early recurrence, and patients with multiple positive results of this gene have a worse prognosis [37].

Interestingly, the expression of FABP1 has no significant correlation with clinicopathological features such as age, sex, tumor size, histopathological type, lymph node positivity, TNM stage, and HER2 positivity. However, some researchers reported that FABP1 expression was detected in the peritoneal lavage fluid of GC patients, and the prognosis of FABP1-positive patients was worse than that of CEA. At least half of them had a peritoneal recurrence, and the recurrence rate was 67% [38]. The results reported are inconsistent with ours, which may be caused by the diversity of clinical samples. In our future study, we will expand the sample size to clarify these issues, and the content of FABP1 can be detected in serum and feces of patients with a confirmed diagnosis of GC, to prove that FABP1 can be used as a marker for the diagnosis of gastric cancer. In addition, we need to conduct further research in other aspects, such as observing the biological effect of FABP1 on the GC cells after downregulating and upregulating the expression of FABP1 in vitro and in vivo and elucidating the molecular mechanism of FABP1 promoting cancer from different perspectives in terms of fat metabolism.

In conclusion, we found that FABP1 is a key regulatory gene of GC and is associated with poor prognosis based on the single-cell RNA sequencing data. Tissue microarray analysis also showed that FABP1 is highly expressed in GC tissues, and the survival rate of patients with high FABP1 expression is lower. FABP1 is expected to become a promising marker for early diagnosis and targeted therapy of GC.


FABP1:Fatty acid-binding protein 1
OLFM4:Olfactomedin 4
TFF3:Trefoil factor3
CHGA:Chromogranin A
SRGN:Proteoglycan serglycin
CCL5:C-C chemokine ligand 5
KRT7:Keratin 7.

Data Availability

The data used to support the findings of this study are available from the corresponding authors upon request.

Conflicts of Interest

All the authors declare no conflicts of interest.

Authors’ Contributions

Ling Huang and Fan Yang participated in the research design. Fan Yang, Lianfang Gan, Junhua Pan, Yaying Chen, Hong Zhang, and Ling Huang conducted experiments. Fan Yang and Lianfang Gan performed data analysis. Fan Yang, Lianfang Gan, Junhua Pan, Yaying Chen, Hong Zhang, and Ling Huang wrote or contributed to the writing of the manuscript. Fan Yang and Lianfang Gan contributed equally to this work.


This work was supported by the National Natural Science Foundation of China (grant numbers 82060678 and 81760674).