BioMed Research International

BioMed Research International / 2021 / Article
Special Issue

Prognostic Assessment and Management of Liver Cirrhosis 2020

View this Special Issue

Research Article | Open Access

Volume 2021 |Article ID 1093702 | https://doi.org/10.1155/2021/1093702

Zhe Yu, Xuemei Ma, Wei Zhang, Xiujuan Chang, Linjing An, Ming Niu, Yan Chen, Chao Sun, Yongping Yang, "Microarray Data Mining and Preliminary Bioinformatics Analysis of Hepatitis D Virus-Associated Hepatocellular Carcinoma", BioMed Research International, vol. 2021, Article ID 1093702, 18 pages, 2021. https://doi.org/10.1155/2021/1093702

Microarray Data Mining and Preliminary Bioinformatics Analysis of Hepatitis D Virus-Associated Hepatocellular Carcinoma

Academic Editor: Xingshun Qi
Received18 May 2020
Revised04 Jun 2020
Accepted19 Jan 2021
Published30 Jan 2021

Abstract

Several studies have demonstrated that chronic hepatitis delta virus (HDV) infection is associated with a worsening of hepatitis B virus (HBV) infection and increased risk of hepatocellular carcinoma (HCC). However, there is limited data on the role of HDV in the oncogenesis of HCC. This study is aimed at assessing the potential mechanisms of HDV-associated hepatocarcinogenesis, especially to screen and identify key genes and pathways possibly involved in the pathogenesis of HCC. We selected three microarray datasets: GSE55092 contains 39 cancer specimens and 81 paracancer specimens from 11 HBV-associated HCC patients, GSE98383 contains 11 cancer specimens and 24 paracancer specimens from 5 HDV-associated HCC patients, and 371 HCC patients with the RNA-sequencing data combined with their clinical data from the Cancer Genome Atlas (TCGA). Afterwards, 948 differentially expressed genes (DEGs) closely related to HDV-associated HCC were obtained using the R package and filtering with a Venn diagram. We then performed gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis to determine the biological processes (BP), cellular component (CC), molecular function (MF), and KEGG signaling pathways most enriched for DEGs. Additionally, we performed Weighted Gene Coexpression Network Analysis (WGCNA) and protein-to-protein interaction (PPI) network construction with 948 DEGs, from which one module was identified by WGCNA and three modules were identified by the PPI network. Subsequently, we validated the expression of 52 hub genes from the PPI network with an independent set of HCC dataset stored in the Gene Expression Profiling Interactive Analysis (GEPIA) database. Finally, seven potential key genes were identified by intersecting with key modules from WGCNA, including 3 reported genes, namely, CDCA5, CENPH, and MCM7, and 4 novel genes, namely, CDC6, CDC45, CDCA8, and MCM4, which are associated with nucleoplasm, cell cycle, DNA replication, and mitotic cell cycle. The CDCA8 and stage of HCC were the independent factors associated with overall survival of HDV-associated HCC. All the related findings of these genes can help gain a better understanding of the role of HDV in the underlying mechanism of HCC carcinogenesis.

1. Introduction

Hepatocellular carcinoma (HCC) is the sixth most commonly diagnosed cancer and the fourth leading cause of cancer-related mortality globally [1, 2] and the second in China [3]. More than 80% of all HCC causes are associated with infection with hepatitis B virus (HBV), hepatitis C virus (HCV), and hepatitis delta virus (HDV) [4]. Approximately 292 million people worldwide are chronically infected with HBV, which causes liver injury that can progress to cirrhosis, resulting in HCC, liver failure, and eventually death [5, 6]. The HDV is known as the satellite of HBV and affects 15–20 million people in the world [7]. Concurrent HBV and HDV infections significantly increase both the incidence and mortality of HCC among patients with chronic hepatitis B (CHB) [8]. HDV is a kind of defective RNA virus which uses HBV envelope protein for successful spread in hepatocytes [9]. Although the risk of HCC is thought to be higher when a HBV-infected patient is superinfected with HDV, the molecular mechanisms of carcinogenesis remain unclear [10]. Chronic hepatitis D (CHD) is more severe than any other type of hepatitis, but its carcinogenesis mechanism remains poorly understood. Additionally, it has been found that the intrahepatic HBV DNA levels in patients with HDV-associated HCC and non-HCC cirrhosis are significantly reduced [11]. This phenomenon of HDV-mediated inhibition of HBV replication suggests that the effects of HDV are mediated through a unique molecular mechanism. While HBV and HCV are both included in International Agency for Research on Cancer (IARC) group 1 (high evidence of carcinogenicity to humans), HDV was assigned several years ago to group 3 (not sufficient evidence of carcinogenicity) [12], due to inadequacy to support the contribution of HDV to HBV-induced HCC. Due to the dependency of HDV on HBV, there are still controversies regarding the increased risk of HCC development in chronically HDV-infected patients [4], and the available data on the particular mechanism by which HDV contributes to HCC are sparse. With the development of genomics and other “-omics” disciplines, substantial omics data from HCC specimens have been accumulated [13]. Therefore, researchers have taken advantage of gene ontology (GO) and signal pathway analysis tools to identify and characterize many differentially expressed genes (DEGs).

In the present study, the GSE55092 and GSE98383 mRNA expression profile datasets were retrieved from Gene Expression Omnibus (GEO) online [14, 15]. And the RNA-sequencing data of 371 HCC patients and their clinical data were from the Cancer Genome Atlas (TCGA). We performed DEG analysis between cancerous specimens of HBV-associated HCC and HDV-associated HCC patients and their respective paracancerous specimens using Linear Models for Microarray Data (LIMMA) [16] and other packages implemented in R/Bioconductor [17]. We aimed to investigate potential HDV carcinogenesis mechanisms by PPI network, Gene Expression Profiling Interactive Analysis (GEPIA), and Weighted Gene Coexpression Network Analysis (WGCNA), particularly to screen and identify key genes and pathways to determine their possible role in HCC pathogenesis, and to help determine their mechanism of inhibition of HBV replication and their effects in diagnosis, treatment and prognosis.

2. Methods and Materials

2.1. Acquisition of Data and Preprocessing

The RNA-sequencing data and clinical data of 371 HCC patients were downloaded from TCGA (http://cancergenome.nih.gov/. http://cancergenome.nih.gov/). The expression of genes was represented by fragments per kilobase of exon per million fragments mapped (FPKM). Microarray data were available at the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database. The inclusion criteria for selection GEO datasets in this study were as follows: (1) hepatocellular carcinoma containing cancer and paracancer tissue; (2) HBsAg positive at least 6 months with serum HBV DNA positive; (3) anti-HDAg positive with serum HDV RNA positive (applies only to filter HDV-associated HCC dataset); and (4) sample size more than 10 with data unbiased. The exclusion criteria were as follows: (1) the second liver cancer, (2) HCV-related HCC, (3) dataset biased, and (4) no paracancer tissue and carcinoma tissue present at the same patients. We searched the GEO database using “Hepatitis D Virus,” “Hepatocellular Carcinoma,” and “Homo sapiens” as keywords, and there were only 2 results, between which only GSE98383 met our criteria for HDV-associated HCC. Similarly, we searched the keywords “Hepatitis B Virus,” “Hepatocellular Carcinoma,” and “Homo sapiens” and obtained 581 search results. The flow chart of screening HBV-associated HCC could be seen in Figure 1, and only GSE55092 was suited for an in-depth study.

Microarray data GSE55092 [18] and GSE98383 [11] were generated using the GPL570 Affymetrix HG-U133 Plus 2.0 Array platform, and we performed the analysis on the data from whole liver tissue. GSE55092 contains 39 cancer specimens and 81 paracancer specimens from 11 HBV-associated HCC patients ( years; 10 male patients and 1 female patient). GSE98383 contains 11 cancer specimens and 24 paracancer specimens from 5 HDV-associated HCC patients ( years, 5 male patients). The comparison of baseline characteristics between patients with HBV-associated HCC and HDV-associated HCC is shown in Table 1.


HBV-associated HCCHDV-associated HCC

Patients115
Age (years old)0.849
Male (%)10 [90.9]5 [100]1.000
ALT (U/L)0.000
AST (U/L)0.001
GGT (U/L)0.917
PT (INR)0.001
TB (mg/dL)0.005
PLT (103/mL)0.000
Liver pathology
Activity grade0.034
Fibrosis stage0.226
F5/F695
Tumor grade0.107
G271
G334
G410
Tumor size0.407
<2 cm40
≥2 and ≤3 cm43
>3 cm32
Serum HDV RNA positive, no.05
Serum HBV DNA positive, no.115

ALT: alanine aminotransferase; AST: aspartate aminotransferase; GGT: γ-glutamyl transferase; TB: total bilirubin; PT: prothrombin time; PLT: Platelets.

We downloaded the GSE55092 and GSE98383 datasets, normalized them by the Affy package of the R Bioconductor, and then converted the gene expression profile at the probe level into gene symbol level and removed the duplicated symbols. When numerous probes were mapped to one gene, the average value was defined as the expression level of that gene. According to the description of the uploader, an unsupervised multidimensional scaling (MDS) of all specimens obtained from HBV-associated HCC patients showed a clear separation between two distinct clusters that corresponded to cancer areas and paracancerous areas. A similar separation between two clusters that corresponded to cancerous areas and paracancerous areas was observed for the specimens from HDV-associated HCC patients. Therefore, all the data are available for the identification of DEGs.

2.2. Identification of DEGs

DEG analysis refers to the identification of genes with significantly different expression levels between two groups through multiple analysis modes [19]. We performed differential expression analysis using Bayes -statistics from the LIMMA implemented in the R Bioconductor and corrected values for multiple testing using the Benjamini-Hochberg method [20]. We identified the DEGs in primary cancerous specimens of HCC patients by comparing them with paracancerous normal specimens of the same HCC patients. The absolute value of log2-fold change (FC) was set to ≥1.0, and a value of <0.01 was used as the significance criteria, and genes that met these criteria were used for further analysis. The final step is to use Venn diagrams to identify DEGs closely related to HDV-associated HCC [21].

2.3. GO Enrichment and Pathway Analysis

DAVID program (https://david.ncifcrf.gov/) [22] is a bioinformatics resource comprising a biological database and a set of annotation and analytical tool that intuitively integrates functional genomic annotations with graphics. In this study, the DEGs were submitted to DAVID for GO [23] and KEGG [24] enrichment analyses, which included biological process (BP), cellular component (CC), molecular function (MF), and related biological metabolic pathways. A value < 0.05 was considered statistically significant.

2.4. Weighted Gene Coexpression Network Construction and Module-Clinical Characteristic Associations

We compared DEGs with the genes of 371 HCC patients downloaded from TCGA website. Expression data of the matched genes from TCGA were applied to find gene modules significantly associated with clinical trait (stage of HCC) by WGCNA [25]. In this analysis mode, the soft thresholding acts as the lowest power based on the criterion of approximate scale-free topology [26], and analogous modules would be merged together due to the similarity. The heatmap of module-clinical characteristic relationship could reveal modules significantly associated with clinical characteristics.

2.5. Construction of PPI Networks and Module Analysis

The visual protein-to-protein interaction (PPI) networks of DEGs were predicted using the web resource Search Tool for the Retrieval of Interacting Genes (STRING) [27] to search the STRING database (https://string-db.org/), which contains over 5,000 organisms as well as their over 24.6 million proteins and over 2 billion interactions. We correlated the target DEGs with the STRING database and set the significant threshold to the highest confidence level (interaction ). Subsequently, we used Cytoscape [28], a software to construct PPI networks and analyze highly interconnected modules using the built-in Molecular Complex Detection (MCODE) clustering algorithm. The parameters were set by default except for the -core value which was equal to 8.

2.6. Validation of Module Gene

First, we uploaded the potential genes identified by PPI-network analysis to GEPIA [29] (http://gepia.cancer-pku.cn/, an online server containing TCGA/GTEx datasets) to validate the gene expression consistency between the microarray datasets (GSE55092 and GSE98383) and TCGA/GTEx HCC dataset, setting the threshold parameters as follows: |log2FC| and value cutoff < 0.01. Afterwards, we performed the overall survival analysis as follows: we divided the patients in TCGA/GTEx dataset into high and low expression groups with the TPM (transcripts per kilobase million) midvalue as a breakpoint; a log-rank test was used to determine significance at . Finally, we took the intersection of related genes to the OS of HCC by GEPIA and genes contained in the hub module obtained by WGCNA and got the key genes.

2.7. Univariate and Multivariate Cox Proportional Hazards Model Analysis

We performed univariate and multivariate Cox proportional hazards model analysis in patients from TCGA, to find independent factors associated with the overall survival of HCC.

3. Result

3.1. Comparison of Baseline Characteristics

GSE98383 was the only dataset of HDV-associated HCC that met our criteria, as to the screen of HBV-associated HCC dataset, overall 581 datasets were enrolled and screened for eligibility and 3 datasets met inclusion criteria. Of these 3 datasets, the data processing platform of GSE55092 was GPL570, which was the same with GSE98383, while GSE22058 and GSE94660 were different, so we took GSE55092 to stand for HBV-associated HCC for study. The number of datasets and reasons for exclusion are shown in Figure 1. The HDV-associated HCC patients in GSE98383 had higher levels of alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin (TB), prothrombin time (PT), platelet counts (PLT), and inflammatory activity grade than the HBV-associated HCC patients in GSE55092, which is consistent with the characteristics of CHD of the most severe hepatitis. On the other hand, there was no significant difference in sex, age, tumor grade, and tumor size between the two groups.

3.2. Identification of DEGs

We identified DEGs from the microarray GSE55092 and GSE98383 datasets using the LIMMA package, setting |log2-FC| to ≥1.0 and adjusted value to <0.01 as the criteria. By comparing the cancerous and paracancerous specimens in GSE55092 up to 1,375, DEGs were identified, comprising 518 upregulated and 857 downregulated genes (Table S1). A similar comparison in GSE98383 contains 1,605 DEGs, including 592 upregulated and 1,013 downregulated genes (Table S1). Volcano plots of the GSE55092 and GSE98383 microarrays are shown in Figures 2(a) and 2(b), respectively. We used Venn diagrams to determine the DEGs closely related to HDV-associated HCC (Figure 2(c)), and 948 DEGs (373 upregulated and 582 downregulated) were identified and selected for further analysis (Table S2).

3.3. GO Enrichment and Pathway Analysis

In order to further screen HDV-associated HCC potential target genes among these DEGs, GO and pathway analysis were performed on HDV-associated HCC DEGs using value < 0.05 as the threshold (Figure 2(d)). The results are presented in Figure 2(d) and show the TOP-7 GO terms (BP, CC, and MF) and KEGG pathway terms significantly enriched in the DEGs. Additionally, the TOP-5 annotations of the DEGs are shown in Table 2. In the BP series, the DEGs were mostly enriched for genes related to the cellular response to chemical stimulus and organic substance, defense response, and cell adhesion. In the CC series, the DEGs were primarily enriched for genes involved in cell surface, side of membrane, membrane-bounded vesicle, external side of plasma membrane, and proteinaceous extracellular matrix. In the MF series, the DEGs were predominantly enriched for genes associated with glycoprotein binding, molecular function regulator, cytokine binding, heparin binding, and glycosaminoglycan binding. In the KEGG pathway series, the enrichment for DEGs was mainly in the chemokine signaling pathway, Staphylococcus aureus infection, transcriptional misregulation in cancer, focal adhesion, and leukocyte transendothelial migration.


CategoryTermCount value

GOTERM_BP_FATGO:0070887~cellular response to chemical stimulus208
GOTERM_BP_FATGO:0071310~cellular response to organic substance180
GOTERM_BP_FATGO:0010033~response to organic substance210
GOTERM_BP_FATGO:0006952~defense response129
GOTERM_BP_FATGO:0007155~cell adhesion139
GOTERM_CC_FATGO:0009986~cell surface69
GOTERM_CC_FATGO:0098552~side of membrane43
GOTERM_CC_FATGO:0031988~membrane-bounded vesicle225
GOTERM_CC_FATGO:0005578~proteinaceous extracellular matrix37
GOTERM_CC_FATGO:0009897~external side of plasma membrane28
GOTERM_MF_FATGO:0001948~glycoprotein binding16
GOTERM_MF_FATGO:0098772~molecular function regulator97
GOTERM_MF_FATGO:0019955~cytokine binding15
GOTERM_MF_FATGO:0008201~heparin binding20
GOTERM_MF_FATGO:0005539~glycosaminoglycan binding23
KEGG_PATHWAYhsa04062: Chemokine signaling pathway25
KEGG_PATHWAYhsa05150: Staphylococcus aureus infection12
KEGG_PATHWAYhsa05202: Transcriptional misregulation in cancer22
KEGG_PATHWAYhsa04510: Focal adhesion25
KEGG_PATHWAYhsa04670: Leukocyte transendothelial migration17

3.4. Weighted Gene Coexpression Network Construction and Module-Clinical Characteristics

We compared 948 DEGs with the genes of 371 samples downloaded from TCGA datasets, matched a total of 883 genes, and performed WGCNA. As shown in Figures 3(a), 3(b), and 3(c), the soft thresholding power was set to 3, and MEblue and MEred were merged together due to the similarity (the ). Then, we found five coexpressed gene modules. The MEturquoise contained the most DEGs with the number of 320. The five modules and contents are stored in supplementary material Table S3. Next, we further analyzed these modules with clinical characteristics (sex, event, OS, stage, grade, and age). Obviously, the MEturquoise module was significantly associated with event (correlation coefficients , ), OS (, ), stage (, ), grade (, ), and age (, ) (Figure 3(d)).

3.5. Construction of PPI Networks and Gene Module Analysis

We uploaded the DEGs onto the STRING online tool and analyzed them with the Cytoscape software. We then selected 353 nodes and 939 edges with the highest confidence () to construct the PPI networks (Figure 4(a)). Then, the MCODE plugin filtered out three important gene modules. Genes within module 1 and module 2 are comprised of downregulated genes, while module 3 is comprised of upregulated genes, except for PPP2R5C and LONRF1. Module 1 contains 15 nodes and 105 edges (Figure 4(b)), which are mainly related to G protein-coupled receptor signaling pathway (BP), plasma membrane (CC), G protein-coupled receptor binding (MF), and chemokine signaling pathway (KEGG) (Table 3). Module 2 contains 12 nodes and 66 edges (Figure 4(c)), which are primarily related to type I interferon signaling pathway (BP), cytosol (CC), 2-5-oligoadenylate synthetase activity (MF), and hepatitis C (KEGG) (Table 4). Module 3 contains 25 nodes and 94 edges (Figure 4(d)), which are predominantly related to the mitotic cell cycle process (BP), chromosomal part (CC), DNA replication origin binding (MF), and cell cycle (KEGG) (Table 5).


CategoryTermCount valueGenes

GOTERM_BP_FATG protein-coupled receptor signaling pathway14ADCY7, ADRA2A, CCL21, CCL4, CCR2, CCR7, CXCL16, CXCL5, CXCR4, FPR1, GNG2, P2RY12, P2RY14, PNOC
GOTERM_BP_FATCell chemotaxis8CCL21, CCL4, CCR2, CCR7, CXCL16, CXCL5, CXCR4, FPR1
GOTERM_BP_FATChemokine-mediated signaling pathway6CCL21, CCL4, CCR2, CCR7, CXCL5, CXCR4
GOTERM_CC_FATPlasma membrane110.0205ADCY7, ADRA2A, CCR2, CCR7, CXCL16, CXCR4, FPR1, GNG2, P2RY12, P2RY14, PNOC
GOTERM_CC_FATExternal side of plasma membrane30.0205CCR2, CCR7, P2RY12
GOTERM_CC_FATSide of membrane40.0205CCR2, CCR7, GNG2, P2RY12
GOTERM_MF_FATG protein-coupled receptor binding7ADRA2A, CCL21, CCL4, CCR2, CXCL16, CXCL5, PNOC
GOTERM_MF_FATChemokine receptor binding5CCL21, CCL4, CCR2, CXCL16, CXCL5
GOTERM_MF_FATChemokine activity4CCL21, CCL4, CXCL16, CXCL5
KEGG_PATHWAYChemokine signaling pathway10ADCY7, CCL21, CCL4, CCR2, CCR7, CXCL16, CXCL5, CXCR4, GNB4, GNG2
KEGG_PATHWAYCytokine-cytokine receptor interaction7CCL21, CCL4, CCR2, CCR7, CXCL16, CXCL5, CXCR4
KEGG_PATHWAYCircadian entrainment30.00092ADCY7, GNB4, GNG2


CategoryTermCount valueGenes

GOTERM_BP_FATType I interferon signaling pathway12BST2, IFI27, IFI6, IFIT1, IRF7, IRF9, ISG15, MX1, MX2, OAS1, OAS2, XAF1
GOTERM_BP_FATDefense response to virus9BST2, IFIT1, IRF7, IRF9, ISG15, MX1, MX2, OAS1, OAS2
GOTERM_BP_FATNegative regulation of viral genome replication5BST2, IFIT1, ISG15, MX1, OAS1
GOTERM_CC_FATCytosol100.004BST2, IFIT1, IRF7, IRF9, ISG15, MX1, MX2, OAS1, OAS2, XAF1
GOTERM_CC_FATMitochondrion60.0067IFI27, IFI6, MX1, MX2, OAS1, XAF1
GOTERM_CC_FATCytoplasmic part120.0067BST2, IFI27, IFI6, IFIT1, IRF7, IRF9, ISG15, MX1, MX2, OAS1, OAS2, XAF1
GOTERM_MF_FAT2-5-Oligoadenylate synthetase activity20.00028OAS1, OAS2
GOTERM_MF_FATDouble-stranded RNA binding20.0229OAS1, OAS2
KEGG_PATHWAYHepatitis C5IFIT1, IRF7, IRF9, OAS1, OAS2
KEGG_PATHWAYMeasles5IRF7, IRF9, MX1, OAS1, OAS2
KEGG_PATHWAYInfluenza A5IRF7, IRF9, MX1, OAS1, OAS2


CategoryTermCount valueGenes

GOTERM_BP_FATMitotic cell cycle process16CDC45, CDC6, CDCA5, CDCA8, CENPE, DBF4, ESPL1, KIF18A, MCM10, MCM4, MCM7, ORC6, POLE2, PPP2R5C, SKP2, ZWILCH
GOTERM_BP_FATCell cycle17CDC45, CDC6, CDCA5, CDCA8, CENPE, DBF4, ESPL1, KIF18A, KLHL13, MCM10, MCM4, MCM7, ORC6, POLE2, PPP2R5C, SKP2, ZWILCH
GOTERM_BP_FATG1/S transition of mitotic cell cycle9CDC45, CDC6, DBF4, MCM10, MCM4, MCM7, ORC6, POLE2, SKP2
GOTERM_CC_FATChromosomal part13CDC45, CDCA5, CDCA8, CENPE, CENPH, CENPI, KIF18A, MCM10, MCM7, ORC6, POLE2, PPP2R5C, ZWILCH
GOTERM_CC_FATChromosome, centromeric region8CDCA5, CDCA8, CENPE, CENPH, CENPI, KIF18A, PPP2R5C, ZWILCH
GOTERM_CC_FATIntracellular nonmembrane-bounded organelle18CDC45, CDC6, CDCA5, CDCA8, CENPE, CENPH, CENPI, ESPL1, KCTD6, KIF18A, MCM10, MCM7, ORC6, POLE2, PPP2R5C, RNF213, SKP2, ZWILCH
GOTERM_MF_FATDNA replication origin binding30.00011CDC45, MCM10, ORC6
GOTERM_MF_FATSingle-stranded DNA binding40.00037CDC45, MCM10, MCM4, MCM7
GOTERM_MF_FATDNA helicase activity30.00069CDC45, MCM4, MCM7
KEGG_PATHWAYCell cycle8CDC45, CDC6, DBF4, ESPL1, MCM4, MCM7, ORC6, SKP2
KEGG_PATHWAYDNA replication30.00022MCM4, MCM7, POLE2
KEGG_PATHWAYUbiquitin-mediated proteolysis40.00024KLHL13, SKP2, TCEB1, TRIM37

3.6. Validation of Module Genes

We compared the gene expression changes between the three module genes of HDV-associated HCC (a total of 52 genes) and the validation HCC (TCGA/GTEx) datasets in the GEPIA website to verify whether their expression in both datasets is consistent. We noticed that CCL21 and FPR1 in module 1 as well as XAF1 in module 2 were downregulated in tumors compared to normal specimens in the HCC datasets, which is in accordance with the HDV-associated HCC specimens. However, IFI6, IFI27, and ISG15 in module 2, which were downregulated in cancerous specimens compared to paracancerous normal specimens, were conversely expressed in HCC datasets. The genes in module 3, including CDC6, CDC45, CDCA5, CDCA8, CENPH, MCM4, MCM7, and TCEB1, were upregulated in tumor compared to normal specimens in the HCC datasets, which is consistent with the HDV-associated HCC patients. All the box plots comparing gene expression are shown in Figure 5(a). For further verification, 11 genes whose gene expression trends are consistent with HCC datasets were selected and used to conduct overall survival analysis. In Figure 5(b), there were the upregulated genes (CDC6, CDC45, CDCA5, CDCA8, CENPH, MCM4, MCM7, and TCEB1) which are associated with a lower survival rate in the high expression group than in the low expression group.

The reason for excluding CCL21, FPR1, IFI6, IFI27, ISG15, and XAF1 is that their value did not comply with the standards or the opposite gene expression. All the 8 retained potential genes are related to the nucleoplasm, and most of them are related to the mitotic cell cycle process, cell cycle, and DNA replication (Figure 5(c)). Taken together, the intersection of the above validated 8 genes and 320 genes in MEturquoise module by WGCNA, the potential 7 key genes (CDC6, CDC45, CDCA5, CDCA8, CENPH, MCM4, and MCM7) were found (Figure 5(d)).

3.7. Identification of Independent Factors of Overall Survival of HCC

We performed the univariate analysis in 371 patients from TCGA and found that CDCA8, stage, CDC45, CDC6, CDCA5, MCM4, CENPH, MCM7, sex, and age were significantly associated with OS of HCC. The multivariate Cox proportional hazards model showed that CDCA8 and stage of HCC were independent factors of OS of HCC (Table 6).


VariablesUnivariate analysisMultivariate analysis
HR95% CI valueHR95% CI value

CDCA81.111.08-1.15<0.0011.101.06-1.14<0.001
Stage I<0.0010.019
Stage II1.430.88-2.340.1511.230.75-2.020.416
Stage IIIA2.681.70-4.21<0.0012.111.31-3.390.002
Stage IIIB2.871.02-8.040.0451.840.62-5.440.273
CDC451.131.07-1.19<0.001
CDC61.101.05-1.15<0.001
CDCA51.091.05-1.13<0.001
MCM41.061.04-1.09<0.001
CENPH1.181.08-1.28<0.001
MCM71.011.00-1.010.029
Gender (F vs. M)0.820.57-1.160.257
Age (years)1.010.99-1.020.403

F: female; M: male.

4. Discussion

As the virus causing the most severe type of hepatitis, HDV affects 15-20 million people worldwide, but its specific pathogenic mechanism remains unclear. Accordingly, we undertook to find potential genes and pathways involved in the pathogenesis of this disease through text mining to help explain the underlying carcinogenic mechanism of HDV as well as the HBV inhibitory mechanism.

In this study, we compared cancerous and paracancerous specimens of patients suffering from HBV or HDV-associated HCC with the aim of identifying potential genes closely related to HDV-associated HCC. The study identified 373 upregulated DEGs and 582 downregulated DEGs. These DEGs were subjected to GO and KEGG annotation and enrichment analyses. In addition, we constructed PPI networks and sorted out 353 nodes with 939 edges, from which the three most significant modules were selected and 52 central nodes/genes were selected for validation using the GEPIA database. In the module confirmed by the WGCNA that was significantly associated with clinical features including event, OS, stage, grade, and age, only CDC6, CDC45, CDCA5, CDCA8, CENPH, MCM4, and MCM7 were consistent with genes identified by the PPI network, which were found to be significantly correlated with nucleoplasm, cell cycle, DNA replication, and mitotic cell cycle. Univariate and multivariate Cox proportional hazards model analysis showed the stage of HCC and CDCA8 are the independent factors associated with the OS of HCC.

Cell division cycle 6 (CDC6) is thought to be significantly associated with pancreatic cancer and colorectal cancer (CRC) [30]. It has a pivotal role in regulating the process of DNA replication as well as tumorigenesis; its overexpression could interfere with the expression of tumor suppressor genes (INK4/ARF) through the mechanism of epigenetic modification [31]. During the S phase of DNA replication in eukaryotic cells, cell division cycle 45 (CDC45) is an essential component of CMG (CDC45–MCM–GINS) helicase. CDC45 acts as a hubprotein, significantly upregulated in cancerous tissues from CRC and non-small-cell lung cancer (NSCLC) patients, and promotes tumor progression [32]. It is established that the MCM4/6/7 (minichromosome maintenance complex component 4/6/7) hexamer complex acts as a DNA helicase. Additionally, in endometrial cancer and skin cancer studies, it was found that MCM4 mutations may affect the interaction with MCM7, thereby disrupting the stability of the MCM4/6/7 complex [33]. In addition, MCM4 is also a member of significant predictors of poor prognosis in CRC patients [34]. Moreover, MCM7 is also a promising biomarker for early diagnosis of gastric cancer and even a predictor of meningioma recurrence after surgery [35, 36]. Another study found that high expression of MCM7 may be involved in the progression of HCC through the MCM7-cyclin D1 pathway, and MCM7 may serve as a prognostic marker for patients with HCC [37]. The cell division cycle-associated protein 5 (CDCA5) is a member of the CDCA family that comprises CDCA1-8. It plays a crucial role as a regulator of sister-chromatid cohesion and separation during cell division, and its upregulation has been shown to be associated with various cancers, including breast cancer, esophageal squamous cell carcinoma, CRC, and HCC [38, 39]. Also, a study found that the activation of the ERK and AKT pathways may be involved in the regulation of HCC cell proliferation by CDCA45 [39]. CDCA8 is an essential regulator of mitosis, and its overexpression is significantly associated with bladder cancer, cutaneous melanoma, and the progression and prognosis of breast cancer [40, 41]. Centromere protein H (CENPH) is considered to be an essential part of the active centromere complex, and its overexpression is highly related to poor prognosis in renal cell carcinoma, nasopharyngeal carcinoma, CRC, and HCC [42, 43]. Another study found that CENPH may promote the proliferation of HCC through the mitochondrial apoptosis pathway [43].

Most of the abovementioned genes are significantly associated with the cell cycle and DNA replication, and their overexpression may affect the replication of HBV DNA, thereby promoting the unique phenomenon of HDV inhibits HBV replication. All the findings related to these genes may also help us understand the mechanisms of HDV-induced liver injury and HCC. In the future, we will further verify those genes’ function by performing animals, cells, and clinical trials.

5. Conclusions

In summary, 7 potential candidate genes closely related to HDV-associated HCC were identified in this study. Through comparative analysis with previous studies, these genes were found to be involved in many pathways related to tumorigenesis which provided clues to elucidate the mechanism of hepatitis D virus-induced HCC or its unique molecular mechanism in the inhibition of HBV replication. However, additional in-depth molecular biological research on these candidate genes closely related to HDV-associated HCC is necessary to confirm their functions.

Data Availability

The datasets used to support the findings of this study are available from GEO (https://www.ncbi.nlm.nih.gov/geo/) and TCGA (https://portal.gdc.cancer.gov/).

Conflicts of Interest

The authors declare that they have no competing interests.

Authors’ Contributions

Zhe Yu, Xuemei Ma, and Wei Zhang contributed equally to this work.

Acknowledgments

This work was supported by the State Key Projects Specialized on Infectious Disease, Chinese Ministry of Science and Technology (2013ZX10005002 and 2018ZX10725506), and Beijing Key Research Project of Special Clinical Application (Z151100004015221). Thanks are due to Dr. Zhao’s team (Biomedical Official Wechat Account: SCIPhD) of ShengXinZhuShou for assistance.

Supplementary Materials

Table S1: DEGs from microarray datasets GSE55092 and GSE98383. Table S2: 948 DEGs related to HDV-associated HCC including 373 upregulated and 582 downregulated genes. Table S3: the five modules and contents obtained by WGCNA. (Supplementary Materials)

References

  1. F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: a Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018. View at: Publisher Site | Google Scholar
  2. A. Villanueva, “Hepatocellular carcinoma,” The New England Journal of Medicine, vol. 380, no. 15, pp. 1450–1462, 2019. View at: Publisher Site | Google Scholar
  3. W. Chen, R. Zheng, P. D. Baade et al., “Cancer statistics in China, 2015,” CA: a Cancer Journal for Clinicians, vol. 66, no. 2, pp. 115–132, 2016. View at: Publisher Site | Google Scholar
  4. R. Romeo, A. Petruzziello, E. I. Pecheur et al., “Hepatitis delta virus and hepatocellular carcinoma: an update,” Epidemiology and Infection, vol. 146, no. 13, pp. 1612–1618, 2018. View at: Publisher Site | Google Scholar
  5. W. K. Seto, Y. R. Lo, J. M. Pawlotsky, and M. F. Yuen, “Chronic hepatitis B virus infection,” Lancet, vol. 392, no. 10161, pp. 2313–2324, 2018. View at: Publisher Site | Google Scholar
  6. L. S. Y. Tang, E. Covert, E. Wilson, and S. Kottilil, “Chronic hepatitis B infection,” JAMA, vol. 319, no. 17, pp. 1802–1813, 2018. View at: Publisher Site | Google Scholar
  7. Y. Ni, Z. Zhang, L. Engelskircher et al., “Generation and characterization of a stable cell line persistently replicating and secreting the human hepatitis delta virus,” Scientific Reports, vol. 9, no. 1, p. 10021, 2019. View at: Publisher Site | Google Scholar
  8. G. Brancaccio, M. Fasano, A. Grossi, T. A. Santantonio, and G. B. Gaeta, “Clinical outcomes in patients with hepatitis D, cirrhosis and persistent hepatitis B virus replication, and receiving long-term tenofovir or entecavir,” Alimentary Pharmacology & Therapeutics, vol. 49, no. 8, pp. 1071–1076, 2019. View at: Publisher Site | Google Scholar
  9. M. T. Binh, N. X. Hoan, H. van Tong et al., “HDV infection rates in northern Vietnam,” Scientific Reports, vol. 8, no. 1, p. 8047, 2018. View at: Publisher Site | Google Scholar
  10. C. Sureau and F. Negro, “The hepatitis delta virus: replication and pathogenesis,” Journal of Hepatology, vol. 64, no. 1, pp. S102–S116, 2016. View at: Publisher Site | Google Scholar
  11. G. Diaz, R. E. Engle, A. Tice et al., “Molecular signature and mechanisms of hepatitis D virus-associated hepatocellular carcinoma,” Molecular Cancer Research, vol. 16, no. 9, pp. 1406–1419, 2018. View at: Publisher Site | Google Scholar
  12. Humans, I.W.G.o.t.E.o.C.R.t and I.A.f.R.o. Cancer, Hepatitis viruses, vol. 59, World Health Organization, 1994.
  13. H. Lee, B. T. Lau, and H. P. Ji, “Targeted sequencing strategies in cancer research,” in Next Generation Sequencing in Cancer Research, pp. 137–163, Springer, 2013. View at: Google Scholar
  14. T. Barrett, D. B. Troup, S. E. Wilhite et al., “NCBI GEO: archive for high-throughput functional genomic data,” Nucleic Acids Research, vol. 37, pp. D885–D890, 2009. View at: Publisher Site | Google Scholar
  15. R. Edgar, M. Domrachev, and A. E. Lash, “Gene expression omnibus: NCBI gene expression and hybridization array data repository,” Nucleic Acids Research, vol. 30, no. 1, pp. 207–210, 2002. View at: Publisher Site | Google Scholar
  16. G. K. Smyth, “limma: linear models for microarray data,” in Bioinformatics and computational biology solutions using R and Bioconductor, pp. 397–420, Springer, New York, NY, 2015. View at: Google Scholar
  17. R. B. Dessau and C. B. Pipper, ““R”--project for statistical computing,” Ugeskrift for Laeger, vol. 170, no. 5, pp. 328–330, 2008. View at: Google Scholar
  18. M. Melis, G. Diaz, D. E. Kleiner et al., “Viral expression and molecular profiling in liver tissue versus microdissected hepatocytes in hepatitis B virus-associated hepatocellular carcinoma,” Journal of Translational Medicine, vol. 12, no. 1, p. 230, 2014. View at: Publisher Site | Google Scholar
  19. E. Korpelainen, J. Tuimala, and P. Somervuo, RNA-Seq Data Analysis - a Practical Approach, Chapman and Hall/CRC, 2014.
  20. Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal statistical society: series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995. View at: Google Scholar
  21. J. C. Oliveros, Venny. An interactive tool for comparing lists with Venn's diagrams, 2007, Available from: https://bioinfogp.cnb.csic.es/tools/venny/index.html.
  22. G. Dennis Jr., B. T. Sherman, D. A. Hosack et al., “DAVID: database for annotation, visualization, and integrated discovery,” Genome Biology, vol. 4, no. 5, 2003. View at: Publisher Site | Google Scholar
  23. M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene Ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000. View at: Publisher Site | Google Scholar
  24. H. Ogata, S. Goto, K. Sato, W. Fujibuchi, H. Bono, and M. Kanehisa, “KEGG: Kyoto Encyclopedia of Genes and Genomes,” Nucleic Acids Research, vol. 27, no. 1, pp. 29–34, 1999. View at: Publisher Site | Google Scholar
  25. P. Langfelder and S. Horvath, “WGCNA: an R package for weighted correlation network analysis,” BMC Bioinformatics, vol. 9, no. 1, 2008. View at: Publisher Site | Google Scholar
  26. B. Zhang and S. Horvath, “A general framework for weighted gene co-expression network analysis,” Statistical Applications in Genetics and Molecular Biology, vol. 4, no. 1, 2005. View at: Publisher Site | Google Scholar
  27. A. Franceschini, D. Szklarczyk, S. Frankild et al., “STRING v9.1: protein-protein interaction networks, with increased coverage and integration,” Nucleic Acids Research, vol. 41, pp. D808–D815, 2013. View at: Publisher Site | Google Scholar
  28. M. Kohl, S. Wiese, and B. Warscheid, “Cytoscape: software for visualization and analysis of biological networks,” Methods in Molecular Biology, vol. 696, pp. 291–303, 2011. View at: Publisher Site | Google Scholar
  29. Z. Tang, C. Li, B. Kang, G. Gao, C. Li, and Z. Zhang, “GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses,” Nucleic Acids Research, vol. 45, no. W1, pp. W98–W102, 2017. View at: Publisher Site | Google Scholar
  30. Y. Hu, L. Wang, Z. Li et al., “Potential prognostic and diagnostic values of CDC6, CDC45, ORC6 and SNHG7 in colorectal cancer,” Oncotargets and Therapy, vol. Volume 12, pp. 11609–11621, 2019. View at: Publisher Site | Google Scholar
  31. L. R. Borlado and J. Mendez, “CDC6: from DNA replication to cell cycle checkpoints and oncogenesis,” Carcinogenesis, vol. 29, no. 2, pp. 237–243, 2008. View at: Publisher Site | Google Scholar
  32. J. Huang, Y. Li, Z. Lu et al., “Analysis of functional hub genes identifies CDC45 as an oncogene in non-small cell lung cancer - a short report,” Cellular Oncology (Dordrecht), vol. 42, no. 4, pp. 571–578, 2019. View at: Publisher Site | Google Scholar
  33. R. Tatsumi and Y. Ishimi, “An MCM4 mutation detected in cancer cells affects MCM4/6/7 complex formation,” Journal of Biochemistry, vol. 161, no. 3, pp. 259–268, 2017. View at: Publisher Site | Google Scholar
  34. P. Ahluwalia, A. K. Mondal, C. Bloomer et al., “Identification and clinical validation of a novel 4 gene-signature with prognostic utility in colorectal cancer,” International Journal of Molecular Sciences, vol. 20, no. 15, p. 3818, 2019. View at: Publisher Site | Google Scholar
  35. T. L. Winther and S. H. Torp, “MCM7 expression is a promising predictor of recurrence in patients surgically resected for meningiomas,” Journal of Neuro-Oncology, vol. 131, no. 3, pp. 575–583, 2017. View at: Publisher Site | Google Scholar
  36. J. Y. Yang, D. Li, Y. Zhang et al., “The expression of MCM7 is a useful biomarker in the early diagnostic of gastric cancer,” Pathology Oncology Research, vol. 24, no. 2, pp. 367–372, 2018. View at: Publisher Site | Google Scholar
  37. K. Qu, Z. Wang, H. Fan et al., “MCM7 promotes cancer progression through cyclin D1-dependent signaling and serves as a prognostic marker for patients with hepatocellular carcinoma,” Cell Death & Disease, vol. 8, no. 2, 2017. View at: Publisher Site | Google Scholar
  38. N. N. Phan, C. Y. Wang, K. L. Li et al., “Distinct expression of CDCA3, CDCA5, and CDCA8 leads to shorter relapse free survival in breast cancer patient,” Oncotarget, vol. 9, no. 6, pp. 6977–6992, 2018. View at: Publisher Site | Google Scholar
  39. J. Wang, C. Xia, M. Pu et al., “Silencing of CDCA5 inhibits cancer progression and serves as a prognostic biomarker for hepatocellular carcinoma,” Oncology Reports, vol. 40, no. 4, pp. 1875–1884, 2018. View at: Google Scholar
  40. C. Ci, B. Tang, D. Lyu et al., “Overexpression of CDCA8 promotes the malignant progression of cutaneous melanoma and leads to poor prognosis,” International Journal of Molecular Medicine, vol. 43, no. 1, pp. 404–412, 2019. View at: Publisher Site | Google Scholar
  41. Y. Bi, S. Chen, J. Jiang et al., “CDCA8 expression and its clinical relevance in patients with bladder cancer,” Medicine (Baltimore), vol. 97, no. 34, 2018. View at: Publisher Site | Google Scholar
  42. W. Wu, F. Wu, Z. Wang et al., “CENPH inhibits rapamycin sensitivity by regulating GOLPH3-dependent mTOR signaling pathway in colorectal cancer,” Journal of Cancer, vol. 8, no. 12, pp. 2163–2172, 2017. View at: Publisher Site | Google Scholar
  43. G. Lu, H. Hou, X. Lu et al., “CENP-H regulates the cell growth of human hepatocellular carcinoma cells through the mitochondrial apoptotic pathway,” Oncology Reports, vol. 37, no. 6, pp. 3484–3492, 2017. View at: Publisher Site | Google Scholar

Copyright © 2021 Zhe Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


More related articles

 PDF Download Citation Citation
 Download other formatsMore
 Order printed copiesOrder
Views295
Downloads540
Citations

Related articles

Article of the Year Award: Outstanding research contributions of 2020, as selected by our Chief Editors. Read the winning articles.