BioMed Research International

BioMed Research International / 2015 / Article

Research Article | Open Access

Volume 2015 |Article ID 391475 | 27 pages | https://doi.org/10.1155/2015/391475

Applying NGS Data to Find Evolutionary Network Biomarkers from the Early and Late Stages of Hepatocellular Carcinoma

Academic Editor: Zheng Li
Received27 Feb 2015
Revised23 Apr 2015
Accepted23 Apr 2015
Published20 Aug 2015

Abstract

Hepatocellular carcinoma (HCC) is a major liver tumor (~80%), besides hepatoblastomas, angiosarcomas, and cholangiocarcinomas. In this study, we used a systems biology approach to construct protein-protein interaction networks (PPINs) for early-stage and late-stage liver cancer. By comparing the networks of these two stages, we found that the two networks showed some common mechanisms and some significantly different mechanisms. To obtain differential network structures between cancer and noncancer PPINs, we constructed cancer PPIN and noncancer PPIN network structures for the two stages of liver cancer by systems biology method using NGS data from cancer cells and adjacent noncancer cells. Using carcinogenesis relevance values (CRVs), we identified 43 and 80 significant proteins and their PPINs (network markers) for early-stage and late-stage liver cancer. To investigate the evolution of network biomarkers in the carcinogenesis process, a primary pathway analysis showed that common pathways of the early and late stages were those related to ordinary cancer mechanisms. A pathway specific to the early stage was the mismatch repair pathway, while pathways specific to the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway. This study provides a new direction for cancer-targeted therapies at different stages.

1. Introduction

Cancer is the leading cause of death worldwide, and its etiology occurs at the DNA, RNA, and protein levels. It is a very complex disease involving cascades of spatial and temporal changes in genetic networks and metabolic pathways [1]. Weinberg summarized the important cancer hallmarks [2, 3]. Cancer hallmarks can highlight important cancer mechanisms. In recent years, many systems biology approaches have been applied to study cancers [49]. We performed a series of studies on cancer using systems biology approaches. Recently, we have particularly focused on searching for network biomarkers of cancers [1016].

Hepatocellular carcinoma (HCC) is a major liver tumor (~80%), besides hepatoblastomas, angiosarcomas, and cholangiocarcinomas. Compared to other types of cancer, liver cancer is the third most deadly cancer globally and caused about 700,000 deaths in 2011 [17]. Due to the increasing annual incidences [18] and poor 5-year survival rate (~15%), diagnosis and prognosis of HCC are still important public health issues. Various studies revealed a multistep process involved in liver carcinogenesis [19]; however, the exact biology of HCC remains poorly understood overall [20]. In addition, late confirmation of the occurrence of HCC through traditional histological examinations and tumor sections contributes to the poor survival rate. To accurately diagnose the occurrence of HCC at an early time, it is necessary to understand the cellular and molecular mechanisms of HCC [20]. Hence, in this study, we compared the early and late stages of molecular interaction networks of HCC to reveal the underlying mechanisms of HCC development.

Although the histological and molecular features leading to HCC initiation are still poorly understood, mounting evidence suggests that a gradual accumulation of mutations and genetic changes in hepatocytes, which form the live lobule, may lead to the development of HCC [21]. Exposure to risk factors for liver cancer, such as obesity [22], alcohol addiction, aflatoxin, or hepatitis viruses [21], and the subsequent induction of inflammation may cause advanced hepatocyte necrosis. The recurrence of necrosis and regeneration of hepatocytes provide an opportunity for the introduction of mutations and genetic changes in hepatocytes. During these processes, alternative ways to protect liver lesions occur such as fibrosis and cirrhosis. Although a high correlation between cirrhosis and an HCC diagnosis was reported, the mechanisms of how cirrhosis transforms into HCC are still unknown. Furthermore, after a definite diagnosis of HCC, patients with early-stage liver cancer have better survival rates than those in the late stage. Although HCC’s management has substantially changed in the past few decades, the only systemic standard of care for patients with advanced HCC is sorafenib, a multikinase inhibitor, with a mean survival benefit of only 3 months [20, 23]. With advances in affordable, high-throughput technologies, increasing numbers of systems biology studies of HCC diagnoses [24] and treatments [25] have shed light on applications of systems biology to HCC diagnosis, prognosis, and therapy.

As for different etiologies and heterogenic genomic alterations of HCC, the systems biology methodology that integrates Omics data is suitable to develop accurate diagnoses, novel therapeutic targets, and efficient targeted therapies [26]. In this study, NGS data were used to construct protein-protein interaction (PPI) networks (PPINs) of the early and late stages of HCC. The network structure and protein associations of different stages of HCC were compared to obtain a set of significant proteins which play important roles in processes of the progression of HCC.

Chen et al. developed a dynamical network biomarker (DNB) that can serve as a general early-warning signal to indicate an imminent bifurcation or sudden deterioration before the critical transition occurs, which means that it can identify a predisease state using time series microarray data [2729]. We tried a different approach from their method and used sample microarray data from liver cancer patients at different stages. Our approach could also be extended to predict some similar results as their research. That is, in this study, we simply divided the cancer into early and late stages, but there are more stages of cancer, such as stages I, II, III, and IV. If we could observe the time evolution of cancer biomarkers at these other different stages, we could also predict the predisease state by comparing these cancer biomarkers at different stages.

We reveal the carcinogenesis process from early-stage and late-stage liver cancer. A specific pathway of the early stage was the mismatch repair pathway, while specific pathways of the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway.

2. Materials and Methods

2.1. Overview of the Process for Constructing Liver Cancer Network Markers

We successfully used our methods to find core and specific network markers of four different kinds of cancer and the evolution of network markers from the early stage to late stage of liver cancer [15, 16]. A similar theoretical framework was employed in this study to find the evolution of network markers in the early and late stages of liver cancer. The theoretical systems method in this paper was developed from a previous study, but we used a new large dataset of NGS expression data which differ from traditional microarray expression data. A flowchart of the construction of network biomarkers for the early and late stages of liver cancer is shown in Figure 1. We know that a lot of NGS gene expression data have been generated in recent years. Some consider that they are more accurate than traditional microarray expression data. Another key point of this work was to make comparisons with our previous results, such as network biomarkers of liver cancer from microarray data and the evolution of network markers from the early and late stages of liver cancer. We combined two data sources: (1) NGS data of liver cancer and noncancer samples from the GEO database, among which cancer samples were divided into two groups of early-stage and late-stage liver cancer, and (2) the PPI database, which is required to construct PPINs for liver cancer. These data were used for PPI pool selection, and the selected PPIs and microarray data were then used for PPIN construction. Through regression modeling and the maximum-likelihood parameter estimation method, a cancer PPIN (CPPIN) and noncancer PPIN (NPPIN) were obtained. We also constructed a differential PPIN (DPPIN). The constructed CPPIN and NPPIN were compared to obtain a set of significant proteins for liver cancer based on the carcinogenesis relevance value (CRV) for each protein and a statistical assessment. Significant proteins and PPIs among these proteins were used to construct network markers for the early and late stages of liver cancer.

2.2. Data Selection and Preprocessing

Liver RNA-seq data were collected from liver HCC (LIHC) of The Cancer Genome Atlas (TCGA) with a batch number of 100. Normalized results, which consisted of reads per kilobase of exon per million mapped reads (RPKM) values, were used to represent gene expressions. The NGS gene expression dataset of liver cancer was obtained from TCGA database. The same dataset contained early-stage and late-stage liver cancer and noncancer samples. We only used data derived from nonprocessed primary biopsies to avoid discrepancies in gene expressions that are intrinsic to cell culture and fixation. Therefore, the dataset utilized contained primary tumor samples of both stages from patients and adjacent nontumor tissue samples from the same cancer patients, which were considered to be control samples. As shown in Table 1, 19 early-stage (stages I and II) and 18 late-stage (stages III and IV) liver HCC tumors and 24 adjacent normal tissue pairs were collected for further analysis. To describe the extent of a patient’s cancer, the cancers were classified into four stages according to their degree of invasion and migration using the TNM staging system, as defined by the American Joint Committee on Cancer (AJCC) and the International Union against Cancer (UICC). We then divided the cancer samples into two groups. In general, stages I and II described early-stage cancers that have higher curability rates with medical treatment, while stages III and IV described late stages. We also combined the early and late stages for a total stage. However, there were no corresponding noncancer samples in the surrounding area for each stage, and we had only one group of surrounding noncancer samples (Table 1). The reason why we had to take this step was that NGS samples are harder (and much more expensive) to obtain than microarray data, and the number of samples was small, which may have caused overfitting in the parameter estimation process of our model. So we used the total stage to identify a model to avoid overfitting. We built the CPPIN and NPPIN for early-stage, late-stage, and total-stage liver cancer in this study. We obtained 19 and 18 samples for the early-stage and late-stage liver cancer, respectively, and 24 noncancer samples. Prior to further analysis, the gene expression value, , was normalized to -transformed scores, , for each gene, , and then the resulting normalized expression value had a mean and a standard deviation for sample [30, 31].


Cancer typeTCGA dataset numberEarly stage sample#Late stage sample#Normal sample#Platform

Liver cancerBatch100
of LIHC
191824Illumina
HiSeq 2000

PPI data for Homo sapiens were extracted from the Biological General Repository for Interaction Database (BioGRID, downloaded in October 2012). The BioGRID is an open-access archive of genetic and protein interactions that are curated from the primary biomedical literature of all major model organisms. As of September 2012, BioGRID housed more than 500,000 manually annotated interactions from more than 30 model organisms [32]. The above two databases were mined for liver cancer and noncancer PPINs using their corresponding microarray data. These early-stage and late-stage liver cancer and noncancer PPINs were then compared to obtain network markers.

2.3. Selection of a Protein Pool and Identification of PPINs for Cancerous and Noncancerous Cells

To integrate gene expressions with PPI data so we could construct the corresponding CPPINs and NPPINs, we set up a protein pool containing differentially expressed proteins. Gene expression values were reasonably assumed to correlate with protein expression levels. We used a one-way analysis of variance (ANOVA) to analyze the expression of each protein and selected proteins with differential expression levels. This method allowed determination of significant differences between cancer and noncancer datasets. The null hypothesis (Ho) was based on the assumption that mean protein expression levels of cancer and noncancer sets were the same. A Bonferroni adjustment [33], a type of multiple testing, was used to detect and correct proteins with a discrepancy. Proteins with a value of <0.01 were included in the protein pool. However, any proteins in the protein pool which did not have PPI information were eliminated. In addition, proteins that were not already in the protein pool were included if their PPI information determined that they were closely associated with proteins already in the pool. As a result, the protein pool contained proteins that had certain differences in expression levels and proteins that had close relationships with the aforementioned proteins.

On the strength of the significant pool and PPI information, candidate PPINs for early-stage and late-stage liver cancer were constructed for liver cancer and noncancer tissues by linking proteins that interacted with each other. In other words, proteins that had PPI information through the pool were linked together, resulting in candidate PPINs.

As the candidate PPIN included all possible PPIs under various environments, different organisms, and experimental conditions, the candidate PPIN needed to be further confirmed by microarray data to identify appropriate PPIs according to the biological processes that are relevant to cancer. To remove false-positive PPIs from each candidate PPIN for different biological conditions, we used both a PPI model identification scheme and a model order detection method to prune each candidate PPIN using the corresponding microarray data to approach the actual PPIN. Here, the PPIs of a target protein in the candidate PPIN can be depicted by the following protein association model:where represents the expression level of the target protein for sample ; represents the expression level of the th protein interacting with target protein for sample ; denotes the association interaction ability between the target protein and its th interactive protein; represents the number of proteins interacting with the target protein ; and represents the stochastic noise due to other factors or model uncertainty. The biological meaning of (1) is that expression levels of target protein are associated with expression levels of proteins that interact with it. Consequently, a protein association (interaction) model for each protein in the protein pool can be built using (1).

After constructing (1) for the PPI model of each protein in the candidate PPIN, we used the maximum-likelihood estimation method [34] to identify association parameters in (1) using microarray data as follows (see Supplementary Materials S.1 available online at http://dx.doi.org/10.1155/2015/391475):where was identified using microarray data in accordance with the maximum-likelihood estimation method.

Once the association parameters for all proteins in the candidate PPIN were identified for each protein, significant protein associations were determined using the interaction model order detection method based on estimated association abilities, that is, to detect the interaction number in (2). The Akaike information criterion (AIC) [34] and Student’s -test [35] were used for both model order selection and significance determination of protein associations in (see Supplementary Materials S.2).

2.4. Determination of Significant Proteins and Their Network Structures in the Carcinogenesis of Liver Cancer

After the interaction number was determined using the AIC order detection and Student’s -test, spurious false-positive PPIs, , in (2) were pruned away, and only significant PPIs that remained were refined as follows:where denotes the number of significant PPIs of the PPIN, with the target protein . In other words, a number of ’s (or false positives) were pruned from the PPIs of target protein . One protein by one protein (i.e., for all proteins in the refined PPIN in (3)) resulted in the following refined PPIN:where the interaction matrix denotes the PPIs.

If there was no PPI between proteins and or it was pruned away by the AIC order detection due to insignificance in the refined PPIN, then . In general, , but if this was not the case, the larger one was chosen as to avoid the situation where . The above PPIN construction method was used to construct refined PPINs for each stage of liver cancer (early and late) and noncancer cells. The interaction matrices of the refined PPINs in (4) for cancer and noncancer cells of both the early and late stages of liver cancer were constructed, respectively, as follows:where is the early-stage and late-stage liver cancer; and denote interaction matrices of the refined PPIN of the th stage of liver cancer and noncancer, respectively; and is the number of proteins in the refined PPIN. Therefore, the protein association model for CPPIN and NPPIN in the th-stage liver cancer and noncancer samples can be represented by the following equations according to (4) and (5):where is early-stage and late-stage liver cancer, and denote vectors of expression levels, and and indicate noise vectors of the PPINs in the th stage liver cancer and noncancer cells, respectively.

The different matrix of the differential PPI network between CPPIN and NPPIN in the th stage liver cancer is defined as follows:where is early-stage and late-stage liver cancer, denotes the protein association ability difference between CPPIN and NPPIN in the th-stage liver cancer, and matrix indicates the difference in network structures between CPPIN and NPPIN in the th-stage liver cancer. In order to investigate carcinogenesis from the difference matrix, , between CPPIN and NPPIN of the th-stage liver cancer in (7), a score, which we called the carcinogenesis relevance value (CRV), was presented to quantify the correlation of each protein in with the significance of carcinogenesis as follows [30]:where and is early-stage and late-stage liver cancer.

The in (8) quantifies the differential extent of protein associations of the th protein (the absolute sum of the th row of in (7)) and the can differentiate CPPIN from NPPIN in th-stage liver cancer. In other words, the in (8) represents the network structure difference of the th protein between cancer and noncancer networks in the th-stage liver cancer.

In order to investigate what proteins are more likely involved in th-stage liver cancer, we needed to calculate the corresponding empirical value to determine the statistical significance of the . To determine the observed value of each , we repeatedly permuted the network structure of the candidate PPIN of the th-stage liver cancer as a random network of th-stage liver cancer. Each protein in the random network of the th-stage liver cancer had its own CRV to generate a distribution of for = early-stage and late-stage liver cancer. Although the network structure was randomly disarranged, linkages of each protein were maintained. In other words, proteins with which a particular protein interacted were permuted without changing the total number of protein interactions. This procedure was repeated 105 times, and the corresponding value was calculated as the fraction of the random network structure in which the was at least as large as the CRV of the real network structure. According to distributions of the of the random networks, the in (8) with a value of ≤0.01 was regarded as a significant CRV, and the corresponding protein was determined to be a significant protein in the carcinogenesis of the th-stage liver cancer: a protein with a value of >0.01 was removed from the list of significant proteins in carcinogenesis (in other words, if the value of was >0.01, then the th protein was removed from the in (8), and the remainder in the with values of CRVs of <0.01 was considered significant proteins of the th-stage liver cancer).

Based on values of the CRVs for all proteins () and the two stages of liver cancer ( = early-stage and late-stage liver cancer), we generated two lists of significant proteins for each of the two stages according to the CRV and a statistical assessment of each significant protein in the in (8). We found 152 significant proteins in the early-stage liver cancer and 50 significant proteins in the late-stage liver cancer. These proteins showed significant changes between the CPPIN and NPPIN in the carcinogenic process according to their corresponding stage of cancer, and we suspected that these changes might play important roles in the carcinogenesis process of liver cancer. These findings warrant further investigation.

Intersections of these significant proteins in the early and late stages of liver cancer and their PPIs are known as the core network markers appearing in all stages of liver cancer. In contrast, unique significant proteins and their PPIs in each stage of liver cancer are known as specific network markers for each stage of cancer. We found 18 significant proteins that could be classified as core network markers over the entire carcinogenesis process of liver cancer. We also found 134 significant proteins as specific network markers of early-stage liver cancer and 32 significant proteins as specific network markers of late-stage liver cancer.

2.5. Pathway Analysis

Much valuable cellular information can be found using known pathways, which are useful for describing most “normal” biological phenomena. All of these known pathways are the result of repeated testing and verification, and the entire pathway network has defined most links. Therefore, the proteins we identified to be significant in the above network markers were mapped onto known pathway networks (e.g., the Kyoto Encyclopedia of Genes and Genomes (KEGG) and PANTHER pathways) to investigate significant pathways with network markers and explore relationships between these pathways and the carcinogenesis of liver cancer. This approach supports the view that systems biology can help identify significant network biomarkers in both normal and cancerous pathways and their roles in the pathogenesis of cancer.

Together with comprehensive pathway databases such as the KEGG, we used a series of bioinformatics pathway analytical tools to identify biologically relevant pathway networks [36]. The KEGG includes manually curated biological pathways that cover three main categories: systems information (e.g., human diseases and drugs), genomics information (e.g., gene catalogs and sequence similarities), and chemical information (e.g., metabolites and biochemical reactions). At present, the KEGG contains 134,511 distinct pathways generated from 391 original reference pathways [37]. Therefore, to investigate pathways involved in carcinogenesis, the bioinformatics database, DAVID [38, 39], which generates automatic outputs of the results from a KEGG pathway analysis [38], was used for the pathway analysis of significant proteins identified as network markers to determine their roles in the pathogenesis of early- and late-stage liver cancer. Our methodology did not include a pathway analysis or gene set enrichment analysis (GSEA). To complete our research results, we used NOA software for the pathway analysis and GSEA of biological processes, cellular components, and molecular functions [40, 41].

2.6. Explore the Evolution of Network Biomarkers in the Liver Carcinogenesis via PPI Network through NGS Data

Our cancer PPI model was constructed from the differential expression of cancer and noncancer microarray data and data mining of PPI information from the BioGRID database. So, the early-stage and late-stage liver CPPINs and NPPINs were the results of our systems biology model using the original NGS data and PPI databases. There were two key factors that affected our final results.

(i) The Effect of Different Original PPI Databases. We know that PPI databases, such as the BioGRID and MIPS, are constructed from putative samples and validated by wet-lab experiments. Due to advances in many high-throughput experimental skills, the original PPI databases have evolved over time. The newly updated original PPI databases are the second factor to affect the final results.

(ii) The Effect of the Systems Biology Model. Microarray data, PPI databases, and the PPI interaction model in (1) were employed to construct PPI networks of normal and cancer cells by the maximum-likelihood parameter estimation method (see Supplementary Materials S.1). The AIC system order detection method (Supplementary Materials S.2) was used to prune false-positive PPIs to obtain actual PPI networks of normal and cancer cells; that is, we used the so-called reverse engineering method to construct PPI networks of normal and cancer cells. Then, the DPPIN between the CPPIN and NPPIN was obtained in (7) to investigate PPI variations of each protein in the DPPIN due to carcinogenesis. Finally, the CRV based on PPI variations was also proposed to evaluate the significance of carcinogenesis for each protein of the DPPIN. Proteins with a significant CRV () were considered significant proteins of the cancer. Significant proteins in Table 3 were these significant proteins of early-stage and late-stage liver cancer, and these proteins and their PPIs were used to construct the PPI network in Figure 2. Finally, from early-stage and late-stage liver cancer network markers, we investigated mechanisms of the carcinogenesis process with the help of databases (e.g., the GO database and the DAVID and KEGG pathway databases) to find multiple network targets for cancer therapy. Unlike conventional theoretical methods which always give a single mathematical model for a cancer network for a more-detailed theoretical analysis, this study introduced a systems biology approach to cancer network markers based on actual NGS data through so-called reverse engineering, a systems statistical method, and a data mining method in combination with large databases. These are the novelty and significance of our study. Although we described the novelty of our systems biology model, we have validated our results by a literature survey of research. In the future, our results can be validated by other researchers’ wet-lab experiments, and we will repeatedly modify our mathematical system model. This is the third key factor that affected our results. Although not directly, it also influenced the protein interaction network.

We also know that biosystems evolve with time. It is obvious that early-stage and late-stage patients have very different symptoms; these are key features we used to classify early-stage and late-stage liver cancer. Since liver cancer patients in the two stages have very different symptoms, the NGS data of these two stages of patients should undoubtedly greatly differ. As described above, protein expressions from NGS data are one of the key factors of our systems biology model producing the final CPPINs and NPPINs, and the CPPINs and NPPINs gave the final network biomarkers from our systems biology model. So, the most important thing for evolution of network biomarkers is evolution of the NGS data in both stages of liver cancer, which is inherent in the exhibition of cancer-related genes due to DNA mutations in the carcinogenesis process.

3. Results and Discussion

3.1. Evolution of Network Biomarkers in Early-Stage and Late-Stage Liver Cancer

We built the DPPIN to examine the early, late, and total stages of liver cancer (Figure 2). CRVs of each protein in the three networks were calculated. This figure shows more-detailed information than just the CRVs; that is, it shows the changes of edges and nodes at different stages of the network. By screening using values of CRVs, we found significant proteins of network markers for these three stages of liver cancer. Similar to our previous experience with bladder cancer [16], we wanted to reveal the carcinogenesis mechanisms of liver cancer in the early and late stages. Because we combined the early stage (stages I and II) and late stage (stages III and IV) into a total stage (stages I~IV), it is not surprising that most of the pathways associated with either the early or late stage overlapped those of the total stage. We had to perform this step because NGS samples are more difficult (and much more expensive) to obtain than microarray data; that is, there were few samples. This shortage may have caused overfitting in the parameter estimation process of our model in (2). So we used the total stage again to build a model to avoid overfitting. To summarize, we built a model using early, late, and total stages. Numbers of samples in these three stages were 19, 18, and 37, respectively (37 being the sum of 19 and 18). The reason for using the total stage is that 19 and 18 are small numbers of samples, and there was the chance that overfitting may have occurred. So we used the sum of the early and late stages (37) to see the entire picture between cancer and noncancer tissues.

3.2. Network Markers of Early, Late, and Total (Early Plus Late Stages) Stages of Liver Cancer

After value (0.01) screening, we found that there were 43, 80, and 74 significant proteins for the early, late, and total stages of liver cancer, respectively. In addition, their corresponding CRVs ranged 6.0~46.5, 6.1~34.1 and 6.5~81.5, respectively. These significant proteins and their PPIs were used to construct network markers for the early, late, and total stages of liver cancer. The intersection network of markers of the early and late stages was a core feature that contained 27 significant proteins associated with carcinogenesis. We list the 27 significant proteins and their corresponding CRVs and values in both stages of liver cancer in Table 2. In the second step, we separately list the top 20 significant proteins in the early and late stages of liver cancer in Table 3. The full list of the 43, 80, and 74 significant proteins for these three stages of liver cancer is given in Supplementary Tables S3. Table 4 shows the top 30 proteins of the total stage. Table 5 shows interactions with our previous results using microarray data.


Common network marker of early-stage and late-stage liver cancer
ProteinCRV valueCancer_AvgExpControl_AvgExp

APP46.520.00001 1372414041−0.03
APP13.870.00066 15083140410.1
ELAVL122.990.00033 172513190.39
ELAVL129.690.00003 179113190.44
KRTAP10-514.730.00098 110.04
KRTAP10-520.540.00022 110.06
H2AFX14.080.00105 6082711.17
H2AFX15.430.00053 8062711.58
CDK113.690.00108 335343.32
CDK110.310.00119 596344.15
ESR112.810.00119 2281537−2.76
ESR134.10.00000 2451537−2.65
EZH212.220.00127 310492.66
EZH219.050.00027 394493
CEP25012.050.00130 7162681.42
CEP25016.120.00045 6272681.23
AURKB11.410.00138 141143.35
AURKB7.680.00289 199143.85
CDC2010.890.00152 359204.17
CDC208.940.00168 688205.11
E2F110.540.00158 476433.46
E2F18.850.00174 673433.95
OTX110.290.00166 3124.27
OTX16.250.00811 2624.02
SUMO19.860.00181 24222982−0.3
SUMO18.630.00188 26242982−0.18
MCM49.760.00186 13083361.96
MCM46.760.00535 13273361.98
HGS7.960.00312 250411671.1
HGS13.90.00066 244111671.06
KPNA27.860.00323 19276891.48
KPNA210.430.00116 25336891.88
UBC7.850.00323 3438835490−0.05
UBC6.630.00591 36931354900.06
SPRY27.50.00376 409989−1.27
SPRY27.170.00399 328989−1.59
TOPBP17.330.00406 7993461.21
TOPBP16.450.00698 8753461.34
SIRT77.110.00456 4422250.98
SIRT7240.00012 4562251.02
WHSC16.670.00590 12174471.44
WHSC112.060.00086 13694471.61
MCM26.450.00666 10971742.66
MCM28.480.00200 13511742.96
AURKA6.40.00685 570862.73
AURKA9.250.00154 516862.58
COPS56.390.00691 150611840.35
COPS5100.00128 170811840.53
PCNA6.290.00749 16529040.87
PCNA15.950.00046 22909041.34
BUB1B6.270.00762 179123.89
BUB1B7.470.00327 320124.73
DNMT16.020.00920 13764551.6
DNMT110.630.00110 13834551.6

There are two rows with same proteins name while the upper row represents the early stage liver cancer and the lower raw represents the late stage liver cancer.
AvgExp means average expression.
means fold change.

ProteinCRV valueCase_AvgExpControl_AvgExp

Top 20 proteins of early-stage liver cancer
APP46.521372414041−0.03
KRTAP4-1233.4711−0.17
ELAVL122.990.000329172513190.39
KRTAP10-117.170.000756110.04
KRTAP10-514.730.000981110.04
H2AFX14.080.0010486082711.17
CDK113.690.001079335343.32
PRKDC12.910.001176401116841.25
CUL312.830.00118813761606−0.22
ESR112.810.0011942281537−2.76
EZH212.220.001274310492.66
CEP25012.050.0012987162681.42
AURKB11.410.001383141143.35
CDC2010.890.001523359204.17
E2F110.540.001584476433.46
OTX110.290.0016643124.27
C19orf6610.20.00168815513166−1.03
SUMO19.860.0018124222982−0.3
MCM49.760.00185913083361.96
GRB29.190.0022357226080.45

Top 20 proteins of late stage liver cancer
ESR134.12451537−2.65
ELAVL129.69179113190.44
UBD28.542092617813.55
YWHAZ27.311332364081.06
SIRT7240.0001224562251.02
HDAC522.070.00015216968401.01
KRTAP10-520.540.000224110.06
EZH219.050.00027394493
ILF218.860.000283411117371.24
CEP25016.120.0004526272681.23
PCNA15.950.0004622909041.34
SUMO215.770.00049330521120.65
H2AFX15.430.0005328062711.58
HSP90AB114.430.000630949146291.08
HGS13.90.000659244111671.06
APP13.870.00065915083140410.1
WHSC112.060.00086113694471.61
SETDB111.70.00091212034971.28
TRAF211.450.0009386652771.26
SMARCA411.180.000992269611151.27
SFN11.030.0010261008554.19

AvgExp means average expression.
means fold change.

ProteinCRV valueCase_AvgExpControl_AvgExp

APP81.53 <14385140410.03
ELAVL10.000187175713190.41
CCDC330.000464311.7
UBC25.190.00058835625354900.01
HIST3H324.30.000626110.02
KRTAP10-123.960.000643110.02
ESR123.790.0006522361537−2.7
UBD23.340.0006821470117813.04
H2AFX17.280.0011257052711.38
KRTAP10-517.050.001133110.05
HGS15.20.001329247311671.08
TRAF214.280.0014615812771.07
HSP90AB114.160.00148329756146291.02
PCNA13.580.00157219629041.12
SMARCA413.210.001611254611151.19
TAF612.840.00163610754781.17
SUMO212.60.001704296821120.49
UBQLN412.590.00171318819051.05
COPS512.450.001743160411840.44
CEP25012.410.0017476732681.33
EZH212.150.001802351492.84
CDK111.690.001892462343.78
TCF311.350.001999804151.24
CDC2011.280.002011519204.7
MCM210.90.00212612201742.81
GRB210.640.002233348826080.42
WHSC110.50.00228812914471.53
MYOD110.50.002292211.22
AURKB10.480.002305169143.61
SUMO110.360.00237325202982−0.24


ProteinCRV
(in this study)
value
(in this study)
CRV
(previous)
value
(previous)

BUB1B7.580.0058335.56960.00064
CDC2011.280.0020115.15070.00109
CDK26.90.00770814.069 <
CUL38.160.00447812.9519<
E2F110.190.0024973.99470.00862
ESR123.790.00065210.3758<
HDAC410.110.0025315.83970.00048
HGS15.20.0013294.69290.00232
MYC7.550.00588410.7821<
PCNA13.580.00157215.1438<
PRKDC8.190.0044185.93690.00041
SMARCA413.210.0016117.74490.0001
SUMO110.360.00237315.8533<
TRAF214.280.0014614.77030.00207
UBC25.190.000588137.284<

3.3. Pathway Analysis of the Total Stage of Liver Cancer

We first analyzed the pathway of the total stage of liver cancer using the David database. As stated above, the key point of this research was to find evolutionary mechanisms of liver cancer from the early and late stages, but the number of samples of NGS data was small, so we had to combine the early and late stages to see the overall picture of the liver cancer network. The five key pathways we were interested in, which were selected by these 74 key proteins, are listed as follows: (1) 14 proteins in hsa04110 were associated with the cell cycle (Figure 3(a)), (2) seven proteins in hsa04114 were associated with oocyte meiosis (Figure 3(b)), (3) five proteins in hsa03030 were associated with DNA replication (Figure 3(c)), (4) 10 proteins in hsa05200 were associated with cancer pathways (Figure 3(d)), and (5) four proteins in hsa04115 were associated with the p53 signaling pathway (Figure 3(e)). We only focused on those pathways related to liver cancer and eliminated unrelated ones, such as prostate cancer, glioma (brain cancer) pathway, and small cell lung cancer pathway (Table 6).

(a)

RankTermCountSymbol value

KEGG
1hsa04110:Cell cycle14E2F1, CDK1, PRKDC, CDC20, MCM2, MCM3, YWHAE, MCM4, CDK2, MCM7, GSK3B, PCNA, BUB1B, MYC
2hsa05215:Prostate cancer7E2F1, HSP90AB1, GRB2, GSK3B, MAPK3, PTEN, CDK2
3hsa04114:Oocyte meiosis7CDK1, PPP1CA, MAPK3, CDC20, AURKA, YWHAE, CDK2
4hsa03030:DNA replication5MCM7, PCNA, MCM2, MCM3, MCM4
5hsa05200:Pathways in cancer10E2F1, HSP90AB1, TRAF2, GRB2, MSH2, GSK3B, MAPK3, MYC, PTEN, CDK2
6hsa05213:Endometrial cancer5GRB2, GSK3B, MAPK3, MYC, PTEN
7hsa05210:Colorectal cancer5GRB2, MSH2, GSK3B, MAPK3, MYC0.002457
8hsa05222:Small cell lung cancer5E2F1, TRAF2, MYC, PTEN, CDK20.002457
9hsa05214:Glioma4E2F1, GRB2, MAPK3, PTEN0.008946
10hsa04722:Neurotrophin signaling pathway5GRB2, GSK3B, MAPK3, YWHAE, TP730.009832
11hsa04115:p53 signaling pathway4CDK1, PTEN, CDK2, TP730.011029
12hsa05220:Chronic myeloid leukemia4E2F1, GRB2, MAPK3, MYC0.014381
13hsa04914:Progesterone-mediated oocyte maturation4HSP90AB1, CDK1, MAPK3, CDK20.020705
14hsa04012:ErbB signaling pathway4GRB2, GSK3B, MAPK3, MYC0.021345
15hsa04540:Gap junction4CDK1, GRB2, MAPK3, TUBA1B0.022657
16hsa05219:Bladder cancer3E2F1, MAPK3, MYC0.033366
17hsa04510:Focal adhesion5PPP1CA, GRB2, GSK3B, MAPK3, PTEN0.047655
18hsa05223:Non-small cell lung cancer3E2F1, GRB2, MAPK30.052694
19hsa05221:Acute myeloid leukemia3GRB2, MAPK3, MYC0.059845
20hsa04910:Insulin signaling pathway4PPP1CA, GRB2, GSK3B, MAPK30.064636
21hsa05218:Melanoma3E2F1, MAPK3, PTEN0.085159
22hsa04662:B cell receptor signaling pathway3GRB2, GSK3B, MAPK30.09351

BioCarte
1ah_g1Pathway:Cell Cycle:G1/S Check Point 4E2F1, CDK1, GSK3B, CDK20.012616
2ah_p27Pathway:Regulation of p27 Phosphorylation during Cell Cycle Progression3E2F1, NEDD8, CDK20.020162
3ah_p53Pathway:p53 Signaling Pathway3E2F1, PCNA, CDK20.033691
4ah_her2Pathway:Role of ERBB2 in Signal Transduction and Oncology3GRB2, MAPK3, ESR10.049866
5ah_ptenPathway:PTEN dependent cell cycle arrest and apoptosis3GRB2, MAPK3, PTEN0.054277
6ah_RacCycDPathway:Influence of Ras and Rho proteins on G1 to S Transition3E2F1, MAPK3, CDK20.063496
7ah_cellcyclePathway:Cyclins and Cell Cycle Regulation3E2F1, CDK1, CDK20.068295

The significant pathways via DAVID Bioinformatics database are selected for the 74 significant proteins in carcinogenesis. Bold indicates value > 0.05. Most of them are from the KEGG database, and we add several key pathways from BioCarta.
(b)

GO:term valueCorrected valueTerm name

() Biological processes
GO:005132063579124S phase
GO:000008463579124S phase of mitotic cell cycle
GO:000626763579164Prereplicative complex assembly
GO:006500463579685Protein-DNA complex assembly
GO:000072763579264Double-strand break repair via break-induced replication
GO:000627063579284DNA-dependent DNA replication initiation
GO:002261663579314DNA strand elongation
GO:000627163579314DNA strand elongation involved in DNA replication
GO:000072463579384Double-strand break repair via homologous recombination
GO:0022402635794457Cell cycle process

() Cellular components
GO:00425556357964MCM complex
GO:000565663579164Prereplicative complex
GO:003126163579224DNA replication preinitiation complex
GO:003129863579254Replication fork protection complex
GO:003299363579464Protein-DNA complex
GO:00432346357913699Protein complex
GO:0044454635792175Nuclear chromosome part
GO:0044451635792755Nucleoplasm part
GO:00444286357912518Nuclear part
GO:00444270.0011635793025Chromosomal part

() Molecular functions
GO:004356663579856Structure-specific DNA binding
GO:0043138635791843′-5′ DNA helicase activity
GO:000400363579264ATP-dependent DNA helicase activity
GO:000368263579805Chromatin binding
GO:000368863579304DNA replication origin binding
GO:000367863579394DNA helicase activity
GO:000369763579444Single-stranded DNA binding
GO:0016887635792786ATPase activity
GO:004314063579133ATP-dependent 3′-5′ DNA helicase activity
GO:000809463579674DNA-dependent ATPase activity

: number of genes in reference set.
: number of genes in test set.
: number of genes annotated by given term in reference set.
: number of genes annotated by given term in test set.

(1) Cell Cycle. An abnormal cell cycle is highly related to cancer mechanisms. We described this in our previous study [15, 16]. More clues and mechanisms were revealed in this study, so we have to repeat and highlight the previous results while adding the new results. Dysregulation of the cell cycle governs deviant cell proliferation in cancer. Loss of the ability to control cell cycle checkpoints induces abnormal genetic instability. It is well known that the cell cycle is composed of two consecutive periods, characterized by DNA replication, and of sequential differential and segregation of replicated chromosomes into two separate daughter cells. This may be due to activation of tumorigenic mutations, which were recognized in various tumors at different levels in mitogenic signal transduction pathways: (1) ligands and receptors (receptor mutations of HER2/neu [ErB2] or amplification of the HER2 gene), (2) downstream signal transduction networks (Raf/Ras/MAPK or PI3K-AKT-mTOR), and (3) regulatory genes of the cell cycle (cyclin D1/CDK4, CDK6, and cyclin E/CDK2) [42]. There are many reported discussions regarding cell cycle regulators and checkpoint functions involved in HCC [4346]. Herein, 14 of the total stage network markers identified were involved in the cell cycle pathway. In this case, we found that five of these 14 markers were related to the DNA replication pathway. This shows that the cell cycle and DNA replication are highly related to liver cancer at the total stage.

Because the cell cycle is so crucial to cancer, we list other pathways given by BioCarta (Figures 3(e) and 3(f)). The former shows that these key proteins are related to the G1/S checkpoint, while the later shows that they are related to regulation of p27 phosphorylation during cell cycle progression.

Dituri et al. showed that changes in the cell cycle checkpoint frequently occur during HCC. They identified different pathways from us and showed that the phosphatidylinositol-3-kinase (PI3K)/protein kinase B (AKT)/mammalian target of the rapamycin (mTOR) pathway is the key pathway for HCC. They used the human PLC/PRF/5, Hep3B, HepG2, HLE, and HLF HCC cell lines and a normal human hepatocyte cell line. Although they and our group did not identify the same targets, the key point is that an abnormal cell cycle is a complex mechanism. Their scope covered the G0-G1, G2-M, and G1-S phases [45]. Our future work will try to link our network makers with mechanisms discovered by other groups.

Furuta et al. showed that micro- (mi)RNAs are key posttranscriptional regulators of gene expression and are usually deregulated in HCC. They identified four miRNAs, mir-101, mir-195, mir-378, and mir-497 that are always silenced in HCC. In this research, we did not include miRNAs when building the model [46]. We know that miRNAs play significant roles in genetic regulatory networks, and our future work will incorporate miRNAs into our model to reveal more hidden mechanisms of HCC.

(2) Oocyte Meiosis. We did notidentify this pathway in our previous study. Zhang et al. used network-based bioinformatics methods to identify biomarkers for HCC and identified this pathway with 19 upregulated genes with a value of 0.0016 [47]. We identified seven genes in this pathway with a very low value of 8.70−05. Terret et al. claimed that mouse oocytes are a paradigm of cancer cells. They showed asymmetric division in both mitotic cells and oocytes cells, and this causes cancer cells to divide [48].

(3) DNA Replication. As we stated above, we identified five genes of the DNA replication pathway which are also involved in the cell cycle pathway. Macheret and Halazonetis claimed that DNA replication stress is a hallmark of cancer [49], so we have to discuss this topic in more detail here. Hanahan and Weinberg published two review papers on the topic of cancer hallmarks, and the key points are listed here. Based on their first-generation cancer hallmarks [2], Hanahan and Weinberg extended their work to second-generation hallmarks of cancers [3]. It inspired many people and stimulated many ideas on cancer targeted therapies.

(i) Sustaining Proliferative Signaling. Cancer cells can sustain proliferative signaling in various ways, become independent of exogenous growth signals, and so can proliferate. They also have the ability to generate growth factor ligands themselves.

(ii) Evading Growth Suppressors. Cancer cells can ignore signals that inhibit cell proliferation. Retinoblastoma-associated (RB) and TP53 proteins are the two main prototypes of tumor suppressors. They are also important targets of cancer therapies.

(iii) Activating Invasion and Metastasis. Invasion and metastasis comprise multistep processes, usually called the invasion-metastasis cascade. This cascade is related to the mechanisms of miRNAs [50, 51].

(iv) Enabling Replicative Immortality. Cancer cells becoming immortal is a lethal event for cancer patients. There are three important topics to this immortality: reassessing replicative senescence, delayed activation of telomerase, and new noncanonical functions of telomerases.

(v) Inducing Angiogenesis. Both invasion and metastasis need new-born blood vessels to supply nutrients for cancer cells.

(vi) Resisting Cell Death. Cancer cells have many strategies to disable apoptosis. Cancer cells becoming immortal is another lethal event for cancer patients.

When we tried to apply a systems biology approach to cancer therapy, understanding cancer hallmarks was the most important and basic step. There are many investigations of cancer systems biology based on Weinberg’s work. Negrini et al. discussed the genomic instability characteristics of cancers and evolving hallmarks of cancer [52]. Hornberg et al. used the concept of complex signaling networks and increases in complexity to directly name cancers as systems biology disease [53]. Barillot et al. merged the six hallmarks into the following three main properties related to the proliferation, survival, and dissemination of cancers. They are the defective control of the cell cycle, defective control of cell death, and the invasiveness and metastatic potential. There are three key levels at which normal cells transform into cancer cells: mutations of DNA, transcription of DNA to RNA, and RNA translation to proteins. Hallmarks are omens and portent of cancers, and, by a pathway analysis, we can find drug targets at this stage.

Cancer is a network disease, that is, a dysregulation of the entire network. Determining how to build a cancer network is the first important task.

We have briefly reviewed and summarized the key points of Weinberg’s theory. Now we focus on DNA replication stress, because it was identified as another hallmark of cancer. Macheret and Halazonetis claimed that the sustained proliferation hallmark can be regarded as mutations in oncogenes and tumor suppressor genes that are involved in the cell growth pathway. Mutations of TP53, ATM, or MDM2 genes can allow escaping from the apoptosis hallmark. He discussed oncogene-induced DNA replication stress and the role it plays as a cancer progression driver [49].

(4) p53 Signaling Pathway. Soussi et al. discussed a multifactorial analysis of p53 alterations in human cancers [54]. Ueda et al. claimed that the functional inactivation but not the structural mutation of p53 can cause liver cancer [55].

3.4. Pathway Analysis of Early-Stage Liver Cancer

Four of these five key pathways discussed in the total stage (cell cycle, DNA replication, oocyte meiosis, and p53 signaling) were also selected by both early and late stages, although the proteins involved in these pathways were not the same (Table 7). We should pay more attention to the evolution from the early to late stages.

(a)

RankTermCountSymbol value

KEGG
1hsa04110:Cell cycle11E2F1, CCNB1, CDK1, CDKN2A, PCNA, BUB1B, PRKDC, CDC20, MCM2, MCM4, MYC
2hsa03030:DNA replication4POLD1, PCNA, MCM2, MCM4
3hsa05220:Chronic myeloid leukemia4E2F1, CDKN2A, GRB2, MYC0.004415
4hsa04114:Oocyte meiosis4CCNB1, CDK1, CDC20, AURKA0.01273
5hsa05219:Bladder cancer3E2F1, CDKN2A, MYC0.015099
6hsa05223:Non-small cell lung cancer3E2F1, CDKN2A, GRB20.024285
7hsa05214:Glioma3E2F1, CDKN2A, GRB20.03234
8hsa04115:p53 signaling pathway 3CCNB1, CDK1, CDKN2A0.037212
9hsa03430:Mismatch repair2POLD1, PCNA0.09922

BioCarta
1ah_cellcyclePathway:Cyclins and Cell Cycle Regulation4E2F1, CCNB1, CDK1, CDKN2A0.002246
2ah_srcRPTPPathway:Activation of Src by Protein-tyrosine phosphatase alpha3CCNB1, CDK1, GRB20.004
3ah_g2Pathway:Cell Cycle:G2/M Checkpoint3CCNB1, CDK1, PRKDC0.025668
4ah_g1Pathway:Cell Cycle:G1/S Check Point 3E2F1, CDK1, CDKN2A0.039621
5ah_ptc1Pathway:Sonic Hedgehog (SHH) Receptor Ptc1 Regulates cell cycle2CCNB1, CDK10.075537

The significant pathways via DAVID Bioinformatics database are selected for the 43 significant proteins in carcinogenesis. Bold indicates value > 0.05. Most of them are from the KEGG database, and we add several key pathways from BioCarta.
(b)

GO:term valueCorrected valueTerm name

() Biological processes
GO:00513200.002863574122S phase
GO:00000840.002863574122S phase of mitotic cell cycle
GO:00062670.005263574162Prereplicative complex assembly
GO:00007270.014263574262Double-strand break repair via break-induced replication
GO:00062700.016563574282DNA-dependent DNA replication initiation
GO:00226160.020363574312DNA strand elongation
GO:00062710.020363574312DNA strand elongation involved in DNA replication
GO:00007240.030663574382Double-strand break repair via homologous recombination
GO:00007250.047063574472Recombinational repair
GO:00224030.0515635742863Cell cycle phase

() Cellular components
GO:00425556357462MCM complex
GO:00056560.001063574162Prereplicative complex
GO:00312610.002063574222DNA replication preinitiation complex
GO:00312980.002663574252Replication fork protection complex
GO:00329930.009163574462Protein-DNA complex
GO:00001510.017163574632Ubiquitin ligase complex
GO:00432340.00210.06436357413694Protein complex
GO:00335970.00250.07546357441Mitotic checkpoint complex
GO:00314630.00310.09426357451Cul3-RING ubiquitin ligase complex
GO:00435960.00620.188363574101Nuclear replication fork

() Molecular functions
GO:00431380.0021635741823′-5′ DNA helicase activity
GO:00040030.004663574262ATP-dependent DNA helicase activity
GO:00036880.006163574302DNA replication origin binding
GO:00036780.010463574392DNA helicase activity
GO:00036970.013363574442Single-stranded DNA binding
GO:00080940.031063574672DNA-dependent ATPase activity
GO:00036820.044363574802Chromatin binding
GO:00435660.00100.050063574852Structure-specific DNA binding
GO:00048420.00100.050063574852Ubiquitin-protein ligase activity
GO:00700350.00100.051163574862Purine NTP-dependent helicase activity

: number of genes in reference set.
: number of genes in test set.
: number of genes annotated by given term in reference set.
: number of genes annotated by given term in test set.

(1) Mismatch Repair. This is a key pathway that was only selected by the early stage (with higher value of 0.09) but not the late stage (Figure 4). Fujimoto et al. sequenced the entire genome of liver cancers to identify etiological influences on mutation patterns. They discussed transcription-coupled repair and mismatch repair-deficient tumors to show that mismatch repair plays a significant role in liver cancer [56].

3.5. Pathway Analysis of Late-Stage Liver Cancer

(1) Spliceosome. It is important that we selected the spliceosome pathway again in late-stage liver cancer (Table 8), because we selected it in our previous results of late-stage bladder cancer. We review previous results of the spliceosome pathway [16] and then add new results here. Alternative splicing is a modification of premessenger (m)RNA transcripts in which internal noncoding regions of pre-mRNA (introns) are removed and then the remaining segments (exons) are joined (Figure 5). The formation of mature mRNA is subsequently capped at its 5′ end, polyadenylated at its 3′ end, and transported out of the nucleus to be translated into proteins in the cytoplasm. Most genes use alternative splicing to generate multiple spliced transcripts. These transcripts contain various combinations of exons resulting from different mRNA variants, and these are synthesized as protein isoforms. Exons are always around 50~250 base pairs, whereas introns can be as long as several thousands of base pairs. For nuclear-encoded genes, splicing takes place within the nucleus simultaneously with or after transcription. Splicing is necessary for eukaryotic mRNA before it can be translated into the correct protein. The spliceosome is a dynamic intracellular macromolecular complex of multiple proteins and ribonucleoproteins (RNPs). For many eukaryotic introns, the spliceosome carries out the two main functions of alternative splicing: first, it recognizes intron-exon boundaries, and, second, it catalyzes cut-and-paste reactions that remove introns and concatenate exons. Various spliceosomal machinery complexes are formed from five RNP subunits, termed uridine- (U) rich small nuclear (sn)RNPs, that are transiently associated with more than 760 non-snRNP splicing factors (RNA helicases, SR splicing factors, etc.) [57, 58]. Each spliceosomal snRNP (U1, U2, U4, U5, and U6) consists of a U-rich snRNA complexed with a set of seven proteins known as canonical Sm core or SNRP proteins. The seven Sm proteins (B/B′, D1, D2, D3, E, F, and G) form a core ring structure that surrounds the RNA. All Sm proteins contain a conserved sequence motif in two segments (Sm1 and Sm2) that are responsible for the assembly and ordering of snRNAs. They form the Sm core of spliceosomal snRNPs [59] and process pre-mRNA [60]. Spliceosomes not only catalyze splicing by a series of reactions, but also are the main cellular machinery that guides splicing. Recently, scientists found two natural compounds that can interfere with spliceosome function that also display anticancer activity in vitro and in vivo [61, 62]. Therefore, it is thought that inhibiting the spliceosome can be a new target for anticancer drug development [63], and it should be validated in vivo and in vitro in the future [56].

(a)

RankTermCountSymbol value

KEGG
1hsa04110:Cell cycle16E2F1, CDK1, YWHAZ, YWHAB, CDC20, SFN, MCM2, MCM3, CDK4, MCM4, CDK2, MCM7, PCNA, YWHAQ, BUB1B, CCNA2
2hsa04114:Oocyte meiosis8CDK1, YWHAZ, YWHAB, YWHAQ, CDC20, AURKA, PPP1CC, CDK2
3hsa03030:DNA replication5MCM7, PCNA, MCM2, MCM3, MCM4
4hsa04115:p53 signaling pathway4CDK1, SFN, CDK4, CDK20.020496
5hsa03040:Spliceosome5SNRPB, THOC4, LSM2, SF3A2, SF3B40.022722
6hsa05222:Small cell lung cancer4E2F1, TRAF2, CDK4, CDK20.035432
7hsa04914:Progesterone-mediated oocyte maturation4HSP90AB1, CDK1, CCNA2, CDK20.037607
8hsa00310:Lysine degradation3SETDB1, WHSC1, EHMT20.055046
9hsa05200:Pathways in cancer7E2F1, HSP90AB1, TRAF2, MSH2, BIRC5, CDK4, CDK20.061178

BioCarte
1ah_p53Pathway:p53 Signaling Pathway4E2F1, PCNA, CDK4, CDK20.003378
2ah_cellcyclePathway:Cyclins and Cell Cycle Regulation4E2F1, CDK1, CDK4, CDK20.010334
3ah_g1Pathway:Cell Cycle:G1/S Check Point 4E2F1, CDK1, CDK4, CDK20.015615
4ah_rbPathway:RB Tumor Suppressor/Checkpoint Signaling in response to DNA damage3CDK1, CDK4, CDK20.019987
5ah_ranMSpathway:Role of Ran in mitotic spindle regulation3RAN, AURKA, KPNA20.019987
6ah_g2Pathway:Cell Cycle:G2/M Checkpoint3CDK1, YWHAQ, BRCA10.067509
7ah_RacCycDPathway:Influence of Ras and Rho proteins on G1 to S Transition3E2F1, CDK4, CDK20.072809

The significant pathways via DAVID Bioinformatics database are selected for the 74 significant proteins in carcinogenesis. Bold indicates value > 0.05. Most of them are from the KEGG database, and we add several key pathways from BioCarta.
(b)

GO:term valueCorrected valueTerm name

() Biological processes
GO:0051320635710124S phase
GO:0000084635710124S phase of mitotic cell cycle
GO:0006267635710164Prereplicative complex assembly
GO:0000727635710264Double-strand break repair via break-induced replication
GO:0006270635710284DNA-dependent DNA replication initiation
GO:0022616635710314DNA strand elongation
GO:0006271635710314DNA strand elongation involved in DNA replication
GO:0000724635710384Double-strand break repair via homologous recombination
GO:00062606357101205DNA replication
GO:0000725635710474Recombinational repair

() Cellular components
GO:004255563571064MCM complex
GO:0005656635710164Prereplicative complex
GO:0031261635710224DNA replication preinitiation complex
GO:0031298635710254Replication fork protection complex
GO:0032993635710464Protein-DNA complex
GO:00444540.01136357102174Nuclear chromosome part
GO:00444510.02806357102754Nucleoplasm part
GO:00444280.037063571012517Nuclear part
GO:00444270.04006357103024Chromosomal part
GO:00432340.065663571013697Protein complex

() Molecular functions
GO:00431386357101843′-5′ DNA helicase activity
GO:0004003635710264ATP-dependent DNA helicase activity
GO:0003688635710304DNA replication origin binding
GO:0043566635710855Structure-specific DNA binding
GO:0003678635710394DNA helicase activity
GO:0003697635710444Single-stranded DNA binding
GO:0043140635710133ATP-dependent 3′-5′ DNA helicase activity
GO:0008094635710674DNA-dependent ATPase activity
GO:0003682635710804Chromatin binding
GO:0070035635710864Purine NTP-dependent helicase activity

: number of genes in reference set.
: number of genes in test set.
: number of genes annotated by given term in reference set.
: number of genes annotated by given term in test set.

Tian et al. used a phosphoproteomic analytical method on the highly metastatic HCC cell line, MHCC97-H. They reported that phosphoproteins were also found in the spliceosome pathway, and they were related to liver cancer [64].

(2) Progesterone-Mediated Oocyte Maturation. Zhang et al. also identified this pathway [47].

(3) Lysine Degradation. Huang et al. identified this pathway in liver cancer [65].

3.6. Comparison with Our Previous Liver Cancer Network Biomarkers Using a Microarray Analysis

We found 15 common proteins in this study and our previous results (Table 5). That tells us that the NGS is a revolutionary technology, but it can be trusted to not give totally different results from microarray data.

3.7. Comparison of Microarray and NGS Technology

Gene expression profiling by microarray has been successful at demonstrating the patterns of mRNAs within tissues and cells. Although microarray platforms showed similar levels of concordance with the RNA-seq data, the next generation sequencing (NGS) technologies provided high sensitivity, specificity, and accuracy as compared to the microarray platforms [66]. Due to NGS providing a more detailed observation at the transcriptome, scientists and biologists have been eager to apply NGS for gene expression profiling. Microarrays only return results from those regions for probes which have been designed. Therefore, microarrays are only as good as the databases from which they are designed. NGS methods provide aspects of the transcriptome without a priori knowledge, allowing for the analysis and discovery of novel transcripts, noncoding RNAs, and alternative splicing. Thus, RNA sequencing methods enable the most accurate detection and quantification for transcriptome analysis.

4. Conclusions

Liver cancer is the third most deadly cancer causing about 700,000 deaths in 2011 worldwide. It is a lethal disease like other cancers. There are three important topics in this research. The first was to compare this study with our previous research of liver cancer microarray data, because the microarray technology may someday be replaced by the NGS technology. We found the results to be good, because there were a lot of key proteins identified by both methods. The second was to reveal the carcinogenesis process from the early to late stages of liver cancer. The specific pathway of the early stage was the mismatch repair pathway, and the specific pathways of the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway. This suggests novel directions for choosing different targeted therapeutic strategies at different stages of cancer. In particular, compared to our previous results of bladder cancer, we found that the spliceosome pathway is a significant pathway in the late stage of both liver cancer and bladder cancer. Our future work will focus greater attention on this pathway related to various cancers and consider it as a new drug target for anticancer therapies.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Authors’ Contribution

Yung-Hao Wong and Chia-Chou Wu contributed equally to this work.

Acknowledgment

The authors are grateful for the support provided by the Ministry of Science and Technology with Grants no. MOST-103-2745-E-007-001-ASP and no. 103-2221-E-038-013-MY2.

Supplementary Materials

Supplementary material S.1 uses the Maximum Likelihood Method to do the parameter identification of regression model in equation (1). S.2 uses the AIC and student’s t-test to calculate the p-values of association abilities, and detect the system model order and determine the significance of the model parameters. Table S3: (a) The 43 identified significant proteins of early stage liver cancer. (b) The 80 identified significant proteins of late stage liver cancer. (c) The 74 identified significant proteins of total stage liver cancer.

  1. Supplementary Material

References

  1. T. N. Seyfried and L. M. Shelton, “Cancer as a metabolic disease,” Nutrition & Metabolism, vol. 7, article 7, 2010. View at: Publisher Site | Google Scholar
  2. D. Hanahan and R. A. Weinberg, “The hallmarks of cancer,” Cell, vol. 100, no. 1, pp. 57–70, 2000. View at: Publisher Site | Google Scholar
  3. D. Hanahan and R. A. Weinberg, “Hallmarks of cancer: the next generation,” Cell, vol. 144, no. 5, pp. 646–674, 2011. View at: Publisher Site | Google Scholar
  4. L. Preziosi, Cancer Modelling and Simulation, Chapman & Hall/CRC, Boca Raton, Fla, USA, 2003.
  5. E. Barillot, Computational Systems Biology of Cancer, Taylor & Francis, Boca Raton, Fla, USA, 2013.
  6. A. Cesario and F. B. Marcus, Cancer Systems Biology, Bioinformatics and Medicine: Research and Clinical Applications, Springer, Dordrecht, The Netherlands, 2011.
  7. E. Wang, Cancer Systems Biology, CRC Press, Boca Raton, Fla, USA, 2010.
  8. Q. Yan, Systems Biology in Drug Discovery and Development: Methods and Protocols, Humana Press, New York, NY, USA, 2010.
  9. H.-F. Juan and H.-C. Huang, Systems Biology: Applications in Cancer-Related Research, World Scientific, River Edge, NJ, USA, 2012.
  10. C.-W. Li, Y. Chu, and B. Chen, “Construction and clarification of dynamic gene regulatory network of cancer cell cycle via microarray data,” Cancer Informatics, vol. 2, pp. 223–241, 2006. View at: Google Scholar
  11. L.-H. Chu and B.-S. Chen, “Comparisons of robustness and sensitivity between cancer and normal cells by microarray data,” Cancer Informatics, vol. 6, pp. 165–181, 2008. View at: Google Scholar
  12. L.-H. Chu and B.-S. Chen, “Construction of a cancer-perturbed protein-protein interaction network for discovery of apoptosis drug targets,” BMC Systems Biology, vol. 2, article 56, 2008. View at: Publisher Site | Google Scholar
  13. Y.-C. Wang and B.-S. Chen, “A network-based biomarker approach for molecular investigation and diagnosis of lung cancer,” BMC Medical Genomics, vol. 4, article 2, 2011. View at: Publisher Site | Google Scholar
  14. C.-T. Tu and B.-S. Chen, “New measurement methods of network robustness and response ability via microarray data,” PLoS ONE, vol. 8, no. 1, Article ID e55230, 2013. View at: Publisher Site | Google Scholar
  15. Y. H. Wong, R.-H. Chen, and B.-S. Chen, “Core and specific network markers of carcinogenesis from multiple cancer samples,” Journal of Theoretical Biology, vol. 362, pp. 17–34, 2014. View at: Publisher Site | Google Scholar
  16. Y. Wong, C. Li, and B. Chen, “Evolution of network biomarkers from early to late stage bladder cancer samples,” BioMed Research International, vol. 2014, Article ID 159078, 23 pages, 2014. View at: Publisher Site | Google Scholar
  17. A. Jemal, F. Bray, M. M. Center, J. Ferlay, E. Ward, and D. Forman, “Global cancer statistics,” CA Cancer Journal for Clinicians, vol. 61, no. 2, pp. 69–90, 2011. View at: Publisher Site | Google Scholar
  18. S. F. Altekruse, K. A. McGlynn, and M. E. Reichman, “Hepatocellular carcinoma incidence, mortality, and survival trends in the United States from 1975 to 2005,” Journal of Clinical Oncology, vol. 27, no. 9, pp. 1485–1491, 2009. View at: Publisher Site | Google Scholar
  19. Y. Li, Z.-Y. Tang, and J.-X. Hou, “Hepatocellular carcinoma: insight from animal models,” Nature Reviews Gastroenterology and Hepatology, vol. 9, no. 1, pp. 32–43, 2012. View at: Publisher Site | Google Scholar
  20. J. U. Marquardt, P. R. Galle, and A. Teufel, “Molecular diagnosis and therapy of hepatocellular carcinoma (HCC): an emerging field for advanced technologies,” Journal of Hepatology, vol. 56, no. 1, pp. 267–275, 2012. View at: Publisher Site | Google Scholar
  21. R. N. Aravalli, E. N. K. Cressman, and C. J. Steer, “Cellular and molecular mechanisms of hepatocellular carcinoma: an update,” Archives of Toxicology, vol. 87, no. 2, pp. 227–247, 2013. View at: Publisher Site | Google Scholar
  22. B. C. Sun and M. Karin, “Obesity, inflammation, and liver cancer,” Journal of Hepatology, vol. 56, no. 3, pp. 704–713, 2012. View at: Publisher Site | Google Scholar
  23. A. Villanueva, V. Hernandez-Gea, and J. M. Llovet, “Medical therapies for hepatocellular carcinoma: a critical view of the evidence,” Nature Reviews Gastroenterology and Hepatology, vol. 10, no. 1, pp. 34–42, 2013. View at: Publisher Site | Google Scholar
  24. Y. Zhang, S. Wang, D. Li et al., “A systems Biology-Based classifier for hepatocellular carcinoma diagnosis,” PLoS ONE, vol. 6, no. 7, Article ID e22426, 2011. View at: Publisher Site | Google Scholar
  25. Y. Q. Zhang, X. Guo, D. Wang et al., “A systems biology-based investigation into the therapeutic effects of Gansui Banxia Tang on reversing the imbalanced network of hepatocellular carcinoma,” Scientific Reports, vol. 4, Article ID 04154, 2014. View at: Publisher Site | Google Scholar
  26. L. A. D'Alessandro, R. Meyer, and U. Klingmüller, “Hepatocellular carcinoma: a systems biology perspective,” Frontiers in Physiology, vol. 4, article 28, 2013. View at: Publisher Site | Google Scholar
  27. L. Chen, R. Liu, Z.-P. Liu, M. Li, and K. Aihara, “Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers,” Scientific Reports, vol. 2, article 342, 2012. View at: Publisher Site | Google Scholar
  28. R. Liu, X. Wang, K. Aihara, and L. Chen, “Early diagnosis of complex diseases by molecular biomarkers, network biomarkers, and dynamical network biomarkers,” Medicinal Research Reviews, vol. 34, no. 3, pp. 455–478, 2014. View at: Publisher Site | Google Scholar
  29. R. Liu, X. Yu, X. Liu, D. Xu, K. Aihara, and L. Chen, “Identifying critical transitions of complex diseases based on a single sample,” Bioinformatics, vol. 30, no. 11, pp. 1579–1586, 2014. View at: Publisher Site | Google Scholar
  30. Y.-C. Wang and B.-S. Chen, “A network-based biomarker approach for molecular investigation and diagnosis of lung cancer,” BMC Medical Genomics, vol. 4, article 2, 2011. View at: Publisher Site | Google Scholar
  31. K.-Q. Liu, Z.-P. Liu, J.-K. Hao, L. Chen, and X.-M. Zhao, “Identifying dysregulated pathways in cancers from pathway interaction networks,” BMC Bioinformatics, vol. 13, no. 1, article 126, 2012. View at: Publisher Site | Google Scholar
  32. A. Chatr-Aryamontri, B.-J. Breitkreutz, S. Heinicke et al., “The BioGRID interaction database: 2013 update,” Nucleic Acids Research, vol. 41, no. 1, pp. D816–D823, 2013. View at: Publisher Site | Google Scholar
  33. J. M. Bland and D. G. Altman, “Multiple significance tests: the Bonferroni method,” British Medical Journal, vol. 310, article 170, 1995. View at: Publisher Site | Google Scholar
  34. R. Johansson, System Modeling and Identification, Prentice Hall, 1993.
  35. M. Pagano and K. Gauvreau, Principles of Biostatistics, 2000.
  36. M. Kanehisa, “Molecular network analysis of diseases and drugs in KEGG,” Methods in Molecular Biology, vol. 939, pp. 263–275, 2013. View at: Publisher Site | Google Scholar
  37. J.-I. Satoh, “Molecular network of microRNA targets in Alzheimer's disease brains,” Experimental Neurology, vol. 235, no. 2, pp. 436–446, 2012. View at: Publisher Site | Google Scholar
  38. D. W. Huang, B. T. Sherman, and R. A. Lempicki, “Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources,” Nature Protocols, vol. 4, no. 1, pp. 44–57, 2009. View at: Publisher Site | Google Scholar
  39. D. W. Huang, B. T. Sherman, and R. A. Lempicki, “Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists,” Nucleic Acids Research, vol. 37, no. 1, pp. 1–13, 2009. View at: Publisher Site | Google Scholar
  40. J. Wang, Q. Huang, Z.-P. Liu et al., “NOA: a novel Network Ontology Analysis method,” Nucleic Acids Research, vol. 39, no. 13, article e87, 2011. View at: Publisher Site | Google Scholar
  41. C. Zhang, J. Wang, K. Hanspers, D. Xu, L. Chen, and A. R. Pico, “NOA: a cytoscape plugin for network ontology analysis,” Bioinformatics, vol. 29, no. 16, pp. 2066–2067, 2013. View at: Publisher Site | Google Scholar
  42. G. H. Williams and K. Stoeber, “The cell cycle and cancer,” Journal of Pathology, vol. 226, no. 2, pp. 352–364, 2012. View at: Publisher Site | Google Scholar
  43. J. Lu, M.-L. He, L. Wang et al., “miR-26a inhibits cell growth and tumorigenesis of nasopharyngeal carcinoma through repression of EZH2,” Cancer Research, vol. 71, no. 1, pp. 225–233, 2011. View at: Publisher Site | Google Scholar
  44. K.-F. Cheung, J. Zhao, Y. Hao et al., “CITED2 is a novel direct effector of peroxisome proliferator-activated receptor γ in suppressing hepatocellular carcinoma cell growth,” Cancer, vol. 119, no. 6, pp. 1217–1226, 2013. View at: Publisher Site | Google Scholar
  45. F. Dituri, A. Mazzocca, L. Lupo et al., “PI3K class IB controls the cell cycle checkpoint promoting cell proliferation in hepatocellular carcinoma,” International Journal of Cancer, vol. 130, no. 11, pp. 2505–2513, 2012. View at: Publisher Site | Google Scholar
  46. M. Furuta, K.-I. Kozaki, K. Tanimoto et al., “The tumor-suppressive miR-497-195 cluster targets multiple cell-cycle regulators in hepatocellular carcinoma,” PLoS ONE, vol. 8, no. 3, Article ID e60155, 2013. View at: Publisher Site | Google Scholar
  47. L. Zhang, Y. Guo, B. Li et al., “Identification of biomarkers for hepatocellular carcinoma using network-based bioinformatics methods,” European Journal of Medical Research, vol. 18, article 35, 2013. View at: Google Scholar
  48. M.-E. Terret, A. Chaigne, and M.-H. Verlhac, “Mouse oocyte, a paradigm of cancer cell,” Cell Cycle, vol. 12, no. 21, pp. 3370–3376, 2013. View at: Publisher Site | Google Scholar
  49. M. Macheret and T. D. Halazonetis, “DNA replication stress as a hallmark of cancer,” Annual Review of Pathology: Mechanisms of Disease, vol. 10, no. 1, pp. 425–448, 2015. View at: Publisher Site | Google Scholar
  50. L. Ma, J. Teruya-Feldstein, and R. A. Weinberg, “Tumour invasion and metastasis initiated by microRNA-10b in breast cancer,” Nature, vol. 449, no. 7163, pp. 682–688, 2007. View at: Publisher Site | Google Scholar
  51. P. Gao, A.-Y. Xing, G.-Y. Zhou et al., “The molecular mechanism of microRNA-145 to suppress invasion-metastasis cascade in gastric cancer,” Oncogene, vol. 32, no. 4, pp. 491–501, 2013. View at: Publisher Site | Google Scholar
  52. S. Negrini, V. G. Gorgoulis, and T. D. Halazonetis, “Genomic instability—an evolving hallmark of cancer,” Nature Reviews Molecular Cell Biology, vol. 11, pp. 220–228, 2010. View at: Google Scholar
  53. J. J. Hornberg, F. J. Bruggeman, H. V. Westerhoff, and J. Lankelma, “Cancer: a systems biology disease,” BioSystems, vol. 83, no. 2-3, pp. 81–90, 2006. View at: Publisher Site | Google Scholar
  54. T. Soussi, Y. Legros, R. Lubin, K. Ory, and B. Schlichtholz, “Multifactorial analysis of p53 alteration in human cancer: a review,” International Journal of Cancer, vol. 57, no. 1, pp. 1–9, 1994. View at: Google Scholar
  55. H. Ueda, S. J. Ullrich, J. D. Gangemi et al., “Functional inactivation but not structural mutation of p53 causes liver cancer,” Nature Genetics, vol. 9, no. 1, pp. 41–47, 1995. View at: Publisher Site | Google Scholar
  56. A. Fujimoto, Y. Totoki, T. Abe et al., “Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators,” Nature Genetics, vol. 44, no. 7, pp. 760–764, 2012. View at: Publisher Site | Google Scholar
  57. M. S. Jurica and M. J. Moore, “Pre-mRNA splicing: awash in a sea of proteins,” Molecular Cell, vol. 12, no. 1, pp. 5–14, 2003. View at: Publisher Site | Google Scholar
  58. Z. Zhou, L. J. Licklider, S. P. Gygi, and R. Reed, “Comprehensive proteomic analysis of the human spliceosome,” Nature, vol. 419, no. 6903, pp. 182–185, 2002. View at: Publisher Site | Google Scholar
  59. R. S. Pillai, M. Grimmler, G. Meister et al., “Unique Sm core structure of U7 snRNPs: assembly by a specialized SMN complex and the role of a new component, Lsm11, in histone RNA processing,” Genes and Development, vol. 17, no. 18, pp. 2321–2333, 2003. View at: Publisher Site | Google Scholar
  60. L. Agranat-Tamir, N. Shomron, J. Sperling, and R. Sperling, “Interplay between pre-mRNA splicing and microRNA biogenesis within the supraspliceosome,” Nucleic Acids Research, vol. 42, no. 7, pp. 4640–4651, 2014. View at: Publisher Site | Google Scholar
  61. D. Kaida, H. Motoyoshi, E. Tashiro et al., “Spliceostatin A targets SF3b and inhibits both splicing and nuclear retention of pre-mRNA,” Nature Chemical Biology, vol. 3, no. 9, pp. 576–583, 2007. View at: Publisher Site | Google Scholar
  62. Y. Kotake, K. Sagane, T. Owa et al., “Splicing factor SF3b as a target of the antitumor natural product pladienolide,” Nature Chemical Biology, vol. 3, no. 9, pp. 570–575, 2007. View at: Publisher Site | Google Scholar
  63. R. J. van Alphen, E. A. C. Wiemer, H. Burger, and F. A. L. M. Eskens, “The spliceosome as target for anticancer treatment,” British Journal of Cancer, vol. 100, no. 2, pp. 228–232, 2009. View at: Publisher Site | Google Scholar
  64. M. Tian, H. Cheng, Z. Wang et al., “Phosphoproteomic analysis of the highly-metastatic hepatocellular carcinoma cell line, MHCC97-H,” International Journal of Molecular Sciences, vol. 16, no. 2, pp. 4209–4225, 2015. View at: Publisher Site | Google Scholar
  65. Y. Huang, X. Yang, F. Zhao et al., “Overexpression of Dickkopf-1 predicts poor prognosis for patients with hepatocellular carcinoma after orthotopic liver transplantation by promoting cancer metastasis and recurrence,” Medical Oncology, vol. 31, no. 7, pp. 1–10, 2014. View at: Publisher Site | Google Scholar
  66. A. L. P. Whitley, C. San Jose, N. Hernandez et al., A Comparison of Next Generation Sequencing and Microarrays for Transcriptome Expression Profiling, Life Technologies, 2009.

Copyright © 2015 Yung-Hao Wong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


More related articles

1063 Views | 522 Downloads | 4 Citations
 PDF  Download Citation  Citation
 Download other formatsMore
 Order printed copiesOrder

Related articles

We are committed to sharing findings related to COVID-19 as quickly and safely as possible. Any author submitting a COVID-19 paper should notify us at help@hindawi.com to ensure their research is fast-tracked and made available on a preprint server as soon as possible. We will be providing unlimited waivers of publication charges for accepted articles related to COVID-19. Sign up here as a reviewer to help fast-track new submissions.