Applying NGS Data to Find Evolutionary Network Biomarkers from the Early and Late Stages of Hepatocellular Carcinoma
Hepatocellular carcinoma (HCC) is a major liver tumor (~80%), besides hepatoblastomas, angiosarcomas, and cholangiocarcinomas. In this study, we used a systems biology approach to construct protein-protein interaction networks (PPINs) for early-stage and late-stage liver cancer. By comparing the networks of these two stages, we found that the two networks showed some common mechanisms and some significantly different mechanisms. To obtain differential network structures between cancer and noncancer PPINs, we constructed cancer PPIN and noncancer PPIN network structures for the two stages of liver cancer by systems biology method using NGS data from cancer cells and adjacent noncancer cells. Using carcinogenesis relevance values (CRVs), we identified 43 and 80 significant proteins and their PPINs (network markers) for early-stage and late-stage liver cancer. To investigate the evolution of network biomarkers in the carcinogenesis process, a primary pathway analysis showed that common pathways of the early and late stages were those related to ordinary cancer mechanisms. A pathway specific to the early stage was the mismatch repair pathway, while pathways specific to the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway. This study provides a new direction for cancer-targeted therapies at different stages.
Cancer is the leading cause of death worldwide, and its etiology occurs at the DNA, RNA, and protein levels. It is a very complex disease involving cascades of spatial and temporal changes in genetic networks and metabolic pathways . Weinberg summarized the important cancer hallmarks [2, 3]. Cancer hallmarks can highlight important cancer mechanisms. In recent years, many systems biology approaches have been applied to study cancers [4–9]. We performed a series of studies on cancer using systems biology approaches. Recently, we have particularly focused on searching for network biomarkers of cancers [10–16].
Hepatocellular carcinoma (HCC) is a major liver tumor (~80%), besides hepatoblastomas, angiosarcomas, and cholangiocarcinomas. Compared to other types of cancer, liver cancer is the third most deadly cancer globally and caused about 700,000 deaths in 2011 . Due to the increasing annual incidences  and poor 5-year survival rate (~15%), diagnosis and prognosis of HCC are still important public health issues. Various studies revealed a multistep process involved in liver carcinogenesis ; however, the exact biology of HCC remains poorly understood overall . In addition, late confirmation of the occurrence of HCC through traditional histological examinations and tumor sections contributes to the poor survival rate. To accurately diagnose the occurrence of HCC at an early time, it is necessary to understand the cellular and molecular mechanisms of HCC . Hence, in this study, we compared the early and late stages of molecular interaction networks of HCC to reveal the underlying mechanisms of HCC development.
Although the histological and molecular features leading to HCC initiation are still poorly understood, mounting evidence suggests that a gradual accumulation of mutations and genetic changes in hepatocytes, which form the live lobule, may lead to the development of HCC . Exposure to risk factors for liver cancer, such as obesity , alcohol addiction, aflatoxin, or hepatitis viruses , and the subsequent induction of inflammation may cause advanced hepatocyte necrosis. The recurrence of necrosis and regeneration of hepatocytes provide an opportunity for the introduction of mutations and genetic changes in hepatocytes. During these processes, alternative ways to protect liver lesions occur such as fibrosis and cirrhosis. Although a high correlation between cirrhosis and an HCC diagnosis was reported, the mechanisms of how cirrhosis transforms into HCC are still unknown. Furthermore, after a definite diagnosis of HCC, patients with early-stage liver cancer have better survival rates than those in the late stage. Although HCC’s management has substantially changed in the past few decades, the only systemic standard of care for patients with advanced HCC is sorafenib, a multikinase inhibitor, with a mean survival benefit of only 3 months [20, 23]. With advances in affordable, high-throughput technologies, increasing numbers of systems biology studies of HCC diagnoses  and treatments  have shed light on applications of systems biology to HCC diagnosis, prognosis, and therapy.
As for different etiologies and heterogenic genomic alterations of HCC, the systems biology methodology that integrates Omics data is suitable to develop accurate diagnoses, novel therapeutic targets, and efficient targeted therapies . In this study, NGS data were used to construct protein-protein interaction (PPI) networks (PPINs) of the early and late stages of HCC. The network structure and protein associations of different stages of HCC were compared to obtain a set of significant proteins which play important roles in processes of the progression of HCC.
Chen et al. developed a dynamical network biomarker (DNB) that can serve as a general early-warning signal to indicate an imminent bifurcation or sudden deterioration before the critical transition occurs, which means that it can identify a predisease state using time series microarray data [27–29]. We tried a different approach from their method and used sample microarray data from liver cancer patients at different stages. Our approach could also be extended to predict some similar results as their research. That is, in this study, we simply divided the cancer into early and late stages, but there are more stages of cancer, such as stages I, II, III, and IV. If we could observe the time evolution of cancer biomarkers at these other different stages, we could also predict the predisease state by comparing these cancer biomarkers at different stages.
We reveal the carcinogenesis process from early-stage and late-stage liver cancer. A specific pathway of the early stage was the mismatch repair pathway, while specific pathways of the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway.
2. Materials and Methods
2.1. Overview of the Process for Constructing Liver Cancer Network Markers
We successfully used our methods to find core and specific network markers of four different kinds of cancer and the evolution of network markers from the early stage to late stage of liver cancer [15, 16]. A similar theoretical framework was employed in this study to find the evolution of network markers in the early and late stages of liver cancer. The theoretical systems method in this paper was developed from a previous study, but we used a new large dataset of NGS expression data which differ from traditional microarray expression data. A flowchart of the construction of network biomarkers for the early and late stages of liver cancer is shown in Figure 1. We know that a lot of NGS gene expression data have been generated in recent years. Some consider that they are more accurate than traditional microarray expression data. Another key point of this work was to make comparisons with our previous results, such as network biomarkers of liver cancer from microarray data and the evolution of network markers from the early and late stages of liver cancer. We combined two data sources: (1) NGS data of liver cancer and noncancer samples from the GEO database, among which cancer samples were divided into two groups of early-stage and late-stage liver cancer, and (2) the PPI database, which is required to construct PPINs for liver cancer. These data were used for PPI pool selection, and the selected PPIs and microarray data were then used for PPIN construction. Through regression modeling and the maximum-likelihood parameter estimation method, a cancer PPIN (CPPIN) and noncancer PPIN (NPPIN) were obtained. We also constructed a differential PPIN (DPPIN). The constructed CPPIN and NPPIN were compared to obtain a set of significant proteins for liver cancer based on the carcinogenesis relevance value (CRV) for each protein and a statistical assessment. Significant proteins and PPIs among these proteins were used to construct network markers for the early and late stages of liver cancer.
2.2. Data Selection and Preprocessing
Liver RNA-seq data were collected from liver HCC (LIHC) of The Cancer Genome Atlas (TCGA) with a batch number of 100. Normalized results, which consisted of reads per kilobase of exon per million mapped reads (RPKM) values, were used to represent gene expressions. The NGS gene expression dataset of liver cancer was obtained from TCGA database. The same dataset contained early-stage and late-stage liver cancer and noncancer samples. We only used data derived from nonprocessed primary biopsies to avoid discrepancies in gene expressions that are intrinsic to cell culture and fixation. Therefore, the dataset utilized contained primary tumor samples of both stages from patients and adjacent nontumor tissue samples from the same cancer patients, which were considered to be control samples. As shown in Table 1, 19 early-stage (stages I and II) and 18 late-stage (stages III and IV) liver HCC tumors and 24 adjacent normal tissue pairs were collected for further analysis. To describe the extent of a patient’s cancer, the cancers were classified into four stages according to their degree of invasion and migration using the TNM staging system, as defined by the American Joint Committee on Cancer (AJCC) and the International Union against Cancer (UICC). We then divided the cancer samples into two groups. In general, stages I and II described early-stage cancers that have higher curability rates with medical treatment, while stages III and IV described late stages. We also combined the early and late stages for a total stage. However, there were no corresponding noncancer samples in the surrounding area for each stage, and we had only one group of surrounding noncancer samples (Table 1). The reason why we had to take this step was that NGS samples are harder (and much more expensive) to obtain than microarray data, and the number of samples was small, which may have caused overfitting in the parameter estimation process of our model. So we used the total stage to identify a model to avoid overfitting. We built the CPPIN and NPPIN for early-stage, late-stage, and total-stage liver cancer in this study. We obtained 19 and 18 samples for the early-stage and late-stage liver cancer, respectively, and 24 noncancer samples. Prior to further analysis, the gene expression value, , was normalized to -transformed scores, , for each gene, , and then the resulting normalized expression value had a mean and a standard deviation for sample [30, 31].
PPI data for Homo sapiens were extracted from the Biological General Repository for Interaction Database (BioGRID, downloaded in October 2012). The BioGRID is an open-access archive of genetic and protein interactions that are curated from the primary biomedical literature of all major model organisms. As of September 2012, BioGRID housed more than 500,000 manually annotated interactions from more than 30 model organisms . The above two databases were mined for liver cancer and noncancer PPINs using their corresponding microarray data. These early-stage and late-stage liver cancer and noncancer PPINs were then compared to obtain network markers.
2.3. Selection of a Protein Pool and Identification of PPINs for Cancerous and Noncancerous Cells
To integrate gene expressions with PPI data so we could construct the corresponding CPPINs and NPPINs, we set up a protein pool containing differentially expressed proteins. Gene expression values were reasonably assumed to correlate with protein expression levels. We used a one-way analysis of variance (ANOVA) to analyze the expression of each protein and selected proteins with differential expression levels. This method allowed determination of significant differences between cancer and noncancer datasets. The null hypothesis (Ho) was based on the assumption that mean protein expression levels of cancer and noncancer sets were the same. A Bonferroni adjustment , a type of multiple testing, was used to detect and correct proteins with a discrepancy. Proteins with a value of <0.01 were included in the protein pool. However, any proteins in the protein pool which did not have PPI information were eliminated. In addition, proteins that were not already in the protein pool were included if their PPI information determined that they were closely associated with proteins already in the pool. As a result, the protein pool contained proteins that had certain differences in expression levels and proteins that had close relationships with the aforementioned proteins.
On the strength of the significant pool and PPI information, candidate PPINs for early-stage and late-stage liver cancer were constructed for liver cancer and noncancer tissues by linking proteins that interacted with each other. In other words, proteins that had PPI information through the pool were linked together, resulting in candidate PPINs.
As the candidate PPIN included all possible PPIs under various environments, different organisms, and experimental conditions, the candidate PPIN needed to be further confirmed by microarray data to identify appropriate PPIs according to the biological processes that are relevant to cancer. To remove false-positive PPIs from each candidate PPIN for different biological conditions, we used both a PPI model identification scheme and a model order detection method to prune each candidate PPIN using the corresponding microarray data to approach the actual PPIN. Here, the PPIs of a target protein in the candidate PPIN can be depicted by the following protein association model:where represents the expression level of the target protein for sample ; represents the expression level of the th protein interacting with target protein for sample ; denotes the association interaction ability between the target protein and its th interactive protein; represents the number of proteins interacting with the target protein ; and represents the stochastic noise due to other factors or model uncertainty. The biological meaning of (1) is that expression levels of target protein are associated with expression levels of proteins that interact with it. Consequently, a protein association (interaction) model for each protein in the protein pool can be built using (1).
After constructing (1) for the PPI model of each protein in the candidate PPIN, we used the maximum-likelihood estimation method  to identify association parameters in (1) using microarray data as follows (see Supplementary Materials S.1 available online at http://dx.doi.org/10.1155/2015/391475):where was identified using microarray data in accordance with the maximum-likelihood estimation method.
Once the association parameters for all proteins in the candidate PPIN were identified for each protein, significant protein associations were determined using the interaction model order detection method based on estimated association abilities, that is, to detect the interaction number in (2). The Akaike information criterion (AIC)  and Student’s -test  were used for both model order selection and significance determination of protein associations in (see Supplementary Materials S.2).
2.4. Determination of Significant Proteins and Their Network Structures in the Carcinogenesis of Liver Cancer
After the interaction number was determined using the AIC order detection and Student’s -test, spurious false-positive PPIs, , in (2) were pruned away, and only significant PPIs that remained were refined as follows:where denotes the number of significant PPIs of the PPIN, with the target protein . In other words, a number of ’s (or false positives) were pruned from the PPIs of target protein . One protein by one protein (i.e., for all proteins in the refined PPIN in (3)) resulted in the following refined PPIN:where the interaction matrix denotes the PPIs.
If there was no PPI between proteins and or it was pruned away by the AIC order detection due to insignificance in the refined PPIN, then . In general, , but if this was not the case, the larger one was chosen as to avoid the situation where . The above PPIN construction method was used to construct refined PPINs for each stage of liver cancer (early and late) and noncancer cells. The interaction matrices of the refined PPINs in (4) for cancer and noncancer cells of both the early and late stages of liver cancer were constructed, respectively, as follows:where is the early-stage and late-stage liver cancer; and denote interaction matrices of the refined PPIN of the th stage of liver cancer and noncancer, respectively; and is the number of proteins in the refined PPIN. Therefore, the protein association model for CPPIN and NPPIN in the th-stage liver cancer and noncancer samples can be represented by the following equations according to (4) and (5):where is early-stage and late-stage liver cancer, and denote vectors of expression levels, and and indicate noise vectors of the PPINs in the th stage liver cancer and noncancer cells, respectively.
The different matrix of the differential PPI network between CPPIN and NPPIN in the th stage liver cancer is defined as follows:where is early-stage and late-stage liver cancer, denotes the protein association ability difference between CPPIN and NPPIN in the th-stage liver cancer, and matrix indicates the difference in network structures between CPPIN and NPPIN in the th-stage liver cancer. In order to investigate carcinogenesis from the difference matrix, , between CPPIN and NPPIN of the th-stage liver cancer in (7), a score, which we called the carcinogenesis relevance value (CRV), was presented to quantify the correlation of each protein in with the significance of carcinogenesis as follows :where and is early-stage and late-stage liver cancer.
The in (8) quantifies the differential extent of protein associations of the th protein (the absolute sum of the th row of in (7)) and the can differentiate CPPIN from NPPIN in th-stage liver cancer. In other words, the in (8) represents the network structure difference of the th protein between cancer and noncancer networks in the th-stage liver cancer.
In order to investigate what proteins are more likely involved in th-stage liver cancer, we needed to calculate the corresponding empirical value to determine the statistical significance of the . To determine the observed value of each , we repeatedly permuted the network structure of the candidate PPIN of the th-stage liver cancer as a random network of th-stage liver cancer. Each protein in the random network of the th-stage liver cancer had its own CRV to generate a distribution of for = early-stage and late-stage liver cancer. Although the network structure was randomly disarranged, linkages of each protein were maintained. In other words, proteins with which a particular protein interacted were permuted without changing the total number of protein interactions. This procedure was repeated 105 times, and the corresponding value was calculated as the fraction of the random network structure in which the was at least as large as the CRV of the real network structure. According to distributions of the of the random networks, the in (8) with a value of ≤0.01 was regarded as a significant CRV, and the corresponding protein was determined to be a significant protein in the carcinogenesis of the th-stage liver cancer: a protein with a value of >0.01 was removed from the list of significant proteins in carcinogenesis (in other words, if the value of was >0.01, then the th protein was removed from the in (8), and the remainder in the with values of CRVs of <0.01 was considered significant proteins of the th-stage liver cancer).
Based on values of the CRVs for all proteins () and the two stages of liver cancer ( = early-stage and late-stage liver cancer), we generated two lists of significant proteins for each of the two stages according to the CRV and a statistical assessment of each significant protein in the in (8). We found 152 significant proteins in the early-stage liver cancer and 50 significant proteins in the late-stage liver cancer. These proteins showed significant changes between the CPPIN and NPPIN in the carcinogenic process according to their corresponding stage of cancer, and we suspected that these changes might play important roles in the carcinogenesis process of liver cancer. These findings warrant further investigation.
Intersections of these significant proteins in the early and late stages of liver cancer and their PPIs are known as the core network markers appearing in all stages of liver cancer. In contrast, unique significant proteins and their PPIs in each stage of liver cancer are known as specific network markers for each stage of cancer. We found 18 significant proteins that could be classified as core network markers over the entire carcinogenesis process of liver cancer. We also found 134 significant proteins as specific network markers of early-stage liver cancer and 32 significant proteins as specific network markers of late-stage liver cancer.
2.5. Pathway Analysis
Much valuable cellular information can be found using known pathways, which are useful for describing most “normal” biological phenomena. All of these known pathways are the result of repeated testing and verification, and the entire pathway network has defined most links. Therefore, the proteins we identified to be significant in the above network markers were mapped onto known pathway networks (e.g., the Kyoto Encyclopedia of Genes and Genomes (KEGG) and PANTHER pathways) to investigate significant pathways with network markers and explore relationships between these pathways and the carcinogenesis of liver cancer. This approach supports the view that systems biology can help identify significant network biomarkers in both normal and cancerous pathways and their roles in the pathogenesis of cancer.
Together with comprehensive pathway databases such as the KEGG, we used a series of bioinformatics pathway analytical tools to identify biologically relevant pathway networks . The KEGG includes manually curated biological pathways that cover three main categories: systems information (e.g., human diseases and drugs), genomics information (e.g., gene catalogs and sequence similarities), and chemical information (e.g., metabolites and biochemical reactions). At present, the KEGG contains 134,511 distinct pathways generated from 391 original reference pathways . Therefore, to investigate pathways involved in carcinogenesis, the bioinformatics database, DAVID [38, 39], which generates automatic outputs of the results from a KEGG pathway analysis , was used for the pathway analysis of significant proteins identified as network markers to determine their roles in the pathogenesis of early- and late-stage liver cancer. Our methodology did not include a pathway analysis or gene set enrichment analysis (GSEA). To complete our research results, we used NOA software for the pathway analysis and GSEA of biological processes, cellular components, and molecular functions [40, 41].
2.6. Explore the Evolution of Network Biomarkers in the Liver Carcinogenesis via PPI Network through NGS Data
Our cancer PPI model was constructed from the differential expression of cancer and noncancer microarray data and data mining of PPI information from the BioGRID database. So, the early-stage and late-stage liver CPPINs and NPPINs were the results of our systems biology model using the original NGS data and PPI databases. There were two key factors that affected our final results.
(i) The Effect of Different Original PPI Databases. We know that PPI databases, such as the BioGRID and MIPS, are constructed from putative samples and validated by wet-lab experiments. Due to advances in many high-throughput experimental skills, the original PPI databases have evolved over time. The newly updated original PPI databases are the second factor to affect the final results.
(ii) The Effect of the Systems Biology Model. Microarray data, PPI databases, and the PPI interaction model in (1) were employed to construct PPI networks of normal and cancer cells by the maximum-likelihood parameter estimation method (see Supplementary Materials S.1). The AIC system order detection method (Supplementary Materials S.2) was used to prune false-positive PPIs to obtain actual PPI networks of normal and cancer cells; that is, we used the so-called reverse engineering method to construct PPI networks of normal and cancer cells. Then, the DPPIN between the CPPIN and NPPIN was obtained in (7) to investigate PPI variations of each protein in the DPPIN due to carcinogenesis. Finally, the CRV based on PPI variations was also proposed to evaluate the significance of carcinogenesis for each protein of the DPPIN. Proteins with a significant CRV () were considered significant proteins of the cancer. Significant proteins in Table 3 were these significant proteins of early-stage and late-stage liver cancer, and these proteins and their PPIs were used to construct the PPI network in Figure 2. Finally, from early-stage and late-stage liver cancer network markers, we investigated mechanisms of the carcinogenesis process with the help of databases (e.g., the GO database and the DAVID and KEGG pathway databases) to find multiple network targets for cancer therapy. Unlike conventional theoretical methods which always give a single mathematical model for a cancer network for a more-detailed theoretical analysis, this study introduced a systems biology approach to cancer network markers based on actual NGS data through so-called reverse engineering, a systems statistical method, and a data mining method in combination with large databases. These are the novelty and significance of our study. Although we described the novelty of our systems biology model, we have validated our results by a literature survey of research. In the future, our results can be validated by other researchers’ wet-lab experiments, and we will repeatedly modify our mathematical system model. This is the third key factor that affected our results. Although not directly, it also influenced the protein interaction network.
(a) DPPIN of early-stage liver cancer
(b) DPPIN of late-stage liver cancer
(c) DPPIN of total-stage liver cancer
We also know that biosystems evolve with time. It is obvious that early-stage and late-stage patients have very different symptoms; these are key features we used to classify early-stage and late-stage liver cancer. Since liver cancer patients in the two stages have very different symptoms, the NGS data of these two stages of patients should undoubtedly greatly differ. As described above, protein expressions from NGS data are one of the key factors of our systems biology model producing the final CPPINs and NPPINs, and the CPPINs and NPPINs gave the final network biomarkers from our systems biology model. So, the most important thing for evolution of network biomarkers is evolution of the NGS data in both stages of liver cancer, which is inherent in the exhibition of cancer-related genes due to DNA mutations in the carcinogenesis process.
3. Results and Discussion
3.1. Evolution of Network Biomarkers in Early-Stage and Late-Stage Liver Cancer
We built the DPPIN to examine the early, late, and total stages of liver cancer (Figure 2). CRVs of each protein in the three networks were calculated. This figure shows more-detailed information than just the CRVs; that is, it shows the changes of edges and nodes at different stages of the network. By screening using values of CRVs, we found significant proteins of network markers for these three stages of liver cancer. Similar to our previous experience with bladder cancer , we wanted to reveal the carcinogenesis mechanisms of liver cancer in the early and late stages. Because we combined the early stage (stages I and II) and late stage (stages III and IV) into a total stage (stages I~IV), it is not surprising that most of the pathways associated with either the early or late stage overlapped those of the total stage. We had to perform this step because NGS samples are more difficult (and much more expensive) to obtain than microarray data; that is, there were few samples. This shortage may have caused overfitting in the parameter estimation process of our model in (2). So we used the total stage again to build a model to avoid overfitting. To summarize, we built a model using early, late, and total stages. Numbers of samples in these three stages were 19, 18, and 37, respectively (37 being the sum of 19 and 18). The reason for using the total stage is that 19 and 18 are small numbers of samples, and there was the chance that overfitting may have occurred. So we used the sum of the early and late stages (37) to see the entire picture between cancer and noncancer tissues.
3.2. Network Markers of Early, Late, and Total (Early Plus Late Stages) Stages of Liver Cancer
After value (0.01) screening, we found that there were 43, 80, and 74 significant proteins for the early, late, and total stages of liver cancer, respectively. In addition, their corresponding CRVs ranged 6.0~46.5, 6.1~34.1 and 6.5~81.5, respectively. These significant proteins and their PPIs were used to construct network markers for the early, late, and total stages of liver cancer. The intersection network of markers of the early and late stages was a core feature that contained 27 significant proteins associated with carcinogenesis. We list the 27 significant proteins and their corresponding CRVs and values in both stages of liver cancer in Table 2. In the second step, we separately list the top 20 significant proteins in the early and late stages of liver cancer in Table 3. The full list of the 43, 80, and 74 significant proteins for these three stages of liver cancer is given in Supplementary Tables S3. Table 4 shows the top 30 proteins of the total stage. Table 5 shows interactions with our previous results using microarray data.
3.3. Pathway Analysis of the Total Stage of Liver Cancer
We first analyzed the pathway of the total stage of liver cancer using the David database. As stated above, the key point of this research was to find evolutionary mechanisms of liver cancer from the early and late stages, but the number of samples of NGS data was small, so we had to combine the early and late stages to see the overall picture of the liver cancer network. The five key pathways we were interested in, which were selected by these 74 key proteins, are listed as follows: (1) 14 proteins in hsa04110 were associated with the cell cycle (Figure 3(a)), (2) seven proteins in hsa04114 were associated with oocyte meiosis (Figure 3(b)), (3) five proteins in hsa03030 were associated with DNA replication (Figure 3(c)), (4) 10 proteins in hsa05200 were associated with cancer pathways (Figure 3(d)), and (5) four proteins in hsa04115 were associated with the p53 signaling pathway (Figure 3(e)). We only focused on those pathways related to liver cancer and eliminated unrelated ones, such as prostate cancer, glioma (brain cancer) pathway, and small cell lung cancer pathway (Table 6).
(a) The proteins in the total stage liver cancer network marker are enriched in “hsa04110:Cell cycle” (Rank 1 in Table 6)
(b) The proteins in the total stage liver cancer network marker are enriched in “hsa04114:Oocyte meiosis” (Rank 3 in Table 6)
(c) The proteins in the total stage liver cancer network marker are enriched in “hsa03030:DNA replication” (Rank 4 in Table 6)
(d) The proteins in the total stage liver cancer network marker are enriched in “hsa05200:Pathways in cancer” (Rank 5 in Table 6)
(e) The proteins in the total stage liver cancer network marker are enriched in “hsa04115:p53 signaling pathway” (Rank 11 in Table 6)
(f) The proteins in the total stage liver cancer network marker are enriched in “h_g1Pathway:Cell Cycle: G1/S Check Point” (Rank 1a in Table 6) (BioCarta)
(g) The proteins in the total stage liver cancer network marker are enriched in “h_p27Pathway:Regulation of p27 Phosphorylation during Cell Cycle Progression” (Rank 2a in Table 6) (BioCarta)
(h) The proteins in the total stage liver cancer network marker are enriched in “h_p53Pathway:p53 Signaling Pathway” (Rank 3a in Table 6) (BioCarta)
(i) The proteins in the total stage liver cancer network marker are enriched in “h_her2Pathway:Role of ERBB2 in Signal Transduction and Oncology” (Rank 4a in Table 6) (BioCarta)
(1) Cell Cycle. An abnormal cell cycle is highly related to cancer mechanisms. We described this in our previous study [15, 16]. More clues and mechanisms were revealed in this study, so we have to repeat and highlight the previous results while adding the new results. Dysregulation of the cell cycle governs deviant cell proliferation in cancer. Loss of the ability to control cell cycle checkpoints induces abnormal genetic instability. It is well known that the cell cycle is composed of two consecutive periods, characterized by DNA replication, and of sequential differential and segregation of replicated chromosomes into two separate daughter cells. This may be due to activation of tumorigenic mutations, which were recognized in various tumors at different levels in mitogenic signal transduction pathways: (1) ligands and receptors (receptor mutations of HER2/neu [ErB2] or amplification of the HER2 gene), (2) downstream signal transduction networks (Raf/Ras/MAPK or PI3K-AKT-mTOR), and (3) regulatory genes of the cell cycle (cyclin D1/CDK4, CDK6, and cyclin E/CDK2) . There are many reported discussions regarding cell cycle regulators and checkpoint functions involved in HCC [43–46]. Herein, 14 of the total stage network markers identified were involved in the cell cycle pathway. In this case, we found that five of these 14 markers were related to the DNA replication pathway. This shows that the cell cycle and DNA replication are highly related to liver cancer at the total stage.
Because the cell cycle is so crucial to cancer, we list other pathways given by BioCarta (Figures 3(e) and 3(f)). The former shows that these key proteins are related to the G1/S checkpoint, while the later shows that they are related to regulation of p27 phosphorylation during cell cycle progression.
Dituri et al. showed that changes in the cell cycle checkpoint frequently occur during HCC. They identified different pathways from us and showed that the phosphatidylinositol-3-kinase (PI3K)/protein kinase B (AKT)/mammalian target of the rapamycin (mTOR) pathway is the key pathway for HCC. They used the human PLC/PRF/5, Hep3B, HepG2, HLE, and HLF HCC cell lines and a normal human hepatocyte cell line. Although they and our group did not identify the same targets, the key point is that an abnormal cell cycle is a complex mechanism. Their scope covered the G0-G1, G2-M, and G1-S phases . Our future work will try to link our network makers with mechanisms discovered by other groups.
Furuta et al. showed that micro- (mi)RNAs are key posttranscriptional regulators of gene expression and are usually deregulated in HCC. They identified four miRNAs, mir-101, mir-195, mir-378, and mir-497 that are always silenced in HCC. In this research, we did not include miRNAs when building the model . We know that miRNAs play significant roles in genetic regulatory networks, and our future work will incorporate miRNAs into our model to reveal more hidden mechanisms of HCC.
(2) Oocyte Meiosis. We did notidentify this pathway in our previous study. Zhang et al. used network-based bioinformatics methods to identify biomarkers for HCC and identified this pathway with 19 upregulated genes with a value of 0.0016 . We identified seven genes in this pathway with a very low value of 8.70−05. Terret et al. claimed that mouse oocytes are a paradigm of cancer cells. They showed asymmetric division in both mitotic cells and oocytes cells, and this causes cancer cells to divide .
(3) DNA Replication. As we stated above, we identified five genes of the DNA replication pathway which are also involved in the cell cycle pathway. Macheret and Halazonetis claimed that DNA replication stress is a hallmark of cancer , so we have to discuss this topic in more detail here. Hanahan and Weinberg published two review papers on the topic of cancer hallmarks, and the key points are listed here. Based on their first-generation cancer hallmarks , Hanahan and Weinberg extended their work to second-generation hallmarks of cancers . It inspired many people and stimulated many ideas on cancer targeted therapies.
(i) Sustaining Proliferative Signaling. Cancer cells can sustain proliferative signaling in various ways, become independent of exogenous growth signals, and so can proliferate. They also have the ability to generate growth factor ligands themselves.
(ii) Evading Growth Suppressors. Cancer cells can ignore signals that inhibit cell proliferation. Retinoblastoma-associated (RB) and TP53 proteins are the two main prototypes of tumor suppressors. They are also important targets of cancer therapies.
(iii) Activating Invasion and Metastasis. Invasion and metastasis comprise multistep processes, usually called the invasion-metastasis cascade. This cascade is related to the mechanisms of miRNAs [50, 51].
(iv) Enabling Replicative Immortality. Cancer cells becoming immortal is a lethal event for cancer patients. There are three important topics to this immortality: reassessing replicative senescence, delayed activation of telomerase, and new noncanonical functions of telomerases.
(v) Inducing Angiogenesis. Both invasion and metastasis need new-born blood vessels to supply nutrients for cancer cells.
(vi) Resisting Cell Death. Cancer cells have many strategies to disable apoptosis. Cancer cells becoming immortal is another lethal event for cancer patients.
When we tried to apply a systems biology approach to cancer therapy, understanding cancer hallmarks was the most important and basic step. There are many investigations of cancer systems biology based on Weinberg’s work. Negrini et al. discussed the genomic instability characteristics of cancers and evolving hallmarks of cancer . Hornberg et al. used the concept of complex signaling networks and increases in complexity to directly name cancers as systems biology disease . Barillot et al. merged the six hallmarks into the following three main properties related to the proliferation, survival, and dissemination of cancers. They are the defective control of the cell cycle, defective control of cell death, and the invasiveness and metastatic potential. There are three key levels at which normal cells transform into cancer cells: mutations of DNA, transcription of DNA to RNA, and RNA translation to proteins. Hallmarks are omens and portent of cancers, and, by a pathway analysis, we can find drug targets at this stage.
Cancer is a network disease, that is, a dysregulation of the entire network. Determining how to build a cancer network is the first important task.
We have briefly reviewed and summarized the key points of Weinberg’s theory. Now we focus on DNA replication stress, because it was identified as another hallmark of cancer. Macheret and Halazonetis claimed that the sustained proliferation hallmark can be regarded as mutations in oncogenes and tumor suppressor genes that are involved in the cell growth pathway. Mutations of TP53, ATM, or MDM2 genes can allow escaping from the apoptosis hallmark. He discussed oncogene-induced DNA replication stress and the role it plays as a cancer progression driver .
(4) p53 Signaling Pathway. Soussi et al. discussed a multifactorial analysis of p53 alterations in human cancers . Ueda et al. claimed that the functional inactivation but not the structural mutation of p53 can cause liver cancer .
3.4. Pathway Analysis of Early-Stage Liver Cancer
Four of these five key pathways discussed in the total stage (cell cycle, DNA replication, oocyte meiosis, and p53 signaling) were also selected by both early and late stages, although the proteins involved in these pathways were not the same (Table 7). We should pay more attention to the evolution from the early to late stages.
(1) Mismatch Repair. This is a key pathway that was only selected by the early stage (with higher value of 0.09) but not the late stage (Figure 4). Fujimoto et al. sequenced the entire genome of liver cancers to identify etiological influences on mutation patterns. They discussed transcription-coupled repair and mismatch repair-deficient tumors to show that mismatch repair plays a significant role in liver cancer .
3.5. Pathway Analysis of Late-Stage Liver Cancer
(1) Spliceosome. It is important that we selected the spliceosome pathway again in late-stage liver cancer (Table 8), because we selected it in our previous results of late-stage bladder cancer. We review previous results of the spliceosome pathway  and then add new results here. Alternative splicing is a modification of premessenger (m)RNA transcripts in which internal noncoding regions of pre-mRNA (introns) are removed and then the remaining segments (exons) are joined (Figure 5). The formation of mature mRNA is subsequently capped at its 5′ end, polyadenylated at its 3′ end, and transported out of the nucleus to be translated into proteins in the cytoplasm. Most genes use alternative splicing to generate multiple spliced transcripts. These transcripts contain various combinations of exons resulting from different mRNA variants, and these are synthesized as protein isoforms. Exons are always around 50~250 base pairs, whereas introns can be as long as several thousands of base pairs. For nuclear-encoded genes, splicing takes place within the nucleus simultaneously with or after transcription. Splicing is necessary for eukaryotic mRNA before it can be translated into the correct protein. The spliceosome is a dynamic intracellular macromolecular complex of multiple proteins and ribonucleoproteins (RNPs). For many eukaryotic introns, the spliceosome carries out the two main functions of alternative splicing: first, it recognizes intron-exon boundaries, and, second, it catalyzes cut-and-paste reactions that remove introns and concatenate exons. Various spliceosomal machinery complexes are formed from five RNP subunits, termed uridine- (U) rich small nuclear (sn)RNPs, that are transiently associated with more than 760 non-snRNP splicing factors (RNA helicases, SR splicing factors, etc.) [57, 58]. Each spliceosomal snRNP (U1, U2, U4, U5, and U6) consists of a U-rich snRNA complexed with a set of seven proteins known as canonical Sm core or SNRP proteins. The seven Sm proteins (B/B′, D1, D2, D3, E, F, and G) form a core ring structure that surrounds the RNA. All Sm proteins contain a conserved sequence motif in two segments (Sm1 and Sm2) that are responsible for the assembly and ordering of snRNAs. They form the Sm core of spliceosomal snRNPs  and process pre-mRNA . Spliceosomes not only catalyze splicing by a series of reactions, but also are the main cellular machinery that guides splicing. Recently, scientists found two natural compounds that can interfere with spliceosome function that also display anticancer activity in vitro and in vivo [61, 62]. Therefore, it is thought that inhibiting the spliceosome can be a new target for anticancer drug development , and it should be validated in vivo and in vitro in the future .
Tian et al. used a phosphoproteomic analytical method on the highly metastatic HCC cell line, MHCC97-H. They reported that phosphoproteins were also found in the spliceosome pathway, and they were related to liver cancer .
(2) Progesterone-Mediated Oocyte Maturation. Zhang et al. also identified this pathway .
(3) Lysine Degradation. Huang et al. identified this pathway in liver cancer .
3.6. Comparison with Our Previous Liver Cancer Network Biomarkers Using a Microarray Analysis
We found 15 common proteins in this study and our previous results (Table 5). That tells us that the NGS is a revolutionary technology, but it can be trusted to not give totally different results from microarray data.
3.7. Comparison of Microarray and NGS Technology
Gene expression profiling by microarray has been successful at demonstrating the patterns of mRNAs within tissues and cells. Although microarray platforms showed similar levels of concordance with the RNA-seq data, the next generation sequencing (NGS) technologies provided high sensitivity, specificity, and accuracy as compared to the microarray platforms . Due to NGS providing a more detailed observation at the transcriptome, scientists and biologists have been eager to apply NGS for gene expression profiling. Microarrays only return results from those regions for probes which have been designed. Therefore, microarrays are only as good as the databases from which they are designed. NGS methods provide aspects of the transcriptome without a priori knowledge, allowing for the analysis and discovery of novel transcripts, noncoding RNAs, and alternative splicing. Thus, RNA sequencing methods enable the most accurate detection and quantification for transcriptome analysis.
Liver cancer is the third most deadly cancer causing about 700,000 deaths in 2011 worldwide. It is a lethal disease like other cancers. There are three important topics in this research. The first was to compare this study with our previous research of liver cancer microarray data, because the microarray technology may someday be replaced by the NGS technology. We found the results to be good, because there were a lot of key proteins identified by both methods. The second was to reveal the carcinogenesis process from the early to late stages of liver cancer. The specific pathway of the early stage was the mismatch repair pathway, and the specific pathways of the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway. This suggests novel directions for choosing different targeted therapeutic strategies at different stages of cancer. In particular, compared to our previous results of bladder cancer, we found that the spliceosome pathway is a significant pathway in the late stage of both liver cancer and bladder cancer. Our future work will focus greater attention on this pathway related to various cancers and consider it as a new drug target for anticancer therapies.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Yung-Hao Wong and Chia-Chou Wu contributed equally to this work.
The authors are grateful for the support provided by the Ministry of Science and Technology with Grants no. MOST-103-2745-E-007-001-ASP and no. 103-2221-E-038-013-MY2.
Supplementary material S.1 uses the Maximum Likelihood Method to do the parameter identification of regression model in equation (1). S.2 uses the AIC and student’s t-test to calculate the p-values of association abilities, and detect the system model order and determine the significance of the model parameters. Table S3: (a) The 43 identified significant proteins of early stage liver cancer. (b) The 80 identified significant proteins of late stage liver cancer. (c) The 74 identified significant proteins of total stage liver cancer.
L. Preziosi, Cancer Modelling and Simulation, Chapman & Hall/CRC, Boca Raton, Fla, USA, 2003.
E. Barillot, Computational Systems Biology of Cancer, Taylor & Francis, Boca Raton, Fla, USA, 2013.
A. Cesario and F. B. Marcus, Cancer Systems Biology, Bioinformatics and Medicine: Research and Clinical Applications, Springer, Dordrecht, The Netherlands, 2011.
E. Wang, Cancer Systems Biology, CRC Press, Boca Raton, Fla, USA, 2010.
Q. Yan, Systems Biology in Drug Discovery and Development: Methods and Protocols, Humana Press, New York, NY, USA, 2010.
H.-F. Juan and H.-C. Huang, Systems Biology: Applications in Cancer-Related Research, World Scientific, River Edge, NJ, USA, 2012.
C.-W. Li, Y. Chu, and B. Chen, “Construction and clarification of dynamic gene regulatory network of cancer cell cycle via microarray data,” Cancer Informatics, vol. 2, pp. 223–241, 2006.View at: Google Scholar
L.-H. Chu and B.-S. Chen, “Comparisons of robustness and sensitivity between cancer and normal cells by microarray data,” Cancer Informatics, vol. 6, pp. 165–181, 2008.View at: Google Scholar
R. Johansson, System Modeling and Identification, Prentice Hall, 1993.
M. Pagano and K. Gauvreau, Principles of Biostatistics, 2000.
L. Zhang, Y. Guo, B. Li et al., “Identification of biomarkers for hepatocellular carcinoma using network-based bioinformatics methods,” European Journal of Medical Research, vol. 18, article 35, 2013.View at: Google Scholar
S. Negrini, V. G. Gorgoulis, and T. D. Halazonetis, “Genomic instability—an evolving hallmark of cancer,” Nature Reviews Molecular Cell Biology, vol. 11, pp. 220–228, 2010.View at: Google Scholar
T. Soussi, Y. Legros, R. Lubin, K. Ory, and B. Schlichtholz, “Multifactorial analysis of p53 alteration in human cancer: a review,” International Journal of Cancer, vol. 57, no. 1, pp. 1–9, 1994.View at: Google Scholar
Y. Huang, X. Yang, F. Zhao et al., “Overexpression of Dickkopf-1 predicts poor prognosis for patients with hepatocellular carcinoma after orthotopic liver transplantation by promoting cancer metastasis and recurrence,” Medical Oncology, vol. 31, no. 7, pp. 1–10, 2014.View at: Publisher Site | Google Scholar
A. L. P. Whitley, C. San Jose, N. Hernandez et al., A Comparison of Next Generation Sequencing and Microarrays for Transcriptome Expression Profiling, Life Technologies, 2009.