Abstract

A novel model for cascading failures in a directed logic network, based on the degree strength of a node, is proposed. The definitions of the in-degree and out-degree strength of a node are first reconsidered, and the load at a nonisolated node is defined as the ratio of its in-degree strength to its out-degree strength. The model is applied to the logic networks of three types of cancer (adenocarcinoma of the lung, prostate cancer, and colon cancer) built from their gene expression profiles. To highlight the differences between the three networks under the cascading failure mechanism, we use the largest-scale cascades and the cumulative cascade probability to describe the damage. Cascading failures triggered by hubs are usually larger. Furthermore, the results show that the propagation of failures through the networks is correlated with structural motifs of connected logical doublets. Finally, some genes are selected on the basis of the cascading failure mechanism. We believe that these genes may be involved in the occurrence and development of the three types of cancer.

1. Introduction

Over the past few decades, many scientists have studied cascading failures in different networks, such as electrical power networks [1–3], traffic networks [4, 5], Internet networks [6], social networks [7, 8], and even biological networks [9, 10]. Various models of cascading failures have been proposed, together with their mechanisms and methods for their prevention. For instance, Motter and Lai [11] proposed a load-capacity cascading failure model and simulated it on scale-free networks with an arbitrary power-law exponent. The results showed that loads are redistributed among the nodes and that intentional attacks can trigger a cascade of overload failures, which can cause the entire network, or a large part of it, to collapse. Wang and Xu [12] investigated cascading failures in coupled map lattices with different topologies and found that cascading failures occur much more easily in small-world and scale-free coupled map lattices than in globally coupled map lattices. Crucitti et al. [13] presented a simple model for cascading failures based on the dynamical redistribution of the flow in the network, showing that the breakdown of a single node is sufficient to reduce the efficiency of the entire system if the node is among those with the largest load.

Recently, some researchers have focused on cascading failure mechanisms in directed networks. Fang et al. [14] proposed a cascading failure model for directed complex networks. They used two attack strategies, the minimum in-degree strategy and the maximum out-degree strategy, and compared them with a random attack strategy through simulations. Their numerical results show that the propagation of cascading failures in directed complex networks depends strongly on the attack strategy and on the directionality of the network. Jin et al. [15] built a load-capacity cascading failure model for directed and weighted networks. They applied the model to two typical networks, namely, the Poisson distribution network and the power law distribution network. Through simulation analyses, they concluded that the average weight and the average in-degree should be increased to enhance the resistance to overloading and short-loading failures, respectively. Smart et al. [9] investigated the relationship between structure and robustness in the metabolic networks of Staphylococcus aureus and other organisms using a cascading failure model based on a topological flux balance criterion.

Despite this success, few studies have attempted to identify the cascading failure mechanism in a directed gene logic network. In this study, we investigate a load-capacity cascading failure model based on the degree strength of nodes and identify the influence of cascading failures on gene logic networks. First, the directed network is constructed. The definitions of in-degree, out-degree, and degree strength are refined for the different regulation types of second-order logical relationships. Then a novel algorithm for cascading failure based on the load-capacity model is investigated. The load at a node is defined as the ratio of the in-degree strength to the out-degree strength of the node. The capacity of a node is the interval from the minimum load to the maximum load that the node can handle. A particular gene node is removed initially, and the number of nodes that fail in the resulting cascade is recorded. This process is repeated for each gene node in the network. Two parameters, namely, the probability that removing a gene node yields damage greater than or equal to a given size and the largest size ratio of cascading failure, are used to detect the relationship between network structure and robustness. Applying the model to gene expression profile data for adenocarcinoma of the lung, prostate cancer, and colon cancer, we find that hubs connected with other nodes by logical motifs are more likely to break down. The study of cascading failure in gene networks may provide useful information about the biological mechanisms underlying the formation and development of cancers.

2. Methods

2.1. In-Degree and Out-Degree in the Logic Network

Bowers et al. [16, 17] proposed the logic analysis of phylogenetic profiles (LAPP) and demonstrated the benefits of identifying the relationships among gene triplets, which are more likely to reveal how the interactions among genes are organized. These triplet interactions form the gene logic network. The network can be considered as a weighted and directed graph that deciphers different logic interactions among gene nodes, including first-order and second-order logical relationships, identified by the uncertainty coefficient at given thresholds (for details about the gene logic network, see Wang et al. [18] and Zhang et al. [19]).

In the first-order logical relationship, taking $a \to b$, its uncertainty coefficient is defined as

$$U(b \mid a) = \frac{H(b) + H(a) - H(a, b)}{H(b)},$$

which measures the probability that gene $a$ regulates gene $b$, where $H(a)$ and $H(b)$ are the Shannon entropies for vectors $a$ and $b$, respectively, and $H(a, b)$ is the joint entropy of $a$ and $b$. This regulatory relationship is denoted as a weighted and directed edge. Figure 1 gives three topologies for the in-degree and out-degree of a node in a first-order logical relationship. Clearly, both the in-degree of $b$ and the out-degree of $a$ are increased by one for $a \to b$. A second-order logical relationship as shown in Figure 2, for example, $c = f(a, b)$, has an uncertainty coefficient denoted as $U(c \mid f(a, b))$ that measures the probability of existence of this second-order logical relationship. In this formula, $f$ is the logical function. The uncertainty coefficient $U(c \mid f(a, b))$ can be calculated by

$$U(c \mid f(a, b)) = \frac{H(c) + H(f(a, b)) - H(c, f(a, b))}{H(c)}.$$

The second-order logic relationship can be considered to be a directed edge with the weight $U(c \mid f(a, b))$. As in the LAPP method, all such gene triplets, with the corresponding $U$ values, give rise to the gene logic networks further studied in our present work.
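As an illustration, the following minimal sketch computes both coefficients from binary expression vectors. It assumes the standard LAPP form of the uncertainty coefficient, $U(b \mid a) = [H(b) + H(a) - H(a, b)]/H(b)$; the function names, the `LOGIC` table, and the convention of returning 0 for a constant target are assumptions made for the example, not part of the original method description.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def uncertainty(target, predictor):
    """U(target | predictor) = [H(t) + H(p) - H(t, p)] / H(t)."""
    h_target = entropy(target)
    if h_target == 0:
        # Assumption: a constant target has nothing to be explained.
        return 0.0
    h_predictor = entropy(predictor)
    h_joint = entropy(list(zip(target, predictor)))
    return (h_target + h_predictor - h_joint) / h_target


# Three second-order logic types named in the text (hypothetical table).
LOGIC = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def second_order_uncertainty(c, a, b, logic):
    """U(c | f(a, b)) for the chosen logic function f."""
    combined = [LOGIC[logic](x, y) for x, y in zip(a, b)]
    return uncertainty(c, combined)


a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 1, 1, 0]  # c equals a XOR b
print(uncertainty(c, a))                       # a alone says nothing about c
print(second_order_uncertainty(c, a, b, "XOR"))  # the pair determines c
```

In this toy case neither gene alone is informative about $c$, whereas the XOR combination determines it completely, which is the situation that second-order relationships are meant to capture.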

The definitions of the in-degree and out-degree need to be refined for different regulation types of second-order logical relationships, namely, AND, OR, and XOR. We propose these new definitions based on two principles: the sum of the in-degree and that of the out-degree of all the nodes in a network are equal, and the definition must be consistent with that of the degree and strength in first-order logical relationships.

Based on these two principles, the in-degree and out-degree of second-order logical relationships are defined as follows. If genes $a$ and $b$ regulate gene $c$ through a second-order logic, the in-degree of $c$ is increased by the increment assigned to that logic type (Table 1). The out-degrees of $a$ and $b$, however, are determined by the proportion of their contributions to the second-order logical relationship. We calculate this proportion from the gene expression data used to build the gene networks in this research. Moreover, the second principle is meaningful only when it comes to the OR logic, as $a$ and $b$ regulate $c$ simultaneously for the AND logic.

For the XOR logic, we cannot determine how $a$ and $b$ regulate $c$ (cooperatively or independently) merely from their gene expression profiles. For example, the specific algorithm to calculate the proportion of the contribution from $a$ to that from $b$ is illustrated with the third proper function. In a gene expression profile, components “1” and “0” denote the presence and the absence of the gene, respectively. An $n \times 3$ matrix with elements 1 or 0 denotes the gene expression profiles of genes $a$, $b$, and $c$ arranged in columns, where $n$ is the dimension of these vectors. Each row of the matrix is a three-dimensional vector, and each column is an $n$-dimensional vector. Three row frequencies are counted: the frequency of row $(1, 1, 1)$, which indicates that both $a$ and $b$ activate $c$; the frequency of row $(1, 0, 1)$, which indicates that only $a$ activates $c$; and the frequency of row $(0, 1, 1)$, which indicates that only $b$ conducts the activation. The out-degree added to $a$ by this second-order logic is its proportion of the contribution, computed from these frequencies, times the total out-degree (i.e., the in-degree increment of $c$), and the out-degree distributed to $b$ is its proportion times the total out-degree. Specifically, for the OR logic, the out-degree increments of $a$ and $b$ are apportioned in this way according to the gene expression profiles of the nodes in the network. For the AND logic, $a$ and $b$ receive equal out-degree increments. For the XOR logic, $a$ and $b$ likewise receive equal out-degree increments, and the in-degree increment of $c$ is 1. Table 1 lists the different types of logic relationship as well as their corresponding in-degrees and out-degrees.
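The row frequencies used to apportion the out-degree can be counted directly from the binary profiles. The sketch below is only illustrative: it assumes the columns are ordered as regulator, regulator, target, and that the three informative rows are $(1, 1, 1)$, $(1, 0, 1)$, and $(0, 1, 1)$. The function name and the dictionary keys are hypothetical, and the step that turns the counts into proportions is deliberately left out.

```python
from collections import Counter


def activation_counts(a, b, c):
    """Count the rows of the n x 3 profile matrix in which the target c is on.

    a, b: binary profiles of the two regulating genes.
    c:    binary profile of the target gene.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("profiles must have the same length")
    rows = Counter(zip(a, b, c))
    return {
        "both": rows[(1, 1, 1)],    # a and b both present when c is present
        "only_a": rows[(1, 0, 1)],  # only a present when c is present
        "only_b": rows[(0, 1, 1)],  # only b present when c is present
    }


a = [1, 1, 0, 1, 0]
b = [1, 0, 1, 1, 0]
c = [1, 1, 1, 1, 0]
print(activation_counts(a, b, c))
```

Counting with a `Counter` keeps the three frequencies in a single pass over the samples, so the same helper can be reused for every candidate triplet.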

2.2. Model of Cascading Failure for the Logic Network

Definition 1 (in-degree strength and out-degree strength of a node). Suppose that there are $m$ nodes $j_1, \ldots, j_m$ regulating node $i$ only by the first-order logical relationship. The in-degree strength of node $i$ is then defined as $\sum_{k=1}^{m} U(i \mid j_k)$, where $U(i \mid j_k)$ denotes the uncertainty coefficient of gene node $j_k$ controlling gene node $i$. Conversely, if node $i$ is the source gene node regulating other nodes $j_1, \ldots, j_m$ by the first-order logical relationship, then the out-degree strength of node $i$ is defined as $\sum_{k=1}^{m} U(j_k \mid i)$. Considering logical triplets, if node $i$ is the target node of pairs of nodes only through second-order logical relationships, then the in-degree strength of node $i$ is defined from the uncertainty coefficients of those relationships.

Conversely, if node $i$ and other nodes jointly regulate target nodes only by second-order logical relationships, then the out-degree strength of node $i$ is defined from the uncertainty coefficients of those relationships, taking into account the type of second-order logic shown in Table 1. Finally, the total in-degree strength (out-degree strength) of node $i$ is the sum of all the in-degree strengths (out-degree strengths) of node $i$ arising from both first-order and second-order logical relationships.
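For the first-order part of Definition 1, the strengths are simply sums of edge weights. A minimal sketch is given below, assuming the network is supplied as a list of `(source, target, weight)` tuples in which the weight is the uncertainty coefficient of the edge; the function name and the input layout are assumptions for illustration, and second-order contributions are not handled here.

```python
from collections import defaultdict


def first_order_strengths(edges):
    """Return (in_strength, out_strength) dictionaries keyed by node.

    edges: iterable of (source, target, weight) tuples, one per directed
           first-order edge, where weight is the uncertainty coefficient.
    """
    in_strength = defaultdict(float)
    out_strength = defaultdict(float)
    for source, target, weight in edges:
        out_strength[source] += weight  # source regulates target
        in_strength[target] += weight   # target is regulated by source
    return dict(in_strength), dict(out_strength)


edges = [("g1", "g2", 0.4), ("g3", "g2", 0.5), ("g2", "g4", 0.3)]
s_in, s_out = first_order_strengths(edges)
print(s_in.get("g2", 0.0), s_out.get("g2", 0.0))
```

Nodes that never appear as a target (or as a source) are absent from the corresponding dictionary, so callers should read them with a default of zero, as in the example.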

Definition 2 (load at a node). For a nonisolated node $i$, its load $L_i$ is defined in terms of its local information as the ratio of the in-degree strength to the out-degree strength. Specifically, if the in-degree strength of node $i$ is zero and its out-degree strength is positive, then $L_i = 0$. If the in-degree strength of node $i$ is positive and its out-degree strength is zero, then $L_i = \infty$. If both the in-degree strength $s_i^{\mathrm{in}}$ and the out-degree strength $s_i^{\mathrm{out}}$ are positive, then $L_i = s_i^{\mathrm{in}} / s_i^{\mathrm{out}}$.
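The three cases of Definition 2 translate directly into code. The sketch below is a minimal illustration; the function name is hypothetical, and raising an error for an isolated node is a choice made here because the definition covers only nonisolated nodes.

```python
import math


def load(in_strength, out_strength):
    """Load of a nonisolated node: in-degree strength / out-degree strength."""
    if in_strength == 0 and out_strength == 0:
        # The definition applies only to nonisolated nodes.
        raise ValueError("load is undefined for an isolated node")
    if out_strength == 0:
        return math.inf  # only incoming edges
    return in_strength / out_strength  # 0.0 when there are only outgoing edges


print(load(0.0, 1.5))  # source-like node
print(load(0.8, 0.0))  # sink-like node
print(load(1.0, 0.5))  # node with both incoming and outgoing edges
```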

Definition 3 (capacity of a node). Two capacities in node are defined: for node , the lower limit of capacity is and the upper limit of capacity is , where parameter . Three cases are presented as follows: if , then , and the interval shrinks to a point. If , then , and we consider that the interval becomes . If , it forms a real interval from the minimum load to the maximum load which the node can handle.

When all the nodes are active, the network operates in a free-flow state [11]. However, the removal of a node changes the in-degree and out-degree strengths of the nodes connected to it, so their loads are redistributed. The redistribution may push the load of a node above or below its initial capacity interval. In particular, the load may decrease from a positive value to 0 or increase to $\infty$. The corresponding nodes then collapse, and subsequent failures follow. Although the cascade may stop after a few steps, it may also propagate and shut down a considerable fraction of the whole network. We study the degree-strength cascading failure model (D-SCFM), its mechanism, and the relationship between network structure and stability in order to control cascading failures in the gene logic network.

Let $G = (V, E, W)$ be a logic network with the gene node set $V$, the directed edge set $E$, and the edge weight set $W$. Suppose the logic network has no multiple edges or self-loops. On the basis of the above definitions and symbols, we propose the following algorithm.

Input. Initial matrix of the logic network $G$.

Step 1. Select an initial node $i$, and calculate its load and its lower and upper capacity limits.

Step 2. Delete node $i$ and its incident edges (both incoming and outgoing edges).

Step 3. Calculate the current load of each remaining node and compare it with that node's initial capacity interval. Then delete every node that fails, together with its remaining incident edges.

Step 4. Repeat Step 3 until no further failure occurs.
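The steps above can be sketched as a short simulation. Several choices in this sketch are assumptions made only for illustration and are not taken from the text: the network is given as a dictionary of weighted directed edges (second-order relationships would first have to be converted into such weights), the capacity interval of a node with finite positive initial load $L$ is taken to be $[(1 - \alpha)L, (1 + \alpha)L]$, a node whose initial load is 0 or $\infty$ survives only while its load stays at that value, and a node that loses all of its edges is counted as failed.

```python
import math


def strengths(edges, alive):
    """In- and out-degree strengths restricted to the surviving nodes."""
    s_in = {v: 0.0 for v in alive}
    s_out = {v: 0.0 for v in alive}
    for (source, target), weight in edges.items():
        if source in alive and target in alive:
            s_out[source] += weight
            s_in[target] += weight
    return s_in, s_out


def node_load(s_in, s_out):
    """Load as in Definition 2; None marks an isolated node."""
    if s_in == 0 and s_out == 0:
        return None
    return math.inf if s_out == 0 else s_in / s_out


def cascade(edges, nodes, seed, alpha):
    """Return the set of failed nodes after initially removing `seed`."""
    alive = set(nodes)
    s_in, s_out = strengths(edges, alive)
    initial = {v: node_load(s_in[v], s_out[v]) for v in alive}

    def survives(v, current):
        start = initial[v]
        if start is None:
            return True   # assumption: initially isolated nodes never fail
        if current is None:
            return False  # assumption: a node left without edges fails
        if start == 0 or math.isinf(start):
            return current == start  # degenerate capacity interval
        # Assumed capacity interval around the initial load.
        return (1 - alpha) * start <= current <= (1 + alpha) * start

    failed = {seed}
    alive.discard(seed)
    while True:
        s_in, s_out = strengths(edges, alive)
        newly_failed = {
            v for v in alive if not survives(v, node_load(s_in[v], s_out[v]))
        }
        if not newly_failed:
            return failed
        failed |= newly_failed
        alive -= newly_failed


edges = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0, ("d", "c"): 1.0}
nodes = ["a", "b", "c", "d"]
print(sorted(cascade(edges, nodes, "a", alpha=0.5)))
print(sorted(cascade(edges, nodes, "d", alpha=0.5)))
```

In this toy network, removing `a` drives the load of `b` from 1 to 0, outside its assumed interval, so `b` fails as well; removing `d` leaves every other load unchanged and the cascade stops immediately. Recomputing all strengths in each round keeps the sketch short at the cost of efficiency, which is acceptable for networks of a few dozen genes.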

3. Results

The real gene expression data were all downloaded from the Gene Expression Omnibus (GEO). All data sets are based on the GeneChip Human Genome U133A array. The lung normal group was recorded as control group I and the lung adenocarcinoma group as experimental group I. The prostate normal group was recorded as control group II and the prostate cancer group as experimental group II. Similarly, the colon normal group was recorded as control group III and the colon cancer group as experimental group III. The details are shown in Table 2. Furthermore, by using the Expression Console Software provided by Affymetrix, we obtain their $P$ value, $A$ value, and corresponding $M$ values, where $P$ represents Present (expressed), $A$ represents Absent (not expressed), and $M$ represents Marginal. The $P$ value in the database is recorded as 1, and the values of $A$ and $M$ are all recorded as 0.
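The binarization rule can be written as a one-line mapping. The sketch below assumes the detection calls are available as the strings `"P"`, `"A"`, and `"M"`; the function name and the input format are hypothetical.

```python
def binarize_calls(calls):
    """Map detection calls to a binary profile: P -> 1, A and M -> 0."""
    allowed = {"P", "A", "M"}
    profile = []
    for call in calls:
        if call not in allowed:
            raise ValueError(f"unexpected detection call: {call!r}")
        profile.append(1 if call == "P" else 0)
    return profile


print(binarize_calls(["P", "A", "M", "P"]))
```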

However, each data set contains few samples and too many genes (more than 20,000). We therefore choose the genes that differ significantly between the control group and the experimental group for each of the three types. Candidate genes are selected with the Wilcoxon rank sum test [19] at a fixed significance level, using the corresponding $p$ values. Finally, 60, 65, and 79 genes were filtered out of the initial data, and their expression matrices were obtained, where each row represents a gene and contains a binary string of 0’s and 1’s indicating the absence or the presence of the gene (http://cise.sdust.edu.cn/labs/other/zhangyulin/2017/workingdata.rar).
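For readers who wish to reproduce the filtering step, the following sketch computes a two-sided Wilcoxon rank sum $p$ value with the usual normal approximation. It is an illustrative implementation under stated assumptions, not the procedure used to produce the reported gene lists: ties receive average ranks, no tie or continuity correction is applied to the variance, and the function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_p_value(x, y):
    """Two-sided p value of the rank sum test (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


control = [1.0, 2.0, 3.0]
tumour = [4.0, 5.0, 6.0]
print(rank_sum_p_value(control, tumour))
```

A gene would be retained when its $p$ value falls below the chosen significance level. With very small groups, an exact test is generally preferable to the normal approximation used here.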

Two thresholds, namely, a first-order threshold and a second-order threshold, are used to detect the connections among nodes in the gene logic networks. Table 3 gives the structural features, including the numbers of nodes and edges, for each pair of thresholds. The number and distribution of the two orders of logic types in the networks change with the thresholds. The degree of each node changes accordingly, and so do its in-degree strength and out-degree strength. As the thresholds increase, the average degree of the network nodes decreases. We analyze the relationship between robustness to cascading failure and network structural features such as degree and network motifs under several thresholds.

We characterize the cascade that results from initially removing a gene node by the total number of nodes deleted. After deleting node $i$, let $N_i$ be the number of failed nodes (including node $i$); the ratio $S_i = N_i / N$, where $N$ is the number of nodes in the network, is an approximate indicator of network damage. The largest size ratio of cascading failure is $S_{\max} = \max_i S_i$. Let

$$\delta_i(s) = \begin{cases} 1, & S_i \ge s, \\ 0, & S_i < s, \end{cases}$$

where $s$ is a variable parameter. Then the cumulative probability of cascading failures is defined as $P(s) = \frac{1}{N} \sum_{i=1}^{N} \delta_i(s)$, denoting the probability that the network’s cascading failures are at least as large as $s$. The parameters defined above are used to measure the relationship between the network structure and the robustness of a network when successive failures occur. We focus on the key failure nodes, that is, the nodes that cause large-scale cascading failures in the network; these depend on the first-order threshold, the second-order threshold, and the capacity parameter. First, the capacity parameter plays an important role in maintaining the robustness of the network. Let the capacity parameter range from 0.1 to 0.9 in increments of 0.1. Figure 3 plots the largest size ratio of cascading failure against the capacity parameter. Clearly, the smaller the capacity parameter, the more easily the logical network fails in a cascade.
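Both damage measures follow directly from the cascade sizes obtained by removing each node in turn. The sketch below assumes those sizes are already stored in a dictionary keyed by the initially removed node, that the size ratio is the number of failed nodes divided by the total number of nodes, and that the cumulative probability counts cascades at least as large as the given ratio; the function name and data layout are hypothetical.

```python
def cascade_statistics(cascade_sizes, n_nodes):
    """Return the largest size ratio and a function giving P(ratio >= s).

    cascade_sizes: dict mapping each initially removed node to the number
                   of failed nodes (the removed node included).
    n_nodes:       total number of nodes in the network.
    """
    ratios = [size / n_nodes for size in cascade_sizes.values()]
    largest_ratio = max(ratios)

    def cumulative_probability(s):
        # Fraction of initial removals whose cascade ratio is at least s.
        return sum(ratio >= s for ratio in ratios) / len(ratios)

    return largest_ratio, cumulative_probability


sizes = {"a": 2, "b": 1, "c": 1, "d": 1}
largest, prob = cascade_statistics(sizes, n_nodes=4)
print(largest, prob(0.25), prob(0.5), prob(0.75))
```

Evaluating the returned function on a grid of ratios gives the points of a cumulative distribution curve for one network and one choice of thresholds.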

If the thresholds are close to zero, the connectivity of the network is very high; not only do the networks then show no differences, but the computational difficulty also increases. If the thresholds are relatively large, many nodes in the network become isolated. Thresholds that are too large or too small do not conform to practical biological significance. In this paper, four sets of thresholds are used for the three types of logical network to analyze how the cascading failure parameters change. With the capacity parameter fixed, the corresponding cumulative distribution curve for each type under these thresholds is shown in Figure 4. Clearly, as the cascade size increases, the cumulative probability falls to zero. The distributions have a similar form for the types we studied: they are broad-tailed, indicating that most cascades are small, while some are quite large. These large failures represent lethal events, so the behavior of the distribution at large cascade sizes is of special interest. In fact, as the thresholds increase, more and more isolated nodes and smaller connected components appear in the network. The connectivity of the network is reduced, and the integrity of the network structure is seriously compromised.

4. Conclusions

In our model, each node in the network is deleted initially in turn, and the resulting cascading failure spreads over the network. We look for the nodes whose removal leads to the larger-scale cascading failures. Four genes, CDH1, MYC, SOS2, and CDKN1A, are obtained from the prostate cancer network. Similarly, five genes (TOP2A, REL, SHH, ROS1, and CHEK2) are selected in the colon cancer gene network and three genes (RBL1, MAPK9, and PIK3CA) in the adenocarcinoma of lung gene network. Table 4 lists the gene nodes causing larger cascading failures under all thresholds, together with their in-degrees and out-degrees. The nodes that cause large-scale successive failures of the network are those with larger in-degree or out-degree. Nodes with larger degree are closely associated with other nodes, and if they are deleted, the cascades can spread throughout almost the entire network. However, nodes with larger degree do not necessarily lead to large-scale cascading failures; these are also determined by the coupling relationships between nodes, such as logical motifs.

The logic motifs are doublets, that is, combinations of two second-order or first-order logical relationships with at least one common node. In Figure 5, (a), (b), (c), and (d) show all possible second-order logic doublets centered on a given node. Panels (e) and (f) in Figure 5 give further logic doublets centered on that node. These logic doublets are named according to the position of the central node as “both-in,” “both-out,” and “in-out” doublets. For example, (a) and (e) are “both-in” doublets: the central node has only incoming edges and no outgoing edge, so its out-degree strength is zero and its load is $\infty$. For (b), (c), and (f), the central node has only outgoing edges and no incoming edge, so its in-degree strength is zero, its out-degree strength is positive, and its load is 0. For (d), the central node has both incoming and outgoing edges, so its in-degree and out-degree strengths are both greater than zero; hence, its load is a finite positive value. If a node closely connected to other nodes by logical motifs is deleted initially, it can easily cause other nodes to break down.

5. Discussion

In this study, we examine the propagation of cascading failures in gene logic networks, starting from an initial failure and deleting the nodes one by one. A new load-capacity model for cascading failure in the directed logic network is proposed, with the aim of exploring the relationship between the robustness and the structure of the network. We apply the load-capacity cascading failure method based on degree strength to gene expression profile data from the NCBI for three types of cancer gene networks: adenocarcinoma of the lung, prostate cancer, and colon cancer. We find that deleting hubs causes larger cascading failures. These nodes are therefore possibly related to the occurrence and development of the three types of cancer. Table 5 lists the genes and their gene annotations.

Some of these genes have been confirmed in the literature to be associated with the corresponding cancer. For example, Cherfas [20] found that the gene CHEK2 is closely related to the occurrence and development of colon cancer. Cai et al. [21] detected the expression of the SHH gene in 38 surgically resected colon cancer specimens. The aberrant state of the SHH signaling pathway may be involved in the development of colon cancer. The gene TOP2A encodes a DNA topoisomerase, which is a target for many anticancer drugs, and many of its variants are closely related to the development of drug resistance. The MYC gene is a regulator gene that codes for a transcription factor. It is located on chromosome 8 and is believed to regulate the expression of 15% of all genes through binding to Enhancer Box sequences and recruiting histone acetyltransferases. This means that, in addition to its role as a classical transcription factor, MYC also regulates the global chromatin structure by regulating histone acetylation both in gene-rich regions and at sites far from any known gene. Koh et al. [22] found MYC to be one of the top genes overexpressed in human prostate cancer tissues, as compared to matched normal-appearing prostate tissue.

Baldi et al. [23] found that the expression levels of RBL1, a protein similar to that encoded by the gene pRb2, were negatively related to the histological stage and metastasis of lung tumors. Therefore, the gene RBL1 is regarded as a tumor suppressor gene in lung cancer. The gene PIK3CA encodes an alpha subunit of the phosphatidylinositol 3-kinase. Samuels and Velculescu [24] found high-frequency variations of the PIK3CA gene in breast cancer and lung cancer. Most mutations are clustered in two locations, the helical domain and the catalytic domain of PI3K, and at least one hotspot mutation increases kinase activity.

This paper proposed a load-capacity cascading failure model based on the degree strength of nodes and identified the influence of cascading failures on gene logic networks built from gene expression profiles. Through numerical experiments, the parameters of the cascading failure model were analyzed to obtain the relationship between network structure, such as degree, and cascading failure. Finally, we obtained the gene nodes that lead to the larger-scale cascading failures in the networks under the chosen thresholds. These genes may play an important role in the development or metastasis of cancer. Because of limited computing capacity, the rank sum test is first used to determine the sets of significantly different genes at a given significance level, and this inevitably loses some genes related to the specific cancer. In addition, the specific biological significance of these genes still needs further validation by biologists.

Disclosure

The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Yulin Zhang conceived and designed the experiments; Kebo Lv performed the experiments; Xiao Lu analyzed the data; Maoxian Zhao contributed reagents/materials/analysis tools; Jionglong Su wrote the paper.

Acknowledgments

The research is supported by the National Natural Science Foundation of China (Grants 61370207, 61572522, 61503224, and 61773245), the National Natural Science Foundation of Shandong Province (Grant ZR2015FM014), and Qingdao Postdoctoral Research Project (2016110).