BioMed Research International

Volume 2015, Article ID 146365, 10 pages

http://dx.doi.org/10.1155/2015/146365

## Discovering Distinct Functional Modules of Specific Cancer Types Using Protein-Protein Interaction Networks

^{1}Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA

^{2}Bioinformatics and Systems Biology Core, University of Nebraska Medical Center, Omaha, NE 68198, USA

^{3}Department of Biochemistry and Molecular Biology, University of Nebraska Medical Center, Omaha, NE 68198, USA

^{4}Fred and Pamela Buffet Cancer Center, University of Nebraska Medical Center, Omaha, NE 68198, USA

Received 9 January 2015; Revised 12 March 2015; Accepted 31 March 2015

Academic Editor: Md. Altaf-Ul-Amin

Copyright © 2015 Ru Shen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

*Background*. The molecular profiles of different cancer types differ substantially; discovering the functional modules associated with specific cancer types is therefore important for understanding their distinct functions. Protein-protein interaction networks carry vital information about molecular interactions in cellular systems, and identification of functional modules (subgraphs) in these networks is one of the most important applications of biological network analysis.

*Results*. In this study, we developed a new graph theory based method to identify distinct functional modules from nine different cancer protein-protein interaction networks. The method is composed of three major steps: (i) extracting modules from protein-protein interaction networks using network clustering algorithms; (ii) identifying distinct subgraphs from the derived modules; and (iii) identifying distinct subgraph patterns from the distinct subgraphs. The subgraph patterns were evaluated using experimentally determined cancer-specific protein-protein interaction data from the Ingenuity knowledgebase, to identify distinct functional modules that are specific to each cancer type.

*Conclusion*. We identified cancer-type specific subgraph patterns that may represent the functional modules involved in the molecular pathogenesis of different cancer types. Our method can serve as an effective tool to discover cancer-type specific functional modules from large protein-protein interaction networks.

#### 1. Background

PPI networks represent the cross talk among groups of proteins, which have a wide range of biological implications [1, 2]. Computational analysis has become an indispensable tool in understanding the functional significance of PPI networks, given the large volumes of PPI data available from systems biology experiments. Specifically, graph theory based computational methods have been widely used to analyze PPI networks [3, 4]. For example, graph kernels and graph alignments have been used to compare similarities between networks [5]; and graph-clustering and module detection have been used to identify functional modules in PPI networks [6]. For a thorough description of different graph-mining algorithms that have been applied to study biological interaction networks, please refer to a recent review [7].

In a previous study [8], we collected differentially expressed genes (DEGs) between tumor and normal samples from microarray studies of nine different solid tumor types, using the Oncomine database [9]. We constructed nine cancer-type specific PPI networks by mapping DEGs to the PPIs of five human protein interactome databases: IntAct [10], MINT [11], HPRD [12], DIP [13], and BIND [14]. We studied the commonality among the nine PPI networks and identified the common modules that frequently occur in them. These common modules could be functionally important because they were identified in multiple cancer types. In fact, they have been closely associated with cancer-related processes such as transcriptional regulation, cell growth, and cell proliferation [8]. While finding common functional modules (subgraphs) shared among many cancer types is useful, finding the modules that are specific to only one cancer type is more valuable. In contrast to our previous study, this study focuses on discovering distinct cancer-specific functional modules that could offer direct targets for effective drug discovery. Distinct modules are those that exist exclusively in one network and can be discovered by finding distinct patterns in PPI networks. From the graph theory perspective, identification of distinct patterns differs from identification of common patterns in that the latter converges as the size of the modules increases, while the former diverges.

Existing algorithms, such as RNSC (Restricted Neighbourhood Search Clustering), are effective in extracting modules from networks (more details on the existing algorithms are provided in Supplementary File 1 in the Supplementary Material available online at http://dx.doi.org/10.1155/2015/146365). RNSC is a local search-based graph-clustering algorithm that defines a naïve cost function and a scaled cost function, resulting in the lowest clustering cost among comparable methods [15]. Starting from an initial random clustering, RNSC moves vertices among clusters in order to reduce the cost. To speed up the process, RNSC maintains a list of moves that should be avoided, referred to as the Tabu list.

Once the modules are extracted, the next task is to identify distinct modules that exist in only one network and not in the others. Subgraph query algorithms are used to determine whether a module exists in a given network. Such methods require a subgraph isomorphism test, and as a result querying is computationally expensive. SPath is a subgraph query method [16] that maintains, for each vertex, a neighborhood signature (NS) consisting of a group of node sets indexed by shortest path distance. During a subgraph query, the NSs of the vertices are used to generate the shortest paths of the query graph, and a few of these shortest paths are selected to represent the whole query graph. Another approach is graph indexing, which is frequently used as an optimization technique in graph-mining. GraphGrep [17] is a graph indexing algorithm that enumerates all the paths up to a certain length in a network and indexes them in order to later identify every graph that contains all the paths of a query. Yan et al. proposed gIndex [18], a method for quick graph indexing and pattern search that performs graph-based indexing instead of path-based indexing. It uses discriminative fragments to index the networks and is therefore suitable for complex query graphs.
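The path-based indexing idea described above can be illustrated with a minimal Python sketch. It is not GraphGrep itself: it records only whether a label path occurs in a graph (no occurrence counts or hashing), and the function names `label_paths`, `build_index`, and `candidates`, as well as the toy graphs, are hypothetical. Graphs that survive the filter would still need a full subgraph isomorphism test.

```python
from collections import defaultdict


def label_paths(adj, labels, max_len):
    """Label sequences of all simple paths with at most max_len edges."""
    found = set()

    def extend(path):
        found.add(tuple(labels[v] for v in path))
        if len(path) - 1 == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:          # keep paths simple
                extend(path + [nxt])

    for start in adj:
        extend([start])
    return found


def build_index(graphs, max_len):
    """Map each label path to the names of the graphs that contain it."""
    index = defaultdict(set)
    for name, (adj, labels) in graphs.items():
        for path in label_paths(adj, labels, max_len):
            index[path].add(name)
    return index


def candidates(index, query, all_names, max_len):
    """Graphs containing every label path of the query (a filter, not a proof)."""
    adj, labels = query
    result = set(all_names)
    for path in label_paths(adj, labels, max_len):
        result &= index.get(path, set())
    return result


# Toy graphs (assumed data): adjacency sets plus a vertex -> label map.
g1 = ({0: {1}, 1: {0, 2}, 2: {1}}, {0: "A", 1: "B", 2: "C"})
g2 = ({0: {1, 2}, 1: {0}, 2: {0}}, {0: "A", 1: "B", 2: "D"})
index = build_index({"g1": g1, "g2": g2}, max_len=2)
query = ({0: {1}, 1: {0}}, {0: "B", 1: "C"})
print(candidates(index, query, ["g1", "g2"], max_len=2))  # {'g1'}
```

The filter is cheap because it only intersects sets; the expensive isomorphism test is then run on the few graphs that remain.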

In this study, we developed a new graph theory based method to identify distinct modules among the nine PPI networks, where each network corresponds to a distinct cancer type. We divided the task into three steps: (1) we used RNSC [15], a local search algorithm that divides networks into nonoverlapping substructures, to identify modules in the networks; (2) we found distinct subgraphs among the identified modules; and (3) we extracted patterns from the distinct subgraphs and searched for these patterns in the other networks. If a pattern does not exist in the other networks, we define it as a distinct module. Using this method, we identified distinct modules or subgraphs that are unique to a given cancer type. We also verified whether the unique subgraphs indeed represent PPI networks in specific cancer types using quantitative validation methods. To our knowledge, this work represents the first attempt to identify distinct functional modules in cancer using large-scale PPI networks and graph theory based algorithms.

#### 2. Methods

Our method includes three steps: module detection using RNSC, distinct subgraph identification, and distinct pattern identification. We first introduce preliminary concepts and then explain the details of each step in the methodology.

##### 2.1. Graph Theory Preliminaries

*Graph*. A graph is a pair $G = (V, E)$, where $V$ is the node set and $E$ is the edge set.

*Labeled Graph*. A labeled graph is a triple $G = (V, E, L)$, where $V$ is the node set, $E$ is the edge set, and $L$ is the function assigning labels to vertices.

*Graph Isomorphism*. Given two graphs $G = (V, E)$ and $G' = (V', E')$, a graph isomorphism is a bijective function $f : V \to V'$ such that, for all $u, v \in V$, $(u, v) \in E$ if and only if $(f(u), f(v)) \in E'$.

*Subgraph Isomorphism*. Given two graphs $G$ and $G'$, if there exists a subgraph $G''$ of $G'$ such that $G$ is graph isomorphic to $G''$, then $G$ is subgraph isomorphic to $G'$.
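As an illustration of the subgraph isomorphism definition, the following brute-force Python sketch tries every injective mapping of query vertices onto target vertices and checks that each query edge lands on a target edge. The function name `is_subgraph_isomorphic` is hypothetical, vertex labels are ignored, and the subgraph is not required to be induced; this is an assumption-laden toy, not the query method used in this study.

```python
from itertools import permutations


def is_subgraph_isomorphic(q_nodes, q_edges, g_nodes, g_edges):
    """True if the query graph maps injectively into the target graph
    so that every query edge is sent to a target edge."""
    g_edge_set = {frozenset(e) for e in g_edges}   # undirected edges
    q_nodes = list(q_nodes)
    for image in permutations(g_nodes, len(q_nodes)):
        f = dict(zip(q_nodes, image))              # candidate injective map
        if all(frozenset((f[u], f[v])) in g_edge_set for u, v in q_edges):
            return True
    return False


triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
k4 = ([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
path4 = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
print(is_subgraph_isomorphic(*triangle, *k4))     # True
print(is_subgraph_isomorphic(*triangle, *path4))  # False
```

The number of candidate mappings grows factorially with the size of the graphs, which is why practical subgraph query methods rely on indexing and pruning.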

*Graph Patterns*. Given a labeled graph $G = (V, E, L)$, the graph pattern of $G$ is an abstraction graph of $G$. The graph pattern is a special case of graph isomorphism: when the bijective function in the graph isomorphism is defined to be the assignment of the same vertex labels, graphs that belong to the same pattern are isomorphic to each other.

##### 2.2. Module Detection Using RNSC

We used the RNSC algorithm to generate modules for each of the nine cancer PPI networks. RNSC divides a graph into nonoverlapping connected components, each of which is defined as a module. The results of RNSC clustering depend on the parameter settings. We used the following parameters for our RNSC runs.

1. *Tabu list tolerance*: the Tabu list stores the vertex moves that should be avoided. The Tabu list tolerance is the number of times a vertex must appear in the Tabu list before it becomes forbidden to move the vertex. We set it to 1.
2. *Tabu length*: the number of items that are stored in the Tabu list. We set it to 50.
3. *Naive stopping tolerance*: the number of steps the naive scheme will continue without improving the best cost; it determines when the naive scheme stops. We set it to 15.
4. *Scaled stopping tolerance*: the number of steps the scaled scheme will run without improving the best cost. We set it to 15.
5. *Diversification frequency*: the shuffling diversification frequency or the destructive diversification frequency, depending on which diversification scheme is used. We set it to 50.
6. *Shuffling diversification length*: the number of moves for shuffling diversification. If this parameter is set, shuffling diversification is performed instead of destructive diversification. We set it to 3.
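To show how the Tabu length and the naive stopping tolerance interact, here is a much simplified Python sketch of a tabu-guided local search. It is not the RNSC implementation: it uses only a naive cost (here assumed to be half the number of cross-cluster neighbours plus same-cluster non-neighbours, summed over vertices), omits the scaled cost and diversification, and treats a vertex as forbidden after a single appearance in the Tabu list. The names `naive_cost` and `tabu_search` are hypothetical.

```python
import random


def naive_cost(adj, cluster_of):
    """Half the count of 'bad' pairs: neighbours placed in other clusters
    plus non-neighbours placed in the same cluster."""
    members = {}
    for v, c in cluster_of.items():
        members.setdefault(c, set()).add(v)
    total = 0
    for v, c in cluster_of.items():
        same = members[c] - {v}
        total += len(adj[v] - same) + len(same - adj[v])
    return total // 2


def tabu_search(adj, cluster_of, tabu_length=50, stopping_tolerance=15, seed=0):
    """Greedy vertex moves; recently moved vertices are tabu.
    Stops after stopping_tolerance steps without a new best cost."""
    rng = random.Random(seed)
    current = dict(cluster_of)
    best, best_cost = dict(current), naive_cost(adj, current)
    tabu, stale = [], 0
    while stale < stopping_tolerance:
        labels = sorted(set(current.values()))
        moves = []
        for v in sorted(adj):
            if v in tabu:
                continue
            for c in labels:
                if c != current[v]:
                    trial = dict(current)
                    trial[v] = c
                    moves.append((naive_cost(adj, trial), v, c))
        if not moves:
            break
        cost = min(m[0] for m in moves)
        _, v, c = rng.choice([m for m in moves if m[0] == cost])
        current[v] = c
        tabu = (tabu + [v])[-tabu_length:]   # keep only the newest entries
        if cost < best_cost:
            best, best_cost, stale = dict(current), cost, 0
        else:
            stale += 1
    return best, best_cost


# Two triangles joined by the edge 2-3; vertex 5 starts in the wrong cluster.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0}
best, best_cost = tabu_search(adj, start, tabu_length=2)
print(best_cost)  # 1: only the bridging edge 2-3 crosses clusters
```

A short Tabu length is used in the toy run because the graph has only six vertices; with the value 50 used in the study, every vertex of such a small graph would quickly become tabu.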

##### 2.3. Distinct Subgraph Identification

Distinct modules are not only the unique subgraphs, but also the unique subgraph patterns (a subgraph can have many patterns based on the edge topology) in networks. From the modules generated by RNSC, we searched for those that exist uniquely in each network. We used canonical labels [8] to represent subgraphs in order to quickly identify distinct subgraphs.
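The search for modules that occur in only one network can be sketched as follows, assuming that each module is given as a set of protein identifiers and that the canonical label of a subgraph is simply its sorted vertex names joined into a string. Both the label format and the function names `node_set_label` and `distinct_subgraphs` are assumptions made for illustration; the identifiers `P1` to `P7` are placeholders, not data from this study.

```python
from collections import defaultdict


def node_set_label(module):
    """Order-independent label for a module's vertex set (assumed format)."""
    return "|".join(sorted(module))


def distinct_subgraphs(modules_by_network):
    """Return, per network, the labels of modules found in no other network."""
    seen_in = defaultdict(set)
    for network, modules in modules_by_network.items():
        for module in modules:
            seen_in[node_set_label(module)].add(network)
    distinct = defaultdict(list)
    for label, networks in seen_in.items():
        if len(networks) == 1:               # occurs in exactly one network
            distinct[next(iter(networks))].append(label)
    return dict(distinct)


modules = {
    "net1": [{"P1", "P2", "P3"}, {"P4", "P5"}],
    "net2": [{"P3", "P1", "P2"}, {"P6", "P7"}],
}
print(distinct_subgraphs(modules))
# {'net1': ['P4|P5'], 'net2': ['P6|P7']}
```

Because the label is a plain string, membership checks reduce to dictionary lookups rather than pairwise graph comparisons.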

##### 2.4. Module Labeling

In McKay’s canonical graph labeling algorithm [19], the concept of canonical labeling for graphs was introduced. The basic idea is to represent relational graph data using a sequence of symbols that uniquely identifies a graph. Conversely, a given graph must always be converted to the same sequence of symbols. Koyuturk et al. proposed to use the concatenation of the upper triangle of the adjacency matrix as the canonical label of a graph [20]. For a graph without edge weights, the adjacency matrix is a binary matrix in which every row or column corresponds to a node in the graph. The value at the $i$th row and $j$th column of the matrix is “1” if there is an edge connecting node $i$ with node $j$, and “0” otherwise. For an undirected graph, the adjacency matrix is symmetric about the main diagonal. Therefore, we can use the upper right triangle of the adjacency matrix to fully represent a graph. An example of the subgraph labeling is shown in Figure 1.
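The upper-triangle label can be sketched in a few lines of Python. The sketch assumes that vertex names are unique and that rows and columns are ordered by sorted vertex name, so that the same labeled graph always yields the same string; it also leaves out the diagonal, assuming no self-interactions. The function name `upper_triangle_label` is hypothetical.

```python
def upper_triangle_label(nodes, edges):
    """Concatenate the strict upper triangle of the adjacency matrix,
    row by row, with vertices ordered by name (assumed ordering)."""
    order = sorted(nodes)
    edge_set = {frozenset(e) for e in edges}   # undirected edges
    bits = []
    for i, u in enumerate(order):
        for v in order[i + 1:]:
            bits.append("1" if frozenset((u, v)) in edge_set else "0")
    return "".join(bits)


# Pairs are visited in the order (A,B), (A,C), (B,C).
print(upper_triangle_label(["A", "B", "C"], [("A", "B"), ("B", "C")]))  # 101
print(upper_triangle_label(["C", "A", "B"], [("C", "B"), ("B", "A")]))  # 101
```

Two modules on the same vertex set but with different edge topologies produce different strings, which is what allows patterns to be compared by string equality.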