Research Article  Open Access
Identifying Protein Complexes from Dynamic Temporal Interval Protein-Protein Interaction Networks
Abstract
Identification of protein complexes is very important for revealing the underlying mechanisms of biological processes. Many computational methods have been developed to identify protein complexes from static protein-protein interaction (PPI) networks. Recently, researchers have begun to consider the dynamics of protein-protein interactions. Dynamic PPI networks are closer to the reality of the cell system, so more protein complexes are expected to be identified accurately from them. In this paper, we use the undulating degree above the base level of gene expression, instead of the gene expression level itself, to construct dynamic temporal PPI networks. We further convert these dynamic temporal PPI networks into dynamic Temporal Interval Protein Interaction Networks (TIPINs) and propose a novel method to accurately identify more protein complexes from the constructed TIPINs. Because they preserve interactions that persist throughout a temporal interval, the constructed TIPINs contain more dynamic information for accurately identifying more protein complexes. Our proposed identification method uses multisource biological data to judge whether the joint colocalization condition, the joint coexpression condition, and the expanding cluster condition are satisfied, which ensures that the identified protein complexes have the features of colocalization, coexpression, and functional homogeneity. The experimental results on yeast data sets demonstrate that the constructed TIPINs yield better identification of protein complexes than five existing dynamic PPI networks, and that our proposed identification method finds more protein complexes accurately than four other methods.
1. Introduction
Most proteins interact with other proteins to carry out specific biological processes [1]. The rapid accumulation of protein-protein interaction (PPI) data has made PPI network maps of several model organisms available [2]. Identifying protein complexes from PPI networks plays a key role in understanding cellular organization and functional mechanisms [3].
Over the past decades, studies on identifying protein complexes from the static protein-protein interaction network (SPIN) have yielded many effective methods. Clustering-based methods such as MCODE [4], ClusterONE [5], MCL [6], PCP [7], APcluster [8], SPICi [9], and DPClus [10] identify complexes by detecting densely connected structures in SPIN. Gavin et al. [1] discovered the core-attachment structure of yeast protein complexes through genome-wide analysis; accordingly, CORE [11], COACH [12], WPNCA [13], and MCL-CAw [14] were designed to find protein complexes from SPIN. Some methods [15–17] detect biologically meaningful protein complexes by integrating GO-based functional annotations with SPIN, and other methods [18, 19] compute the Pearson correlation coefficient between two proteins and use it to weight SPIN for identifying protein complexes.
The aforementioned methods mainly focus on identifying complexes from SPIN. However, the real PPI network in a cell changes over the different stages of the cell cycle [20]; in fact, cellular systems are highly dynamic and responsive to environmental cues [21]. Modeling the real PPI network as dynamic PPI networks is therefore expected to allow more protein complexes to be identified accurately.
Fortunately, by monitoring simultaneous and quantitative changes in the RNA concentrations of thousands of genes under various experimental conditions, DNA microarray technology has produced a large amount of gene expression data [22, 23]. These gene expression data provide potential insights into the dynamics of PPI networks. Thus, the key step in identifying protein complexes from dynamic PPI networks is to use gene expression data to construct PPI networks that closely approximate the real ones. During a whole cell cycle, a protein is not active all the time, so constructing a dynamic PPI network requires determining the so-called active time points at which each protein is active. Based on the periodicity of gene expression, De Lichtenberg et al. [24] constructed dynamic PPI networks over the yeast mitotic cell cycle by determining the active time points of each protein. A protein is considered active when its gene expression level exceeds a specified threshold. Tang et al. [22] used a recommended threshold to filter out non-active proteins over three successive metabolic cycles and then constructed a time-course protein interaction network (TC-PIN). Instead of using a global threshold, Wang et al. [23] presented a three-sigma method, which uses the sum of the mean gene expression and three standard deviations as the threshold, to determine the active time points of each protein; they constructed dynamic protein interaction networks (DPIN) and identified complexes from them. Some swarm intelligence-based methods [25–29] also exploit the three-sigma method to construct dynamic PPI networks and identify protein complexes. Based on the three-sigma method, Zhang et al. [30] calculated the active probability of each protein at different time points to determine its active time points and constructed dynamic probabilistic protein interaction networks (DPPN). Furthermore, Ou-Yang et al. [31] proposed a time smooth overlapping complex detection (TSOCD) model that constructs dynamic PPI networks to detect temporal protein complexes. Shen et al. [32] used the deviation degree method to construct a Time-Evolving PIN (TEPIN) for detecting temporal protein complexes. By adopting a dynamic model-based method to filter noise from gene expression profiles, Xiao et al. [21] proposed a k-sigma method to determine whether a protein is active at a time point and constructed a noise-filtered active protein interaction network (NF-APIN) to detect protein complexes. The aforementioned methods mainly consider how to construct dynamic PPI networks and then apply existing identification methods to find protein complexes in them.
Furthermore, some researchers have investigated not only how to construct dynamic PPI networks but also how to design methods that find protein complexes in the constructed networks. By combining the active probability of proteins and the Pearson correlation coefficient of PPIs with static PPI networks, Zhang et al. [33] constructed dynamic PPI networks and proposed a protein complex prediction method. Based on neighbor affinity and the dynamic protein-protein interaction network, the DPCNADPIN method [34] consolidates proteins with a high clustering coefficient and their neighbors into initial clusters and then iteratively adds neighboring proteins to each cluster to form a protein complex. The TSOCD method [31] captures the temporal features of networks between consecutive time points and detects temporal protein complexes from the constructed dynamic PPI networks. Shen et al. [35] proposed DCA (Dynamic Core-Attachment), which uses the three-sigma method to construct a dynamic PPI network, integrates the inherent organization of protein complexes, and applies an outward expanding strategy to identify protein complexes with a core-attachment structure. All four of these works identify protein complexes by constructing dynamic PPI networks from gene expression data and the topological features of PPI networks.
We observe that all existing methods determine the active time points of proteins with a conservative, relatively high threshold, which loses the dynamic information of genes whose expression values are below that threshold. In this paper, we first use the undulating degree above the base level of gene expression, instead of the gene expression level, to determine the active time points of a protein, and we construct temporal PPI networks (TPNs) from protein interaction data and gene expression data. We then propose a method that not only converts TPNs into temporal interval PPI networks (TIPINs) but also uses multisource biological data to identify more colocalized, coexpressed, and functionally significant protein complexes from the constructed TIPINs. Finally, we evaluate the constructed TIPINs against other dynamic PPI networks and compare our identification method with four competing methods.
2. Methods
In this section, we describe how to construct temporal interval PPI networks (TIPINs) and identify protein complexes from TIPINs.
2.1. Preliminary
Let a graph G = (V, E) represent a static protein-protein interaction network (SPIN), where V is a set of nodes (proteins) with N = |V|, E is a set of edges (protein-protein interactions), and e(i, j) denotes the edge between nodes i and j, where i, j = 1, 2, ..., N. Let Ω = (PP, s) denote a set of protein-protein interactions with reliability scores, where PP is a set of interacting protein pairs and s(x, y) denotes the reliability score of the interacting protein pair (x, y) in PP. We use GW = (V, E, w) to denote the graph G weighted by Ω, where the edge weight w(i, j) is defined as follows:
$$w(i,j)=\begin{cases} s(i,j), & e(i,j)\in E \text{ and } (i,j)\in PP,\\ 0, & \text{otherwise.}\end{cases}\tag{1}$$
Furthermore, the matrix R = [r_{ij}]_{N×N} represents the reliability score matrix of GW, where the element r_{ij} of R is defined as follows:
$$r_{ij}=w(i,j),\quad i,j=1,2,\dots,N.\tag{2}$$
If r_{ij} ≥ r, we define the edge e(i, j) as an r-reliable link between nodes i and j, where r is a given reliability threshold [36].
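The weighting of the static network and the r-reliable test can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the dictionary-based graph, the function names, and the choice of weight 0 for an edge whose protein pair has no reliability score are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Edge = FrozenSet[str]  # an undirected interaction {i, j}


def weight_static_network(edges: Iterable[Tuple[str, str]],
                          scores: Dict[Tuple[str, str], float]) -> Dict[Edge, float]:
    """Weight every edge of the static network by its reliability score.

    Edges whose protein pair has no score get weight 0 (assumption).
    """
    # Store scored pairs as unordered pairs so (x, y) and (y, x) coincide.
    scored = {frozenset(pair): s for pair, s in scores.items()}
    return {frozenset(e): scored.get(frozenset(e), 0.0) for e in edges}


def reliable_links(weights: Dict[Edge, float], r: float) -> Dict[Edge, float]:
    """Keep only r-reliable links, i.e. edges whose weight is at least r."""
    return {e: w for e, w in weights.items() if w >= r}
```

For example, with edges A-B, B-C, and C-D and scores 0.9 for (B, A) and 0.4 for (C, B), the C-D edge receives weight 0, and only A-B survives `reliable_links(..., 0.5)`.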
Let GE denote the N×T matrix of gene expression of N proteins across T time points. For a gene i, let ge_{i}(t) represent the expression value of gene i at time t, and let GEP_{i} = (gep_{i}(1), gep_{i}(2), ..., gep_{i}(T)) denote the gene expression pattern of gene i, where gep_{i}(t) is the normalized expression value of gene i at time t, i = 1, 2, ..., N, t = 1, ..., T. In other words, GEP_{i} consists of T normalized gene expression values. The normalized gene expression data can be used to measure the undulating degree above the base level of gene expression during a whole cell cycle.
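The exact normalization behind gep_{i}(t) is not recoverable from this section. One plausible reading of "the undulating degree above the base level" is per-gene min-max scaling, with the gene's minimum expression taken as its base level. The hypothetical helper below assumes exactly that and should not be read as the authors' definition.

```python
from typing import List


def normalize_profile(raw: List[float]) -> List[float]:
    """Min-max normalize one gene's expression series to [0, 1].

    Assumption: the base level is the gene's minimum expression, so each
    value measures how far the gene rises above it relative to its range.
    """
    lo, hi = min(raw), max(raw)
    if hi == lo:  # flat profile: no undulation above the base level
        return [0.0] * len(raw)
    return [(x - lo) / (hi - lo) for x in raw]
```

Under this reading, every normalized value lies in [0, 1], so a single active threshold can be compared across genes with very different absolute expression levels.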
2.2. Temporal PPI Networks
When a protein is involved in a specific biological process, the expression of its protein-coding gene undulates above the base level of the gene's expression during that process. If the normalized expression value of gene i exceeds a specified threshold at a certain time point, we say that the product of gene i is active at this time point. Let a_{i}(t) denote the active state of protein i at time point t: if protein i is active, a_{i}(t) = 1; otherwise, a_{i}(t) = 0, i = 1, ..., N, and t = 1, ..., T. For a given active threshold φ, a_{i}(t) is defined as follows:
$$a_i(t)=\begin{cases}1, & gep_i(t)>\varphi,\\ 0, & \text{otherwise.}\end{cases}\tag{3}$$
A lower active threshold φ preserves more dynamic information about gene expression. The choice of the best active threshold φ is discussed in Section 3.3.1, "The Effect of Active Threshold".
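The active-state rule can be written directly as code. This sketch assumes that normalized profiles are stored as a dictionary from gene name to a list of T values (indexed from 0 in Python); the function name is illustrative.

```python
from typing import Dict, List


def active_states(gep: Dict[str, List[float]], phi: float) -> Dict[str, List[int]]:
    """a_i(t) = 1 if the normalized expression gep_i(t) exceeds phi, else 0."""
    return {gene: [1 if value > phi else 0 for value in series]
            for gene, series in gep.items()}
```

With `gep = {"A": [0.0, 0.3, 0.05], "B": [0.2, 0.1, 0.0]}` and phi = 0.1, gene A is active only at its second time point. Lowering phi to 0.01 also activates A at its third time point, which shows why a lower threshold keeps more dynamic information.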
To model the dynamics of the active proteins described above, we introduce dynamic PPI networks. The following describes how to construct temporal PPI networks (TPNs) by incorporating time-course gene expression data into the static PPI network SPIN. Let G_{t} = (V_{t}, E_{t}) denote the temporal PPI network at time point t, where V_{t} and E_{t} are the set of active proteins and the set of interactions between active proteins at time point t, respectively, t = 1, 2, ..., T. We use R_{t} to represent the reliability score matrix of G_{t}, where the element r_{t}(i, j) of R_{t} is computed as follows:
$$r_t(i,j)=\begin{cases} w(i,j), & a_i(t)=1 \text{ and } a_j(t)=1,\\ 0, & \text{otherwise.}\end{cases}\tag{4}$$
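A TPN keeps a static edge only when both of its endpoints are active at that time point. In this sketch, proteins without expression data are treated as inactive. That choice, the data layout, and the function name are assumptions.

```python
from typing import Dict, FrozenSet, List

Edge = FrozenSet[str]


def temporal_network(weights: Dict[Edge, float],
                     active: Dict[str, List[int]], t: int) -> Dict[Edge, float]:
    """Weighted edges of the TPN at time index t (0-based).

    An edge keeps its static weight only if both endpoints are active at t.
    Proteins missing from `active` are treated as inactive (assumption).
    """
    def is_active(protein: str) -> bool:
        series = active.get(protein)
        return series is not None and series[t] == 1

    return {e: w for e, w in weights.items() if all(is_active(p) for p in e)}
```

Edges absent from the returned dictionary correspond to zero entries in the TPN's reliability score matrix.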
2.3. Temporal Interval PPI Networks
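A protein complex corresponds to a subgraph of a PPI network. To represent the subgraph of a protein complex that appears in several successive temporal PPI networks, we introduce temporal interval PPI networks (TIPINs). A TIPIN is generated by merging several successive temporal PPI networks (TPNs). Given G_{t} = (V_{t}, E_{t}), t = 1, ..., T, let G_{t_s,t_e} and R_{t_s,t_e} denote the temporal interval PPI network and its reliability score matrix from t_s to t_e, respectively, where t_s and t_e are two time points and 1 ≤ t_s ≤ t_e ≤ T. G_{t_s,t_e} and the element r_{t_s,t_e}(i, j) of R_{t_s,t_e} are defined as follows:
$$G_{t_s,t_e}=\Big(\bigcap_{t=t_s}^{t_e}V_t,\ \bigcap_{t=t_s}^{t_e}E_t\Big)\tag{5}$$
$$r_{t_s,t_e}(i,j)=\begin{cases} w(i,j), & e(i,j)\in\bigcap_{t=t_s}^{t_e}E_t,\\ 0, & \text{otherwise.}\end{cases}\tag{6}$$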
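Clearly, if 1 ≤ t_s = t = t_e ≤ T, then G_{t_s,t_e} = G_{t} and R_{t_s,t_e} = R_{t}; that is, the TIPIN is the same as the TPN at time point t. If 1 ≤ t_s < t_e ≤ T, then G_{t_s,t_e} is newly constructed from the TPNs G_{t}, t_s ≤ t ≤ t_e. Here, let l = t_e − t_s + 1 denote the temporal interval length. Figure 1 shows the generation of G_{t_s,t_e} by merging the successive TPNs from G_{t_s} to G_{t_e}. As Figure 1 shows, T TPNs generate T·(T−1)/2 new TIPINs of length l ≥ 2, that is, T·(T+1)/2 TIPINs in total when the T TPNs themselves are counted.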
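Merging TPNs into a TIPIN amounts to intersecting the edge sets of the TPNs inside the interval. The sketch assumes that each TPN is a dictionary from undirected edge to weight. Because an edge keeps its static weight in every TPN, its weight can be read from any of them. The function name is illustrative.

```python
from typing import Dict, FrozenSet, List

Edge = FrozenSet[str]


def interval_network(tpns: List[Dict[Edge, float]], ts: int, te: int) -> Dict[Edge, float]:
    """Merge the TPNs at time points ts..te (1-based, inclusive) into one TIPIN.

    An interaction survives only if it is present in every TPN of the
    interval, i.e. it lasts over the whole temporal interval.
    """
    if not 1 <= ts <= te <= len(tpns):
        raise ValueError("need 1 <= ts <= te <= T")
    window = tpns[ts - 1:te]
    common = set(window[0]).intersection(*window[1:])
    # Every TPN inherits the static weight, so window[0] holds the right value.
    return {e: window[0][e] for e in common}
```

Calling this function for every pair 1 ≤ ts ≤ te ≤ T yields T·(T+1)/2 networks, of which the T networks with ts = te are the TPNs themselves.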
For given time points t_s and t_e with t_s < t_e, if the PPI subgraph of a protein complex appears in all of G_{t_s}, G_{t_s+1}, ..., and G_{t_e}, then it also appears in every G_{t_1,t_2} with t_1 ≤ t_2 and t_1, t_2 ∈ {t_s, t_s+1, ..., t_e}. Because there are more such networks G_{t_1,t_2} than TPNs G_{t}, the chance of exactly identifying the protein complex from some G_{t_1,t_2} is higher than the chance of identifying it from G_{t}, where t_1 ≤ t_2 and t, t_1, t_2 ∈ {t_s, t_s+1, ..., t_e}.
2.4. Identification Method
In this section, we introduce the concepts of the joint colocalization condition, the joint coexpression condition, the GO-based functional similarity between proteins, and the expanding cluster condition, and we then present our identification method.
2.4.1. Joint Colocalization Condition
To accomplish a specific biological process, some proteins physically interact with each other to form a protein complex at the same subcellular localization. Huh et al. [37] investigated the distribution of yeast proteins across subcellular localizations. Without loss of generality, we use yeast protein subcellular localization to illustrate the construction of the joint colocalization condition of a protein complex. Yeast protein subcellular localization is classified into the 22 categories shown in Table 1 [37]. Based on these categories, a 22-dimensional 0-1 vector is defined to represent the subcellular localizations at which a protein appears during a whole cell cycle.
 
Note. No. is the subcellular localization category number. 
Given a protein p, let LV(p) denote the localization vector of protein p, and let lv_{i}(p) denote the ith element of LV(p), i = 1, ..., 22. If protein p is localized at the ith subcellular localization category at any time during a whole cell cycle, lv_{i}(p) = 1; otherwise, lv_{i}(p) = 0, i = 1, ..., 22.
Given a set PS = {p_{1}, ..., p_{k}} of k proteins, let JLV(PS) = (jlv_{1}, jlv_{2}, ..., jlv_{22}) denote the joint localization vector of PS, where jlv_{i} = lv_{i}(p_{1}) ∧ lv_{i}(p_{2}) ∧ ... ∧ lv_{i}(p_{k}), i = 1, ..., 22, and "∧" is the logical AND operation applied to the corresponding elements of the localization vectors of the proteins in PS. If all proteins in PS are localized at the ith subcellular localization category, then jlv_{i} = 1; otherwise, jlv_{i} = 0, i = 1, ..., 22. Clearly, JLV(PS) is also a 22-dimensional 0-1 vector.
Given a set PS of proteins and its JLV(PS), let JC(PS) = jlv_{1} + jlv_{2} + ... + jlv_{22} denote the joint colocalization count of PS, that is, the sum of all elements of JLV(PS). If JC(PS) > 0, there exists at least one subcellular localization category at which all proteins in PS are jointly colocalized during a whole cell cycle. If JC(PS) = 0, the proteins in PS are not jointly colocalized at any subcellular localization category during a whole cell cycle. We define "JC(PS) > 0" as the joint colocalization condition.
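The joint colocalization condition reduces to an element-wise AND followed by a sum. The following is a short sketch, assuming that each protein's localization vector is a Python list of 22 zeros and ones keyed by protein name.

```python
from typing import Dict, Iterable, List


def joint_localization_vector(lv: Dict[str, List[int]], ps: Iterable[str]) -> List[int]:
    """JLV(PS): element-wise logical AND of the 0-1 localization vectors."""
    vectors = [lv[p] for p in ps]
    return [int(all(bits)) for bits in zip(*vectors)]


def joint_colocalization_ok(lv: Dict[str, List[int]], ps: Iterable[str]) -> bool:
    """Joint colocalization condition: JC(PS) = sum of JLV(PS) > 0."""
    return sum(joint_localization_vector(lv, ps)) > 0
```

Adding a protein to a set can only switch elements of JLV from 1 to 0, never from 0 to 1, so JC(PS) never increases as a cluster grows.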
2.4.2. Joint Coexpression Condition
There is a correlation between gene expression level and protein activity [38], and the subunits of a permanent complex are coexpressed [39]. This suggests that analyzing gene coexpression can, to some extent, reveal potential interactions between active proteins.
Given a set GS of k genes and the normalized gene expression value gep_{i}(t) of gene i at time point t, t = 1, ..., T, i = 1, ..., k, we use JGEP(GS) = (jgep(1), jgep(2), ..., jgep(T)) to denote the joint gene expression profile of GS, where jgep(t) = Π_{i=1}^{k} gep_{i}(t) and "Π" denotes the product of the expression pattern values of the k genes. In other words, jgep(t) is the product of the normalized expression values of the k genes in GS at time point t, t = 1, ..., T.
To measure the joint coexpression level of GS, we use JQ(GS) to denote the joint coexpression quantity of GS, which is computed from JGEP(GS). If JQ(GS) ≥ γ, where γ is a given threshold, all genes in GS are considered jointly coexpressed. We define "JQ(GS) ≥ γ" as the joint coexpression condition.
When the temporal interval length is l, we use l+4 successive time points of expression data to analyze the joint coexpression condition. To do so, we set a time window on the normalized expression data. If the current temporal interval is (t_s, t_e), the time window covers the l+4 time points t_s−2, t_s−1, t_s, ..., t_e, t_e+1, and t_e+2. At the boundaries, the window is truncated: if t_s < 3, the time window consists of the time points 1, 2, ..., t_e, t_e+1, and t_e+2; if t_e > T−2, the time window consists of the time points t_s−2, t_s−1, t_s, ..., T−1, and T.
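The sketch below combines the time window with the joint expression profile. The section defines jgep(t) as a product, but it does not give the aggregation behind JQ(GS) in recoverable form. Summing jgep(t) over the window is therefore an assumption made here for illustration, as is truncating the window to the range 1..T at the boundaries. All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List


def time_window(ts: int, te: int, T: int) -> List[int]:
    """Time points ts-2 .. te+2 (1-based), truncated to the range 1..T."""
    return list(range(max(1, ts - 2), min(T, te + 2) + 1))


def joint_coexpression_quantity(gep: Dict[str, List[float]], genes: Iterable[str],
                                window: List[int]) -> float:
    """Sum of jgep(t) over the window, where jgep(t) = prod_i gep_i(t).

    The summation is an assumption; the text only says JQ(GS) measures
    the joint coexpression level of the gene set.
    """
    gene_list = list(genes)
    return sum(math.prod(gep[g][t - 1] for g in gene_list) for t in window)


def joint_coexpression_ok(gep: Dict[str, List[float]], genes: Iterable[str],
                          ts: int, te: int, T: int, gamma: float) -> bool:
    """Joint coexpression condition JQ(GS) >= gamma on the interval's window."""
    return joint_coexpression_quantity(gep, genes, time_window(ts, te, T)) >= gamma
```

Because jgep(t) is a product of values in [0, 1], a single gene that is silent at time t drives jgep(t) to 0. The joint profile therefore rewards time points at which all genes are expressed together.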
2.4.3. The GO-Based Functional Similarity between Proteins
An ontology provides well-defined, structured, and computable semantics of domain knowledge [40]. Because genes and gene products need to be described consistently across species, the Gene Ontology (GO) was launched as a collaborative effort to build complete ontologies in the biological domain [41]. GO terms belong to three ontologies: biological process (BP), molecular function (MF), and cellular component (CC). BP refers to a biological objective to which the gene or gene product contributes, MF is defined as the biochemical activity of a gene product, and CC refers to the place in the cell where a gene product is active [42]. These terms are organized semantically and hierarchically into a directed acyclic graph (DAG).
Semantic similarity is a function that measures closeness in meaning between ontological terms [43]. The GO semantic similarity score can be applied to quantify the functional similarity between proteins. We compute the GO term based functional similarity FS(P_{1}, P_{2}) between two proteins P_{1} and P_{2} by formula (7) [44, 45], where ST_{1} = {go_{1}^{1}, ..., go_{m}^{1}} is the set of terms annotating protein P_{1}, ST_{2} = {go_{1}^{2}, ..., go_{n}^{2}} is the set of terms annotating protein P_{2}, and Sim(go_{i}^{1}, ST_{2}) and Sim(go_{j}^{2}, ST_{1}) are computed by formula (8). In formula (8), go denotes a GO term, ST = {go_{1}, ..., go_{k}} denotes a set of k GO terms, and sim(go, go_{i}) is computed by formula (9). In formula (9), go_{1} and go_{2} are two different GO terms; l denotes the sum of the lengths of the shortest paths from mica to go_{1} and from mica to go_{2}; h and d represent the depth and the information content of mica, respectively; and the three weighting parameters of the formula are set to 0.2, 0.3, and 30, respectively. Here, mica is the maximum informative common ancestor of the two terms go_{1} and go_{2} in the DAG [44].
Correspondingly, we use formulas (7)–(9) to calculate the MF term based similarity FS_{MF}(P_{1}, P_{2}), the CC term based similarity FS_{CC}(P_{1}, P_{2}), and the BP term based similarity FS_{BP}(P_{1}, P_{2}) between proteins P_{1} and P_{2}. These values range from 0.0 to 1.0, and larger values indicate that P_{1} and P_{2} are more similar. If FS_{MF}(P_{1}, P_{2}) ≥ ω, where ω is a given threshold, proteins P_{1} and P_{2} are judged to be similar in terms of MF. Similarly, P_{1} and P_{2} are judged to be similar in terms of CC if FS_{CC}(P_{1}, P_{2}) ≥ σ, and in terms of BP if FS_{BP}(P_{1}, P_{2}) ≥ θ, where σ and θ are given thresholds.
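Formulas (7) to (9) are not reproduced in this section. The sketch below assumes the widely used best-match average: formula (8) is taken as the maximum term similarity between one term and a term set, and formula (7) as the average of these best matches over both annotation sets. The term-level similarity of formula (9) is passed in as a function, because its exact form is not shown here. The function names are hypothetical.

```python
from typing import Callable, Sequence

TermSim = Callable[[str, str], float]  # placeholder for formula (9)


def term_to_set(go: str, terms: Sequence[str], sim: TermSim) -> float:
    """Assumed formula (8): best similarity between one term and a term set."""
    return max((sim(go, other) for other in terms), default=0.0)


def functional_similarity(st1: Sequence[str], st2: Sequence[str], sim: TermSim) -> float:
    """Assumed formula (7): average of best matches in both directions."""
    if not st1 or not st2:
        return 0.0  # an unannotated protein gives no evidence of similarity
    total = sum(term_to_set(g, st2, sim) for g in st1)
    total += sum(term_to_set(g, st1, sim) for g in st2)
    return total / (len(st1) + len(st2))
```

The function is computed separately on MF, CC, and BP annotations. The threshold tests with ω, σ, and θ are then plain comparisons on the returned value.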
2.4.4. Expanding Cluster Condition
Members of a protein complex are generally functionally similar to each other. To mine functionally coherent clusters from a PPI network with a seed-expanding strategy, we devise an expanding cluster condition that judges whether a protein is functionally similar to a protein cluster (PC). Our method uses this condition to iteratively add functionally similar proteins to the protein cluster PC, generating candidate protein complexes with functional homogeneity.
Given a protein cluster PC and a protein u, the CC term based minimal similarity CC(PC, u), the MF term based minimal similarity MF(PC, u), and the BP term based minimal similarity BP(PC, u) between PC and u are defined by formulas (10), (11), and (12), respectively, where r is a given reliability threshold.
To judge whether CC(PC, u), MF(PC, u), and BP(PC, u) reach their specified thresholds σ, ω, and θ, respectively, we define three Boolean variables bcc, bmf, and bbp, which are true when CC(PC, u) ≥ σ, MF(PC, u) ≥ ω, and BP(PC, u) ≥ θ, respectively.
If at least two of the three Boolean variables bcc, bmf, and bbp are true, then EC(PC, u) in formula (16) is true:
$$EC(PC,u)=(bcc\wedge bmf)\vee(bcc\wedge bbp)\vee(bmf\wedge bbp)\tag{16}$$
This means that protein u is similar to protein cluster PC in at least two aspects, so u can be added to PC. We define "EC(PC, u) = true" as the expanding cluster condition.
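The expanding cluster condition can be sketched as follows, under two stated assumptions. First, each minimal similarity is taken as the minimum of the pairwise similarity between u and the members of PC. Second, the role of the reliability threshold r in formulas (10) to (12) is left out. The at-least-two-of-three vote follows the description of EC(PC, u). The function names are illustrative.

```python
from typing import Callable, Collection

PairSim = Callable[[str, str], float]


def minimal_similarity(cluster: Collection[str], u: str, fs: PairSim) -> float:
    """Smallest similarity between u and any member of a non-empty cluster."""
    return min(fs(v, u) for v in cluster)


def expanding_cluster_condition(cluster: Collection[str], u: str,
                                fs_cc: PairSim, fs_mf: PairSim, fs_bp: PairSim,
                                sigma: float, omega: float, theta: float) -> bool:
    """EC(PC, u): true when at least two of bcc, bmf, bbp hold."""
    bcc = minimal_similarity(cluster, u, fs_cc) >= sigma
    bmf = minimal_similarity(cluster, u, fs_mf) >= omega
    bbp = minimal_similarity(cluster, u, fs_bp) >= theta
    return (bcc and bmf) or (bcc and bbp) or (bmf and bbp)
```

Using the minimum rather than the mean makes the test conservative: a single dissimilar member of PC is enough to reject u in that aspect.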
2.4.5. Algorithm
The main idea of our algorithm is to iteratively construct the temporal interval PPI network (TIPIN) G_{t_1,t_2} from time point t_1 to time point t_2 and to identify protein complexes from it, 1 ≤ t_1 ≤ t_2 ≤ T. Our algorithm constructs the TIPINs bottom-up, as shown in Figure 1: first the TIPINs of temporal interval length l = 1 are constructed, then those of length l = 2, and so on. In Figure 1, the direction of the arrows indicates the order in which the TIPINs are constructed. To identify a protein cluster, our algorithm initializes the cluster with a node that is not a member of any identified protein cluster. It then successively checks the joint colocalization condition, the joint coexpression condition, and the expanding cluster condition to decide whether to add adjacent nodes to the cluster, and it stops when no node adjacent to the cluster satisfies all three conditions. By repeating this process, protein clusters (PCs) are identified one by one from the constructed TIPINs. We call our algorithm ICJointLEDPN (Identifying protein complexes with the features of joint colocalization and joint coexpression from Dynamic Protein Networks). Algorithm 1 describes ICJointLEDPN in detail.
Input: Reliability-scored PPI data set, gene expression matrix.
Output: Candidate protein complex set CPCs.
Begin
1. CPCs = ∅;
2. for l = 1 to T do
3.     for t_{2} = T to l step −1 do
4.         t_{1} = t_{2} − l + 1;
5.         construct the TIPIN G_{t_{1},t_{2}};
6.         agglomerate the jointly colocalized, jointly coexpressed, functionally similar
           proteins to identify all protein clusters PCs one by one from the constructed G_{t_{1},t_{2}};
7.         CPCs = CPCs ∪ PCs;
8.     end for
9. end for
10. Remove the protein complexes of size 1 from CPCs;
11. Postprocess CPCs to ensure that no duplicate protein complexes appear in CPCs.
End.
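The loop structure of Algorithm 1 can be sketched in Python as follows. The callback accept(cluster, u) is a hypothetical stand-in for the three checks (joint colocalization, joint coexpression, and the expanding cluster condition). Visiting seeds in sorted order and examining candidates one pass at a time are simplifications of this sketch, not details confirmed by the text.

```python
from typing import Callable, Dict, FrozenSet, List, Set

Edge = FrozenSet[str]
Accept = Callable[[Set[str], str], bool]  # hypothetical stand-in for the three checks


def identify_clusters(tipin: Dict[Edge, float], accept: Accept) -> List[FrozenSet[str]]:
    """Step 6: grow clusters from unassigned seeds in one TIPIN."""
    neighbors: Dict[str, Set[str]] = {}
    for e in tipin:
        if len(e) != 2:
            continue  # ignore self-interactions
        a, b = tuple(e)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    assigned: Set[str] = set()
    clusters: List[FrozenSet[str]] = []
    for seed in sorted(neighbors):
        if seed in assigned:
            continue
        cluster = {seed}
        grown = True
        while grown:  # stop once no adjacent node passes the checks
            grown = False
            frontier = set().union(*(neighbors[v] for v in cluster)) - cluster
            for u in sorted(frontier):
                if accept(cluster, u):
                    cluster.add(u)
                    grown = True
        assigned |= cluster
        clusters.append(frozenset(cluster))
    return clusters


def ic_joint_sketch(tpns: List[Dict[Edge, float]], accept: Accept) -> Set[FrozenSet[str]]:
    """Steps 1-11: bottom-up over interval length l, then t2 from T down to l."""
    T = len(tpns)
    cpcs: Set[FrozenSet[str]] = set()  # a set also drops duplicates (step 11)
    for l in range(1, T + 1):
        for t2 in range(T, l - 1, -1):
            t1 = t2 - l + 1
            window = tpns[t1 - 1:t2]
            common = set(window[0]).intersection(*window[1:])
            tipin = {e: window[0][e] for e in common}  # TIPIN G_{t1,t2}
            cpcs.update(identify_clusters(tipin, accept))
    return {c for c in cpcs if len(c) > 1}  # step 10: drop size-1 clusters
```

Storing candidates as frozensets in a set handles step 11 for free, because two clusters with the same members from different TIPINs collapse into one entry.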
By converting temporal PPI networks (TPNs) into temporal interval PPI networks (TIPINs), the constructed TIPINs preserve only interactions that last over the whole temporal interval. Moreover, there are more TIPINs than TPNs. The constructed TIPINs therefore offer more opportunities to accurately identify more protein complexes.
We now analyze the time complexity of ICJointLEDPN. As Algorithm 1 shows, ICJointLEDPN constructs TIPINs dynamically; for T time points, it constructs T·(T+1)/2 TIPINs. Each constructed TIPIN contains at most N protein nodes, where N is the total number of proteins in SPIN. For each protein node that is not a member of any identified protein cluster, ICJointLEDPN uses this node as an initial protein cluster and expands the cluster by checking the N−1 other protein nodes. The time complexity of identifying protein complexes from each constructed TIPIN is therefore O(N·(N−1)), namely, O(N^{2}). Consequently, the time complexity of ICJointLEDPN is O(N^{2}·T·(T+1)/2) = O((N·T)^{2}).
In the following section, we evaluate our constructed TIPINs and other dynamic PPI networks and compare our proposed identification method with other competing methods.
3. Experiments and Results
In this section, we first introduce the testing data sets and the benchmark data. Subsequently, we describe metrics evaluating the quality of identified protein complexes. Finally, we present the experimental results and comparative analysis.
3.1. Experimental Dataset
To construct temporal interval PPI networks (TIPINs), we selected three yeast PPI data sets. The first, downloaded from the STRING database (version V10) [36], consists of 6418 proteins and 939998 interactions with reliability scores. The second, containing 5811 proteins and 256516 interactions, was downloaded from the BioGrid database (version 3.4.128) [46]. The third, containing 5022 proteins and 22381 interactions, was downloaded from the DIP database (release date 2015/07/01) [47]. Following formula (1), we used the reliability scores of the STRING interactions to score the interactions that STRING shares with BioGrid and DIP.
Furthermore, we selected two yeast gene expression data sets for the comparative experiments. The first data set, GSE3431 [48], was extracted from the file GDS2267_full.soft, which was acquired with accession number GDS2267 from http://www.ncbi.nlm.nih.gov/sites/GDSbrowser. GSE3431 is an expression profile of yeast measured with the Affymetrix Yeast Genome S98 Array over three successive metabolic cycles. It contains 36 raw gene expression measurements gathered at 25-minute intervals. Let T_{1}, T_{2}, ..., and T_{36} denote the 36 successive time points. For each gene in GSE3431, we calculate the average value ave_{i} of its three raw expression values at time points T_{i}, T_{i+12}, and T_{i+24}, and ave_{i} represents the ith gene expression value, i = 1, 2, ..., 12. We used these 12 gene expression values of each gene to analyze the joint coexpression condition and to construct TIPINs for GSE3431. The second data set, GSE4987 [49], consists of gene expression data of wild-type W303a cells sampled at 5-minute intervals over two hours per cell cycle across two cell cycles. GSE4987 contains 50 raw gene expression measurements across the two cell cycles, 25 per cell cycle. Similarly, we calculated 25 gene expression values for each gene in GSE4987 and used them to analyze the joint coexpression condition and to construct TIPINs for GSE4987.
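The per-cycle averaging can be sketched as below. The sketch assumes that each averaged value combines the same phase of every cycle (for GSE3431, the values at T_{i}, T_{i+12}, and T_{i+24}). The helper name is hypothetical.

```python
from typing import List


def average_cycles(raw: List[float], cycles: int) -> List[float]:
    """Average a time course over equal-length successive cycles.

    Value i of the result is the mean of the i-th time point of every cycle,
    e.g. 36 GSE3431 points over 3 cycles give 12 values.
    """
    if len(raw) % cycles != 0:
        raise ValueError("series length must be a multiple of the cycle count")
    period = len(raw) // cycles
    return [sum(raw[i + c * period] for c in range(cycles)) / cycles
            for i in range(period)]
```

Calling `average_cycles(series, 3)` on a 36-point GSE3431 series, or `average_cycles(series, 2)` on a 50-point GSE4987 series, yields the 12 and 25 values per gene described above.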
In addition, we used the yeast protein localization data [37], downloaded from http://yeastgfp.yeastgenome.org, to analyze the joint colocalization condition. The GO term annotations of the yeast proteins were downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3431, and we used them to calculate the GO term based functional similarity between proteins. The known complex set CYC2008, which contains 408 manually curated heteromeric protein complexes, was downloaded from http://wodaklab.org/cyc2008/ [50].
3.2. Evaluation Metrics
A commonly used evaluation approach is to compare identified complexes with known ones. Several statistical matching-based metrics evaluate the quality of identified complexes and the performance of identification methods. Biological relevance-based metrics, which are independent of the known complexes, evaluate the biological significance of identified complexes.
3.2.1. The Statistical Matching-Based Metrics
If an identified complex and a known complex overlap, they share common proteins. The overlapping score OS(ic, kc) between an identified complex ic and a known complex kc is calculated as follows [4]:
$$OS(ic,kc)=\frac{|V_{ic}\cap V_{kc}|^{2}}{|V_{ic}|\times|V_{kc}|}\tag{17}$$
where V_{ic} and V_{kc} are the protein sets of ic and kc, respectively. If OS(ic, kc) ≥ λ, ic is matched with kc, where the threshold λ is usually set to 0.2 [4, 11]. In particular, OS(ic, kc) = 1 indicates that ic completely matches kc.
Let IC be the set of complexes identified by a computational method and KC be the set of known complexes. Let Mic denote the number of identified complexes that match at least one known complex in KC, and let Mkc denote the number of known complexes that match at least one identified complex in IC. Mic and Mkc are defined as follows [4]:
$$Mic=\big|\{ic\in IC \mid \exists\, kc\in KC,\ OS(ic,kc)\ge\lambda\}\big|\tag{18}$$
$$Mkc=\big|\{kc\in KC \mid \exists\, ic\in IC,\ OS(ic,kc)\ge\lambda\}\big|\tag{19}$$
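We evaluate the quality of identified complexes by precision (prec), recall (rec), and F-measure (fm) [51]:
$$prec=\frac{Mic}{|IC|}\tag{20}$$
$$rec=\frac{Mkc}{|KC|}\tag{21}$$
$$fm=\frac{2\times prec\times rec}{prec+rec}\tag{22}$$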
Frac is defined as the fraction of matched complexes, which measures the percentage of known complexes matched with identified complexes [5]. In fact, Frac is equivalent to rec.
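The overlapping score and the matching-based measures can be computed directly from sets of protein names. The function names are illustrative, and λ defaults to the usual value of 0.2.

```python
from typing import List, Set, Tuple


def overlap_score(ic: Set[str], kc: Set[str]) -> float:
    """OS(ic, kc) = |ic ∩ kc|^2 / (|ic| * |kc|); both sets must be non-empty."""
    common = len(ic & kc)
    return common * common / (len(ic) * len(kc))


def matching_metrics(identified: List[Set[str]], known: List[Set[str]],
                     lam: float = 0.2) -> Tuple[float, float, float]:
    """Return (prec, rec, fm); Frac equals rec."""
    mic = sum(1 for ic in identified if any(overlap_score(ic, kc) >= lam for kc in known))
    mkc = sum(1 for kc in known if any(overlap_score(ic, kc) >= lam for ic in identified))
    prec = mic / len(identified) if identified else 0.0
    rec = mkc / len(known) if known else 0.0
    fm = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, fm
```

For example, with identified complexes {A, B, C} and {X, Y} and known complexes {A, B} and {D, E}, only the first pair matches (OS = 4/6). This gives prec = rec = fm = 0.5.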
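The maximum matching ratio (MMR) [5] is based on a maximal one-to-one mapping between identified complexes and known complexes, and it measures how accurately the identified complexes represent the known complexes. MMR is calculated as follows [5]:
$$MMR=\frac{1}{n}\max_{M}\sum_{(i,j)\in M}OS(kc_i,ic_j)\tag{23}$$
where kc_{i} is the ith known complex, i = 1, ..., n, n = |KC|, ic_{j} is the jth identified complex, j = 1, ..., m, m = |IC|, and the maximum is taken over all one-to-one matchings M between known and identified complexes.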
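Let N_{i} be the number of proteins in the ith known complex and T_{ij} be the number of common proteins between the ith known complex and the jth identified complex, i = 1, ..., n, and j = 1, ..., m. Sensitivity (Sn), positive predictive value (PPV), and their geometric mean, accuracy (Acc) [51], are used to assess the accuracy of identification methods. Sn, PPV, and Acc are computed by formulas (24)–(26):
$$Sn=\frac{\sum_{i=1}^{n}\max_{j}T_{ij}}{\sum_{i=1}^{n}N_{i}}\tag{24}$$
$$PPV=\frac{\sum_{j=1}^{m}\max_{i}T_{ij}}{\sum_{j=1}^{m}\sum_{i=1}^{n}T_{ij}}\tag{25}$$
$$Acc=\sqrt{Sn\times PPV}\tag{26}$$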
Finally, the performance of an identification method is evaluated by the comprehensive score FAM, which is calculated by formula (27) [5]:
$$FAM=Frac+Acc+MMR\tag{27}$$
FAM is a statistical matching metric and is mainly used to evaluate identification accuracy.
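MMR, Sn, PPV, Acc, and FAM can be sketched as below. The exhaustive search for the maximum one-to-one matching is practical only for toy inputs. A real evaluation would use a maximum-weight bipartite matching (assignment) solver instead. Known and identified complexes are assumed to be non-empty sets, and the function names are illustrative.

```python
import math
from typing import List, Set, Tuple


def overlap(a: Set[str], b: Set[str]) -> float:
    """OS(a, b) = |a ∩ b|^2 / (|a| * |b|)."""
    common = len(a & b)
    return common * common / (len(a) * len(b))


def mmr(known: List[Set[str]], identified: List[Set[str]]) -> float:
    """Maximum matching ratio via exhaustive search (toy sizes only)."""
    weights = [[overlap(kc, ic) for ic in identified] for kc in known]
    best = 0.0

    def search(i: int, used: frozenset, total: float) -> None:
        nonlocal best
        if i == len(known):
            best = max(best, total)
            return
        search(i + 1, used, total)  # leave known complex i unmatched
        for j, w in enumerate(weights[i]):
            if j not in used and w > 0:
                search(i + 1, used | {j}, total + w)

    search(0, frozenset(), 0.0)
    return best / len(known)


def sn_ppv_acc(known: List[Set[str]], identified: List[Set[str]]) -> Tuple[float, float, float]:
    """Sensitivity, positive predictive value, and their geometric mean."""
    t = [[len(kc & ic) for ic in identified] for kc in known]  # T_ij
    sn = sum(max(row) for row in t) / sum(len(kc) for kc in known)
    col_max = sum(max(col) for col in zip(*t))
    total = sum(sum(row) for row in t)
    ppv = col_max / total if total else 0.0
    return sn, ppv, math.sqrt(sn * ppv)


def fam(frac: float, acc: float, mmr_value: float) -> float:
    """Composite score FAM = Frac + Acc + MMR."""
    return frac + acc + mmr_value
```

With known complexes {A, B} and {C, D} and identified complexes {A, B} and {C, D, E}, the best matching pairs them in order, so MMR = (1 + 4/6)/2. Merging everything into a single identified complex {A, B, C, D} keeps Sn at 1.0 but halves PPV, which shows how PPV penalizes over-merged clusters.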
Let #PM be the number of identified complexes that exactly match known complexes, that is, complexes with OS(ic, kc) = 1. #PM thus measures the degree of exact match between the identified complexes and the known complexes.
In the following, we show how #PM and FAM together can be used to compare the quality of two sets of identified complexes by analyzing their relative performance.
For two sets of identified complexes, S_{1} with #PM_{1} and FAM_{1} and S_{2} with #PM_{2} and FAM_{2}, let G_{1,2} denote the geometric mean of the relative performances of S_{1} and S_{2}, which is calculated as follows:
$$G_{1,2}=\sqrt{\frac{\#PM_{1}}{\#PM_{2}}\times\frac{FAM_{1}}{FAM_{2}}}\tag{28}$$
If G_{1,2} > 1, the quality of S_{1} is superior to that of S_{2} in terms of the product of #PM and FAM; if G_{1,2} < 1, the quality of S_{2} is superior to that of S_{1}. Hence, whether one set of identified complexes is superior to another can be judged by the product of #PM and FAM. In essence, we treat #PM×FAM as a comprehensive score that combines exact match and statistical match, and in our experiments we use it as the major metric for evaluating the quality of identified complexes.
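Assuming that G_{1,2} is the geometric mean of the ratios #PM_{1}/#PM_{2} and FAM_{1}/FAM_{2}, and that all four quantities are positive, the comparison rule follows from the monotonicity of the square root:

```latex
G_{1,2}=\sqrt{\frac{\#PM_1}{\#PM_2}\cdot\frac{FAM_1}{FAM_2}}>1
\iff \frac{\#PM_1\cdot FAM_1}{\#PM_2\cdot FAM_2}>1
\iff \#PM_1\cdot FAM_1>\#PM_2\cdot FAM_2
```

Every pairwise comparison by G_{1,2} therefore agrees with a ranking by #PM×FAM, so the product can be reported directly as a single score.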
3.2.2. The Biological Relevance-Based Metrics
Note that the sets of known complexes are generally incomplete [52]. Even if an identified complex does not match any known complex, it may be an uncharacterized but valid complex [5]. A protein complex tends to be responsible for a specific biological process or molecular function [53]. Hence, to evaluate biological relevance, it is necessary to analyze the functional enrichment of each identified protein complex.
GO term based enrichment analysis for biological process and molecular function can reveal, to some extent, the functional homogeneity of the proteins in a complex [5]. For a PPI network containing N proteins, let K denote the total number of proteins in the network annotated with term X. For a given complex containing n proteins, the p-value of this complex is defined as the probability that a protein set of size n drawn from the network contains at least k term X-annotated proteins, where k is the number of term X-annotated proteins in this complex [54]. The p-value is computed as follows [54]:
$$p\text{-value}=1-\sum_{i=0}^{k-1}\frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}\tag{29}$$
We used the open source software GO::TermFinder [55] to calculate the pvalue of an identified complex.
If p-value < ψ, where ψ is a given threshold, we say that the complex is enriched with term X-annotated proteins at level ψ [54]. If a complex is enriched at level ψ = 0.01 [54], it is considered to have significant biological function and is called a significant complex [5]. The overexpression score of a set of identified protein complexes is defined as the ratio of the number of significant complexes to the total number of complexes in the set. We evaluate the biological relevance of a set of identified protein complexes by calculating its overexpression score.
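The hypergeometric p-value and the overexpression score can be sketched with Python's `math.comb` (Python 3.8 or later). Summing the upper tail directly, rather than subtracting the lower tail from 1, avoids cancellation for very small p-values. Passing one p-value per complex, taken as its smallest p-value over the GO terms tested, is an assumption of this sketch. Multiple-testing correction, which tools such as GO::TermFinder can apply, is not modelled.

```python
from math import comb
from typing import List


def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N proteins, K annotated, n drawn)."""
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / comb(N, n)


def overexpression_score(p_values: List[float], psi: float = 0.01) -> float:
    """Fraction of complexes whose p-value is below psi (significant complexes)."""
    if not p_values:
        return 0.0
    return sum(1 for p in p_values if p < psi) / len(p_values)
```

For example, in a 10-protein network with 5 proteins annotated with term X, a 2-protein complex whose two members are both annotated has p-value C(5,2)/C(10,2) = 10/45.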
3.3. Experimental Results
Firstly, we evaluated the effect of active threshold φ on the quality of protein complexes identified from temporal PPI networks (TPNs). Secondly, we assessed the protein complexes identified from temporal interval PPI networks (TIPINs). Finally, we compared our method ICJointLEDPN with Zhang’s method [33], DPCNADPIN [34], TSOCD [31], and DCA [35].
3.3.1. The Effect of Active Threshold
We first constructed different temporal PPI networks (TPNs) by combining each of three yeast PPI data sets (STRING, BioGrid, and DIP) with each of two yeast gene expression data sets (GSE3431 and GSE4987). We then evaluated the quality of the complexes identified from these TPNs. Figure 2 shows how the #PM×FAM value of the complexes identified from the different TPNs varies with φ.
Figure 2: #PM×FAM of complexes identified from TPNs as φ varies; (a) GSE3431, (b) GSE4987.
Figure 2(a) shows that, for GSE3431, the #PM×FAM value of the complexes identified from the constructed TPNs is largest at φ=0.01 for DIP and at φ=0.1 for STRING and BioGrid. Likewise, Figure 2(b) shows that, for GSE4987, #PM×FAM is largest at φ=0.05 for DIP and at φ=0.2 for STRING and BioGrid. Hence, in the following experiments, these values of φ, listed in Table 2, are used to construct the TPNs for each combination of yeast gene expression data set and yeast PPI data set.

3.3.2. Setting of Parameters for ICJointLEDPN
In our experiments, we empirically tuned parameters σ, ω, and θ so that ICJointLEDPN performs well. Through several experiments, we varied each of σ, ω, and θ from 0.1 to 0.9 in increments of 0.1 and set each parameter to an appropriate value.
Table 3 lists the values of the four parameters of ICJointLEDPN for each combination of yeast gene expression data set and yeast PPI data set.
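The tuning procedure can be sketched as an exhaustive sweep over the 9×9×9 grid of candidate values. The text does not say whether the three parameters were varied jointly or one at a time, so the joint grid is an assumption, as is the hypothetical `evaluate` callback (for instance, one that runs ICJointLEDPN with the given values and returns #PM×FAM).

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Candidate values 0.1, 0.2, ..., 0.9; rounding removes floating-point drift.
GRID = [round(0.1 * k, 1) for k in range(1, 10)]


def sweep(
    evaluate: Callable[[float, float, float], float],
) -> Tuple[Dict[str, float], float]:
    """Return the (sigma, omega, theta) setting with the highest score.

    evaluate: hypothetical callback scoring one parameter setting,
    e.g. by running the identification method and returning #PM x FAM.
    """
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for sigma, omega, theta in product(GRID, repeat=3):
        score = evaluate(sigma, omega, theta)
        if score > best_score:  # ties keep the earliest setting
            best_params = {"sigma": sigma, "omega": omega, "theta": theta}
            best_score = score
    return best_params, best_score
```

A full grid costs 729 runs; a one-at-a-time sweep would need only 27 but can miss interactions between parameters.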

3.3.3. Evaluating Identified Complexes
To evaluate the quality of the complexes identified by ICJointLEDPN, we first constructed TPNs and TIPINs. We then ran ICJointLEDPN to identify complexes from SPIN, TPNs, and TIPINs, and finally compared the quality of the resulting complexes in terms of #PM×FAM, as shown in Table 4.

As Table 4 shows, for the same yeast PPI data set, the #PM×FAM values obtained from both TPNs and TIPINs are clearly larger than that obtained from SPIN. This indicates that identifying protein complexes from dynamic PPI networks improves the quality of the identified complexes. Table 4 also shows that #PM×FAM obtained from TIPINs is larger than that obtained from TPNs, meaning that identifying protein complexes from TIPINs further enhances the quality of the identified complexes. As mentioned in the section “temporal interval PPI networks”, TIPINs constructed from several successive TPNs provide more opportunities to accurately identify more protein complexes.
To further illustrate the effect of the constructed TIPINs, we ran ICJointLEDPN to identify complexes from TIPINs and from other existing dynamic PPI networks. The results are shown in Figure 3.
Figure 3 shows that, whichever yeast PPI data set is integrated with GSE3431 or GSE4987 to construct TIPINs, the #PM×FAM value of the complexes identified by ICJointLEDPN from the constructed TIPINs is clearly larger than that obtained from the other dynamic PPI networks. These results may be partly attributable to the relatively low active threshold φ. In addition, by preserving continuous interactions, the constructed TIPINs offer more opportunities to identify more protein complexes accurately.
Overall, our constructed TIPINs contribute more to protein complex identification than other dynamic PPI networks such as TEPIN, DPIN, NFAPIN, DPPN, and TCPINs.
3.3.4. Comparing Identification Methods
To evaluate the performance of the identification methods, we compared ICJointLEDPN with four competing methods: Zhang’s method [33], DPCNADPIN [34], TSOCD [31], and DCA [35]. As described in the section “Expanding cluster condition”, our method uses only those PPIs whose reliability score is not lower than the reliability threshold r to identify protein complexes. For a fair comparison, we removed the PPIs with reliability score lower than r from the three yeast PPI data sets before running the four competing methods. DPCNADPIN requires no parameters. Zhang’s method uses two parameters, Pre_thresh and Complex_thresh, whose default values are 0.5 and 0.1, respectively. For DCA, we set the parameters to the recommended values α=0.6, β=0.55, and γ=1.4. The settings of the nine parameters of TSOCD are shown in Table 5.
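The pre-filtering step applied before running the competing methods can be sketched as below. Representing each PPI as a `(protein_a, protein_b, reliability)` tuple is an assumption made here for illustration; r is the reliability threshold from the text.

```python
from typing import Iterable, List, Tuple

# Assumed edge representation: (protein_a, protein_b, reliability score).
Edge = Tuple[str, str, float]


def filter_by_reliability(edges: Iterable[Edge], r: float) -> List[Edge]:
    """Keep only interactions whose reliability score is not lower than r."""
    return [(a, b, score) for (a, b, score) in edges if score >= r]
```

Applying the same filter to every input network ensures that all methods see the identical set of interactions that ICJointLEDPN uses.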

By analyzing the known complexes in CYC2008, we found that complexes of size two to six account for more than 84% of all known complexes. To evaluate the ability to identify complexes of size two to six, Table 6 shows the size distribution of the complexes identified exactly by the five methods.
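A size distribution like the one in Table 6 can be tallied as follows. This sketch assumes that "identified exactly" means the identified complex has exactly the same protein set as a known complex; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def exact_match_size_distribution(
    identified: Iterable[Set[str]],
    known: Iterable[Set[str]],
    sizes: Iterable[int] = range(2, 7),
) -> Dict[int, int]:
    """Count known complexes reproduced exactly, bucketed by complex size.

    Exact identification is assumed to mean identical protein sets.
    """
    known_sets = {frozenset(c) for c in known}
    # A known complex counts once even if several identified clusters equal it.
    matched = {frozenset(c) for c in identified} & known_sets
    counts = Counter(len(c) for c in matched)
    return {s: counts.get(s, 0) for s in sizes}
```

Sizes outside the requested range (here, larger than six) are counted internally but not reported.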

Table 6 shows that ICJointLEDPN has a stronger ability to exactly identify complexes of size two to six than the other four competing methods. In particular, DPCNADPIN, TSOCD, and DCA fail to identify any complex of size two.
To evaluate the overall performance of the five competing methods, we report the statistical matching-based metrics of the identified complexes in Table 7.

Table 7 shows that ICJointLEDPN outperforms the other four competing methods in terms of #PM, Frac, MMR, FAM, and #PM×FAM. Regarding fm, ICJointLEDPN obtains the largest value among the five competing methods in all cases but one, and regarding Acc, it ranks in the top two. Overall, ICJointLEDPN not only identifies complexes accurately but also identifies, from TIPINs, more complexes that exactly match known complexes.
We now give two examples of complexes identified from the dynamic PPI networks constructed by integrating GSE3431 into DIP. Figure 4 illustrates how the nuclear exosome complex matches the complexes identified by the five competing methods.
Figure 4: Matching between the nuclear exosome complex and the complexes identified by (a) TSOCD and ICJointLEDPN, (b) Zhang’s method, (c) DPCNADPIN, and (d) DCA.
As Figure 4(a) shows, TSOCD and ICJointLEDPN identify the nuclear exosome complex exactly. Zhang’s method misses the four proteins outside the ellipse in Figure 4(b). DPCNADPIN wrongly includes the yellow-colored YNL189W and misses YHR081W, which lies outside the ellipse in Figure 4(c). DCA wrongly includes three yellow-colored proteins in Figure 4(d).
Similarly, Figure 5 shows how the COMA complex matches the complexes identified by the five competing methods.
Figure 5: Matching between the COMA complex and the complexes identified by (a) ICJointLEDPN, (b) TSOCD and Zhang’s method, (c) DPCNADPIN, and (d) DCA.
Figure 5 shows that ICJointLEDPN fails to identify the COMA complex exactly because it misses YBR211C, outside the ellipse in Figure 5(a). TSOCD and Zhang’s method wrongly include the yellow-colored YBR107C and miss YBR211C, outside the ellipse in Figure 5(b), so these two methods also fail to detect the COMA complex exactly. Likewise, DPCNADPIN fails to find the COMA complex exactly because it wrongly includes the yellow-colored YKL049C and misses YBR211C, outside the ellipse in Figure 5(c). Figure 5(d) shows that DCA fails to detect the COMA complex because it wrongly includes the yellow-colored YGR140W.
To evaluate the functional enrichment of the identified complexes, we compared ICJointLEDPN with the other four competing methods with respect to biological process (BP) enrichment analysis. For the complexes identified by ICJointLEDPN from different TIPINs, the raw BP enrichment data and the significance statistics are provided in the Supplementary Materials. Table 8 shows the proportion of complexes that are significantly enriched by BP-term-annotated proteins, where #IC is the total number of identified complexes and #SC is the number of identified complexes with significant enrichment.

As Table 8 shows, for all five competing methods, almost all identified complexes of size larger than 6 are biologically significant, except for the four cases in italics. Table 8 also shows that, for the significant enrichment of identified complexes of size not larger than 6, our method performs slightly worse than DPCNADPIN, TSOCD, and DCA but better than Zhang’s method.
In summary, our proposed identification method outperforms the other four competing methods overall in terms of the number of identified complexes exactly matching known complexes (#PM), the fraction of known complexes matched by identified complexes (FRAC), the maximum matching ratio (MMR), the comprehensive score FAM, and the product of #PM and FAM. Regarding significant enrichment, all five competing methods perform well overall on complexes of size larger than 6; on complexes of size not larger than 6, our method performs slightly worse than DPCNADPIN, TSOCD, and DCA but better than Zhang’s method.
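The #IC and #SC counts of Table 8, split at size 6, can be tallied as in the sketch below. It assumes each identified complex is given as a `(size, smallest BP-enrichment p-value)` pair and uses ψ = 0.01 as the significance level; both the input format and the function name are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple


def significance_by_size(
    complexes: Iterable[Tuple[int, float]], psi: float = 0.01, cutoff: int = 6
) -> Dict[str, Tuple[int, int]]:
    """Return {group: (#IC, #SC)} with complexes split at `cutoff` proteins.

    Each complex is an assumed (size, smallest BP-enrichment p-value) pair.
    """
    small, large = f"size<={cutoff}", f"size>{cutoff}"
    counts = {small: [0, 0], large: [0, 0]}
    for size, p in complexes:
        group = small if size <= cutoff else large
        counts[group][0] += 1  # #IC: every identified complex in the group
        if p < psi:
            counts[group][1] += 1  # #SC: significantly enriched complexes
    return {g: (ic, sc) for g, (ic, sc) in counts.items()}
```

The proportion reported for each group is then #SC / #IC.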
4. Conclusions
Gene expression data contain temporal information on protein activity. By integrating gene expression data with PPI data to determine the active time points of interacting proteins, we exploited the temporal dynamics of proteins to construct temporal PPI networks (TPNs). To accurately identify more protein complexes, we further converted TPNs into temporal interval PPI networks (TIPINs). The experimental results confirmed that our constructed TIPINs contribute more to protein complex identification than TEPIN (Time-Evolving PIN), DPIN (dynamic protein interaction networks), NFAPIN (noise-filtered active protein interaction networks), DPPN (dynamic probabilistic protein interaction networks), and TCPIN (time-course protein interaction networks).
Based on the constructed TIPINs, we devised a novel method, ICJointLEDPN, which uses multisource biological data to identify protein complexes. First, it employs protein localization data to check the joint colocalization condition, that is, whether a group of proteins is jointly colocalized. Second, it uses gene expression data to check the joint coexpression condition, that is, whether a group of proteins is jointly coexpressed. Third, it exploits three types of similarity to check the expanding cluster condition, that is, whether a group of proteins is functionally homogeneous. By combining these three conditions, our method accurately identifies more protein complexes from TIPINs than the other four competing methods: Zhang’s method, DPCNADPIN, TSOCD, and DCA.
Identifying protein complexes from dynamic PPI networks remains a challenging task in the post-genomic era. In the cell, protein activity and protein-protein interactions are dynamic. Hence, constructing dynamic PPI networks that are close to reality is important for identifying protein complexes. Because gene expression samples are limited and some transient interactions are not captured, it is difficult to construct dynamic PPI networks that fully represent protein interactions in the cell. Although much work has been done to construct effective dynamic PPI networks for identifying protein complexes, efforts toward constructing near-realistic PPI networks should continue. It is also important to design effective methods to identify protein complexes from dynamic PPI networks. To find biologically relevant protein complexes computationally, multisource biological data should be used. As Table 8 shows, some protein complexes of size not larger than 6 identified by our method are not sufficiently significant biologically. This suggests that additional biological data should be integrated into protein complex identification. In future work, we will investigate integrating more biological data into our method, both to identify protein complexes more accurately and to improve the significant enrichment of the identified complexes of size not larger than 6.
Data Availability
Algorithm ICJointLEDPN is implemented in C++. The software suite and the results produced by ICJointLEDPN on the three yeast PPI data sets STRING, BioGrid, and DIP are available at https://dx.doi.org/10.6084/m9.figshare.7824233, or can be requested from zhangjx@gxu.edu.cn.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
We are grateful to Yiran Huang, Li Wang, Chunyan Tang, Xi Qin, and Na Li for their discussion. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61962004 and 61462005 and the Natural Science Foundation of Guangxi under Grant No. 2014 GXN SFAA118396.
Supplementary Materials
For the complexes identified by ICJointLEDPN from different TIPINs, the raw BP enrichment data and the significance statistics are compressed into the following three packages, which are also available at https://dx.doi.org/10.6084/m9.figshare.7824233. Additional File 1: BioGrid_BP.rar, for the protein complexes identified from TIPINs constructed by integrating GSE3431 and GSE4987, respectively, into BioGrid. Additional File 2: DIP_BP.rar, for the protein complexes identified from TIPINs constructed by integrating GSE3431 and GSE4987, respectively, into DIP. Additional File 3: STRING_BP.rar, for the protein complexes identified from TIPINs constructed by integrating GSE3431 and GSE4987, respectively, into STRING.
References
[1] A. Gavin, M. Bösche, R. Krause et al., “Functional organization of the yeast proteome by systematic analysis of protein complexes,” Nature, vol. 415, no. 6868, pp. 141–147, 2002.
[2] P. Uetz, L. Giot, G. Cagney et al., “A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae,” Nature, vol. 403, no. 6770, pp. 623–627, 2000.
[3] K. Lage, E. O. Karlberg, Z. M. Størling et al., “A human phenome-interactome network of protein complexes implicated in genetic disorders,” Nature Biotechnology, vol. 25, no. 3, pp. 309–316, 2007.
[4] G. D. Bader and C. W. Hogue, “An automated method for finding molecular complexes in large protein interaction networks,” BMC Bioinformatics, vol. 4, no. 1, article 2, 2003.
[5] T. Nepusz, H. Yu, and A. Paccanaro, “Detecting overlapping protein complexes in protein-protein interaction networks,” Nature Methods, vol. 9, no. 5, pp. 471–472, 2012.
[6] S. M. Van Dongen, Graph clustering by flow simulation [Ph.D. Thesis], University of Utrecht, Utrecht, The Netherlands, 2000.
[7] H. N. Chua, K. Ning, and W. K. Sung, “Using indirect protein-protein interactions for protein complex prediction,” Journal of Bioinformatics and Computational Biology, vol. 6, no. 3, pp. 435–466, 2008.
[8] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007.
[9] P. Jiang and M. Singh, “SPICi: a fast clustering algorithm for large biological networks,” Bioinformatics, vol. 26, no. 8, pp. 1105–1111, 2010.
[10] M. Altaf-Ul-Amin, Y. Shinbo, K. Mihara, K. Kurokawa, and S. Kanaya, “Development and implementation of an algorithm for detection of protein complexes in large interaction networks,” BMC Bioinformatics, vol. 7, article 207, 2006.
[11] H. C. M. Leung, Q. Xiang, S. M. Yiu, and F. Y. L. Chin, “Predicting protein complexes from PPI data: a core-attachment approach,” Journal of Computational Biology, vol. 16, no. 2, pp. 133–144, 2009.
[12] M. Wu, X. Li, C.-K. Kwoh, and S.-K. Ng, “A core-attachment based method to detect protein complexes in PPI networks,” BMC Bioinformatics, vol. 10, article 169, 2009.
[13] W. Peng, J. X. Wang, B. H. Zhao et al., “Identification of protein complexes using weighted PageRank-nibble algorithm and core-attachment structure,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, pp. 179–192, 2015.
[14] S. Srihari, K. Ning, and H. W. Leong, “MCL-CAw: a refinement of MCL for detecting yeast complexes from weighted PPI networks by incorporating core-attachment structure,” BMC Bioinformatics, vol. 11, no. 1, 2010.
[15] T. Price, F. I. Peña, and Y. Cho, “Survey: Enhancing protein complex prediction in PPI networks with GO similarity weighting,” Interdisciplinary Sciences: Computational Life Sciences, vol. 5, no. 3, pp. 196–210, 2013.
[16] B. Xu, H. Lin, and Z. Yang, “Ontology integration to identify protein complex in protein interaction networks,” Proteome Science, vol. 9, no. Suppl 1, p. S7, 2011.
[17] B. Xu, H. Lin, Y. Chen, Z. Yang, H. Liu, and J. Kleinjung, “Protein complex identification by integrating protein-protein interaction evidence from multiple sources,” PLoS ONE, vol. 8, no. 12, p. e83841, 2013.
[18] J. Feng, R. Jiang, and T. Jiang, “A max-flow based approach to the identification of protein complexes using protein interaction and microarray data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, pp. 621–634, 2011.
[19] X. Tang, J. Wang, and Y. Pan, “Predicting protein complexes via the integration of multiple biological information,” in Proceedings of the 2012 IEEE 6th International Conference on Systems Biology (ISB 2012), pp. 174–179, China, August 2012.
[20] T. M. Przytycka, M. Singh, and D. K. Slonim, “Toward the dynamic interactome: it's about time,” Briefings in Bioinformatics, vol. 11, no. 1, Article ID bbp057, pp. 15–29, 2010.
[21] Q. Xiao, J. Wang, X. Peng, and F. Wu, “Detecting protein complexes from active protein interaction networks constructed with dynamic gene expression profiles,” Proteome Science, vol. 11, no. Suppl 1, p. S20, 2013.
[22] X. Tang, J. Wang, B. Liu, M. Li, G. Chen, and Y. Pan, “A comparison of the functional modules identified from time course and static PPI network data,” BMC Bioinformatics, vol. 12, article 339, 2011.
[23] J. Wang, X. Peng, M. Li, and Y. Pan, “Construction and application of dynamic protein interaction network based on time course gene expression data,” Proteomics, vol. 13, no. 2, pp. 301–312, 2013.
[24] U. de Lichtenberg, “Dynamic complex formation during the yeast cell cycle,” Science, vol. 307, no. 5710, pp. 724–727, 2005.
[25] X. Lei, Y. Ding, and F.-X. Wu, “Detecting protein complexes from DPINs by density based clustering with pigeon-inspired optimization algorithm,” Science China Information Sciences, vol. 59, no. 7, Article ID 070103, 2016.
[26] X. Lei, F. Wang, F.-X. Wu, A. Zhang, and W. Pedrycz, “Protein complex identification through Markov clustering with firefly algorithm on dynamic protein-protein interaction networks,” Information Sciences, vol. 329, pp. 303–316, 2016.
[27] J. Zhao, X. Lei, and F.-X. Wu, “Identifying protein complexes in dynamic protein-protein interaction networks based on Cuckoo Search algorithm,” in Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2016), pp. 1288–1295, 2016.
[28] X. Lei, H. Li, A. Zhang, and F. Wu, “iOPTICS-GSO for identifying protein complexes from dynamic PPI networks,” BMC Medical Genomics, vol. 10, no. S5, 2017.
[29] J. Zhao, X. Lei, and F.-X. Wu, “Predicting protein complexes in weighted dynamic PPI networks based on ICSC,” Complexity, vol. 2017, Article ID 4120506, 11 pages, 2017.
[30] Y. J. Zhang, H. F. Lin, Z. H. Yang et al., “Construction of dynamic probabilistic protein interaction networks for protein complex identification,” BMC Bioinformatics, vol. 17, pp. 186–198, 2016.
[31] L. Ou-Yang, D. Dai, X. Li, M. Wu, X. Zhang, and P. Yang, “Detecting temporal protein complexes from dynamic protein-protein interaction networks,” BMC Bioinformatics, vol. 15, no. 1, p. 335, 2014.
[32] X. Shen, L. Yi, X. Jiang et al., “Mining temporal protein complex based on the dynamic PIN weighted with connected affinity and gene co-expression,” PLoS ONE, vol. 11, no. 4, p. e0153967, 2016.
[33] Y. Zhang, H. Lin, Z. Yang, J. Wang, Y. Liu, and S. Sang, “A method for predicting protein complex in dynamic PPI networks,” BMC Bioinformatics, vol. 17, no. S7, 2016.
[34] X. Shen, L. Yi, X. Jiang et al., “Neighbor affinity based algorithm for discovering temporal protein complex from dynamic PPI network,” Methods, vol. 110, pp. 90–96, 2016.
[35] X. Shen, L. Yi, X. Jiang et al., “Identifying protein complex by integrating characteristic of core-attachment into dynamic PPI network,” PLoS ONE, vol. 12, no. 10, p. e0186134, 2017.
[36] http://string-db.org/newstring_cgi/show_download_page.pl.
[37] W.-K. Huh, J. V. Falvo, L. C. Gerke et al., “Global analysis of protein localization in budding yeast,” Nature, vol. 425, no. 6959, pp. 686–691, 2003.
[38] B. Futcher, G. I. Latter, P. Monardo, C. S. McLaughlin, and J. I. Garrels, “A sampling of the yeast proteome,” Molecular and Cellular Biology, vol. 19, no. 11, pp. 7357–7368, 1999.
[39] R. Jansen, D. Greenbaum, and M. Gerstein, “Relating whole-genome expression data with protein-protein interactions,” Genome Research, vol. 12, no. 1, pp. 37–46, 2002.
[40] J. B. Bard and S. Y. Rhee, “Ontologies in biology: design, applications and future challenges,” Nature Reviews Genetics, vol. 5, no. 3, pp. 213–222, 2004.
[41] M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.
[42] Gene Ontology Consortium, “The Gene Ontology (GO) project in 2006,” Nucleic Acids Research, vol. 34, no. 1, pp. D322–D326, 2006.
[43] P. W. Lord, R. D. Stevens, A. Brass, and C. A. Goble, “Investigating semantic similarity measures across the gene ontology: the relationship between sequence and annotation,” Bioinformatics, vol. 19, no. 10, pp. 1275–1283, 2003.
[44] Y. Zhang and A. Xu, “Improved computation method for semantic similarity between gene ontology terms,” Journal of Computer Applications, vol. 32, no. 5, pp. 1329–1331, 2012.
[45] J. Z. Wang, Z. Du, R. Payattakool, P. S. Yu, and C.-F. Chen, “A new method to measure the semantic similarity of GO terms,” Bioinformatics, vol. 23, no. 10, pp. 1274–1281, 2007.
[46] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, “BioGRID: a general repository for interaction datasets,” Nucleic Acids Research, vol. 34, supplement 1, pp. D535–D539, 2006.
[47] L. Salwinski, C. S. Miller, A. J. Smith, F. K. Pettit, J. U. Bowie, and D. Eisenberg, “The database of interacting proteins: 2004 update,” Nucleic Acids Research, vol. 32, pp. D449–D451, 2004.
[48] B. P. Tu, A. Kudlicki, M. Rowicka et al., “Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes,” Science, vol. 310, no. 5751, pp. 1152–1158, 2005.
[49] T. Pramila, W. Wu, S. Miles et al., “The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle,” Genes & Development, vol. 20, no. 16, pp. 2266–2278, 2006.
[50] S. Pu, J. Wong, B. Turner, E. Cho, and S. J. Wodak, “Up-to-date catalogues of yeast protein complexes,” Nucleic Acids Research, vol. 37, no. 3, pp. 825–831, 2009.
[51] S. Brohée and J. van Helden, “Evaluation of clustering algorithms for protein-protein interaction networks,” BMC Bioinformatics, vol. 7, no. 1, article 488, 2006.
[52] R. Jansen and M. Gerstein, “Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction,” Current Opinion in Microbiology, vol. 7, no. 5, pp. 535–545, 2004.
[53] H. Ge, Z. Liu, G. M. Church, and M. Vidal, “Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae,” Nature Genetics, vol. 29, no. 4, pp. 482–486, 2001.
[54] B. Zhang, B. Park, T. Karpinets, and N. F. Samatova, “From pull-down data to protein interaction networks and complexes with biological relevance,” Bioinformatics, vol. 24, no. 7, pp. 979–986, 2008.
[55] E. I. Boyle, S. Weng, J. Gollub et al., “GO::TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes,” Bioinformatics, vol. 20, no. 18, pp. 3710–3715, 2004.
Copyright
Copyright © 2019 Jinxiong Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.