Research Article  Open Access
Partition of a Binary Matrix into $k$ ($k \geq 3$) Exclusive Row and Column Submatrices Is Difficult
Abstract
A biclustering problem consists of objects and an attribute vector for each object. Biclustering aims at finding a bicluster: a subset of objects that exhibit similar behavior across a subset of attributes, or vice versa. Biclustering in matrices with binary entries (“0”/“1”) can be simplified into the problem of finding submatrices whose entries are all “1.” In this paper, we consider a variant of the biclustering problem: the submatrix partition of binary matrices problem. The input of the problem consists of a matrix with binary entries (“0”/“1”) and a constant positive integer $k$. The problem is to find exactly $k$ submatrices whose entries are all “1” such that these submatrices are pairwise row and column exclusive and each row (column) of the matrix occurs in exactly one of the submatrices. We discuss the complexity of this problem and show that it is NP-hard for any $k \geq 3$ by reduction from a biclustering problem in bipartite graphs.
1. Introduction
The problems considered in this paper are biclustering problems. Biclustering is an important optimization problem with applications in many fields, including bioinformatics (especially gene expression data analysis), web community identification, and network information security analysis [1–3]. Biclustering is also known as block clustering, co-clustering, or two-way clustering. The earliest biclustering algorithm in the literature is the so-called direct clustering by Hartigan in the 1970s [4, 5]. Since then, many approaches to biclustering have been proposed, such as the direct clustering algorithm [4], the node-deletion algorithm [6], the FLOC algorithm [7], biclustering via spectral bipartite graph partitioning [8], biclustering via Gibbs sampling [9], and the algorithm for finding an order-preserving submatrix [10]. For more on biclustering, see [3, 11, 12].
The basic model for biclustering is as follows. Let a dataset of objects and attributes be given as a matrix in which the entry in row $i$ and column $j$ is the value of the $j$th attribute of the $i$th object. The simplest aim of biclustering is to find a subset of rows (objects) that exhibit similar behavior across a subset of columns (attributes), or vice versa. The combination of the subset of objects and the subset of attributes is called a bicluster. A bicluster forms a contiguous rectangle after an appropriate reordering of rows and columns; that is, a bicluster is a submatrix of the data matrix.
In some applications, the main goal of biclustering is to simultaneously find many submatrices (biclusters) in a matrix. Madeira and Oliveira discussed this issue and summarized eight biclustering patterns [11]. Five of these patterns are presented in Figure 1: (1) exclusive row and column biclusters (Figure 1(a)), with each row (column) occurring in exactly one bicluster; (2) exclusive row biclusters (Figure 1(b)), with each row occurring in exactly one bicluster and each column occurring in at least one bicluster; (3) exclusive column biclusters (Figure 1(c)), with each column occurring in exactly one bicluster and each row occurring in at least one bicluster; (4) checkerboard structure (Figure 1(d)), with each entry of the matrix occurring in exactly one bicluster; and (5) arbitrarily positioned overlapping biclusters (Figure 1(e)), with no limiting condition of rows (columns) overlapping or entries overlapping.
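Figure 1: Biclustering patterns: (a) exclusive row and column biclusters; (b) exclusive row biclusters; (c) exclusive column biclusters; (d) checkerboard structure; (e) arbitrarily positioned overlapping biclusters.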
In many applications, a biclustering problem consists of a matrix whose entries are “1” or “0,” which is called a binary matrix. The goal of biclustering in binary matrices is to find submatrices whose entries are all “1.” For example, when applying biclustering to text mining, a dataset of documents and words is arranged in a binary matrix, where rows correspond to documents and columns correspond to words. If the entry in row $i$ and column $j$ of the matrix is “1,” then word $j$ is present in document $i$; if the entry is “0,” then the word is not present. The question is whether we can find $k$ submatrices whose entries are all “1” such that these submatrices are pairwise row and column exclusive, and each row (column) occurs in exactly one submatrix. Clearly, if the answer is “yes,” then the documents can be partitioned into $k$ groups, and documents in the same group have a good chance of belonging to the same domain.
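As a rough illustration of the text mining setup, the following minimal sketch builds a document-by-word binary matrix from a few toy documents. The function name `build_binary_matrix` and the sample documents are hypothetical and chosen only for illustration; whitespace tokenization is also an assumption.

```python
def build_binary_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] == 1 iff word j occurs in document i.

    Hypothetical helper: rows are documents, columns are words, as in the text.
    """
    # Columns are the distinct words, in sorted order for reproducibility.
    vocabulary = sorted({word for doc in documents for word in doc.split()})
    column_of = {word: j for j, word in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in documents]
    for i, doc in enumerate(documents):
        for word in doc.split():
            matrix[i][column_of[word]] = 1
    return vocabulary, matrix


# Toy data (an assumption, not taken from the paper).
docs = ["gene protein", "protein gene", "graph vertex"]
vocab, a = build_binary_matrix(docs)
```

In this toy instance, the first two documents share the same words and the third uses a disjoint vocabulary, so the rows and columns split into two all-ones blocks.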
The text mining problem described above can be abstracted as the submatrix partition of binary matrices problem (SPBM). Given a binary matrix and a constant positive integer $k$, SPBM is to find $k$ submatrices whose entries are all “1” such that these submatrices are pairwise row and column exclusive and each row (column) of the matrix occurs in exactly one of these submatrices. The bicluster pattern of SPBM is pattern (a) in Figure 1. To the best of our knowledge, the hardness of SPBM remains an open problem for each $k \geq 3$.
We will show that SPBM is NP-complete by reduction from the partition of a bipartite graph into bicliques problem (PBB), a variant of biclustering problems in bipartite graphs; that is, an instance of PBB is a bipartite graph. A bipartite graph is a graph whose vertex set can be partitioned into two disjoint sets such that no two vertices within the same set are adjacent. For a biclustering problem in bipartite graphs, the goal is to find bicliques according to some scoring criterion. A biclique, also called a complete bipartite graph, is a bipartite graph in which every vertex of one set is adjacent to every vertex of the other set.
In recent years, much study has focused on the algorithms and complexity of biclustering problems in bipartite graphs. Peeters, Dawande et al., and Amit proved that the maximum edge biclique problem [13], the maximum edge weight biclique problem [14], the bicluster graph editing problem [15], the exact cardinality biclique problem [16], and the minimum edge deletion biclique problem [16], among others, are NP-complete.
When Heydari et al. studied the biclustering of an attack graph in information security, they first proposed the partition of a bipartite graph into bicliques problem (PBB) and showed that PBB is NP-complete [17]. Furthermore, Bein et al. discussed the $k$PBB problem, where $k$ is a constant positive integer. Here, $k$PBB is a parameterized version of PBB; it aims at partitioning the vertex set of a bipartite graph into $k$ subsets such that each subset induces a biclique. $k$PBB thus defines a family of problems, one for each value of $k$. Bein et al. first proposed the $k$PBB problem and indicated that the question of whether $k$PBB is NP-complete for $k \geq 3$ remains open [18].
The contribution of this paper is a study of the complexity of several biclustering problems. The main result shows that 3PBB, $k$PBB ($k > 3$), and SPBM ($k \geq 3$) are all NP-complete.
The paper is organized as follows. In Section 2, we introduce the PBB and SPBM problems. In Section 3, we first show that 3PBB is NP-complete by reduction from a variant of the monotone one-in-three 3SAT problem (MO3), which is a well-known NP-complete problem [19, 20]; then we show that $k$PBB ($k > 3$) is NP-complete by reduction from 3PBB. In Section 4, we prove that SPBM ($k \geq 3$) is NP-complete by reduction from $k$PBB. Finally, in Section 6, we present our conclusions.
2. Preliminaries
In this paper, we study two problems: the SPBM problem and the PBB problem. Next, we present the formal descriptions of SPBM and PBB.
(1) The submatrix partition of binary matrices problem (SPBM).
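The input to the SPBM problem is a binary matrix. Let $A$ be a binary matrix. Denote the set of row vectors and the set of column vectors of $A$ by $\mathcal{R}$ and $\mathcal{C}$, respectively. Suppose $\mathcal{R}' \subseteq \mathcal{R}$ and $\mathcal{C}' \subseteq \mathcal{C}$; then the common entries of the row vectors in $\mathcal{R}'$ and the column vectors in $\mathcal{C}'$ form a matrix that is called the submatrix of $A$ induced by $\mathcal{R}'$ and $\mathcal{C}'$, which is denoted by $A[\mathcal{R}', \mathcal{C}']$. Clearly, $A = A[\mathcal{R}, \mathcal{C}]$. Let $A_1 = A[\mathcal{R}_1, \mathcal{C}_1]$ and $A_2 = A[\mathcal{R}_2, \mathcal{C}_2]$ be submatrices of $A$. If $\mathcal{R}_1 \cap \mathcal{R}_2 = \emptyset$, then $A_1$ and $A_2$ are row exclusive; if $\mathcal{C}_1 \cap \mathcal{C}_2 = \emptyset$, then $A_1$ and $A_2$ are column exclusive. SPBM is to find exactly $k$ pairwise row and column exclusive submatrices with entries of “1” in a binary matrix. The SPBM problem can be stated formally as follows. Instance: a binary matrix $A$ and a constant positive integer $k$. Question: are there $k$ submatrices $A_1, \dots, A_k$ of $A$ with entries “1,” where $A_i = A[\mathcal{R}_i, \mathcal{C}_i]$, such that the submatrices are pairwise row and column exclusive, $\bigcup_{i=1}^{k} \mathcal{R}_i = \mathcal{R}$, and $\bigcup_{i=1}^{k} \mathcal{C}_i = \mathcal{C}$?
Such submatrices $A_1, \dots, A_k$ are called a submatrix partition of $A$.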
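The SPBM question can be made concrete with a small checker that verifies a proposed set of submatrices. This is a minimal sketch, not the authors' procedure: the function name `is_submatrix_partition`, the representation of a submatrix as a pair of row and column index sets, and the requirement that each submatrix be nonempty are all assumptions made for illustration.

```python
def is_submatrix_partition(matrix, blocks, k):
    """Check whether `blocks` is a valid answer to the SPBM question.

    matrix: list of equal-length lists with 0/1 entries.
    blocks: list of (rows, cols) pairs of index collections (hypothetical format).
    Assumption: every submatrix must have at least one row and one column.
    """
    all_rows = set(range(len(matrix)))
    all_cols = set(range(len(matrix[0]))) if matrix else set()
    if len(blocks) != k:
        return False
    seen_rows, seen_cols = set(), set()
    for rows, cols in blocks:
        rows, cols = set(rows), set(cols)
        if not rows or not cols:
            return False
        if not rows <= all_rows or not cols <= all_cols:
            return False
        # Pairwise row and column exclusive: no index may be reused.
        if rows & seen_rows or cols & seen_cols:
            return False
        # Every entry of the induced submatrix must be 1.
        if any(matrix[i][j] != 1 for i in rows for j in cols):
            return False
        seen_rows |= rows
        seen_cols |= cols
    # Each row and each column must occur in exactly one submatrix.
    return seen_rows == all_rows and seen_cols == all_cols


# Toy instance (an assumption, not from the paper).
m = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
good = [({0, 1}, {0, 1}), ({2}, {2}), ({3}, {3})]
bad = [({0, 1}, {0, 1}), ({2, 3}, {2, 3})]
```

Here `good` is accepted for k = 3, while `bad` is rejected for k = 2 because the block on rows and columns {2, 3} contains zero entries.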
(2) The partition of a bipartite graph into bicliques problem (PBB).
An instance of PBB is a bipartite graph. All bipartite graphs in this paper are simple; that is, they contain no parallel edges or self-loops. Let $G = (X, Y, E)$ be a bipartite graph. For convenience, vertices in $X$ are called left-vertices, and vertices in $Y$ are called right-vertices of $G$. In other words, $X$ and $Y$ are the left-vertex set and right-vertex set of $G$, respectively. We denote by $E(G)$ and $V(G)$ its set of edges and its set of vertices, respectively. For a vertex $v$, we denote the set of neighbors of $v$ by $N(v)$. A biclique in $G$ corresponds to a subset of $V(G)$, say, $X' \cup Y'$, such that $X' \subseteq X$, $Y' \subseteq Y$, and for each $x \in X'$ and $y \in Y'$, the edge $(x, y) \in E(G)$.
We say that there exists a $k$-biclique partition for a bipartite graph $G$ if $V(G)$ can be partitioned into exactly $k$ disjoint sets $V_1, \dots, V_k$ such that, for $1 \leq i \leq k$, the subgraph induced by $V_i$ is a biclique. The $k$PBB problem is the problem of determining whether there is a $k$-biclique partition for a bipartite graph $G$, where $k$ is a constant positive integer. The $k$PBB problem can be stated formally as follows. Instance: a finite bipartite graph $G$ and a constant positive integer $k$. Question: does there exist a $k$-biclique partition for $G$?
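The definition of a biclique partition translates directly into a polynomial-time check. The sketch below is illustrative only: the names `is_biclique` and `is_k_biclique_partition`, the edge format (pairs of a left-vertex and a right-vertex), and the assumption that every biclique contains at least one left-vertex and one right-vertex are choices made here, not statements from the paper.

```python
def is_biclique(left_part, right_part, edges):
    """True if every left vertex is adjacent to every right vertex.

    Assumption: a biclique needs at least one vertex on each side.
    """
    if not left_part or not right_part:
        return False
    return all((x, y) in edges for x in left_part for y in right_part)


def is_k_biclique_partition(left, right, edges, groups, k):
    """Check that `groups` splits all vertices into exactly k bicliques.

    left, right: disjoint collections of vertex labels.
    edges: collection of (left_vertex, right_vertex) pairs.
    groups: list of vertex collections (hypothetical format).
    """
    left, right, edges = set(left), set(right), set(edges)
    if len(groups) != k:
        return False
    covered = set()
    for group in groups:
        group = set(group)
        # Groups must be disjoint and contain only known vertices.
        if group & covered or not group <= (left | right):
            return False
        covered |= group
        if not is_biclique(group & left, group & right, edges):
            return False
    return covered == (left | right)


# Toy graph (an assumption): two disjoint edges.
L = {"x1", "x2"}
R = {"y1", "y2"}
E = {("x1", "y1"), ("x2", "y2")}
```

For this toy graph, the grouping `[{"x1", "y1"}, {"x2", "y2"}]` is a 2-biclique partition, whereas putting all four vertices into one group fails because the edge between `x1` and `y2` is missing.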
3. The Complexity of $k$PBB
In this section, we first show the NP-completeness of $k$PBB when $k = 3$ (i.e., 3PBB). We then show that $k$PBB is NP-complete for any constant integer $k > 3$ by reduction from 3PBB. Finally, we conclude that $k$PBB is NP-complete for any constant integer $k \geq 3$.
3.1. The NP-Completeness of 3PBB
In order to prove the hardness of 3PBB, we first introduce the monotone one-in-three 3SAT problem (MO3), which was proved to be NP-complete by Schaefer in 1978 [19]. Then, we show that a variant of MO3 is NP-complete. Finally, we prove that 3PBB is NP-complete by reduction from MO3.
Below we define the terms used in describing MO3. Let $U$ be a set of Boolean variables. If $u \in U$, then $u$ and $\bar{u}$ are literals over $U$; $u$ is called a positive variable, and $\bar{u}$ is called a negative variable. A truth assignment for $U$ is a function $t$ that assigns a truth value to each variable in $U$. For $u \in U$, we say that $u$ is “TRUE” under $t$ or “FALSE” under $t$ according to the value $t(u)$.
The MO3 problem, which is a variant of 3SAT, is specified as follows. Instance: a set $U$ of Boolean variables and a collection $C$ of clauses over $U$, where each clause $c \in C$ has $|c| = 3$ and does not contain a negative variable; that is, every literal of $c$ is a variable in $U$. Question: is there a truth assignment for $U$ such that each clause in $C$ has exactly one true literal?
In the MO3 problem, a clause over $U$ contains only positive variables. For an MO3 instance, a clause over $U$ is satisfied by a truth assignment if and only if it has exactly one “TRUE” literal (and thus exactly two “FALSE” literals) under the assignment. A collection $C$ of clauses over $U$ is satisfiable if and only if there exists a truth assignment for $U$ that simultaneously satisfies all the clauses in $C$.
For example, we are given Boolean variable set , and a clause collection , where , and . Let ; then, the values of the variables in , , and are , , and , which means that , , and are satisfied. Therefore, is a feasible solution of this MO3 instance.
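To make the “exactly one true literal per clause” condition concrete, the following minimal sketch searches all truth assignments of a small MO3 instance by brute force. The function name `solve_mo3` and the toy instances are hypothetical; they are not the example from the text. The search is exponential in the number of variables and is meant only to illustrate the satisfaction condition.

```python
from itertools import product


def solve_mo3(variables, clauses):
    """Return a satisfying assignment (dict) for an MO3 instance, or None.

    variables: list of variable names.
    clauses: list of 3-tuples of variable names (positive literals only).
    A clause is satisfied iff exactly one of its variables is True.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(sum(assignment[v] for v in clause) == 1 for clause in clauses):
            return assignment
    return None


# Hypothetical satisfiable instance.
sat_vars = ["u1", "u2", "u3", "u4"]
sat_clauses = [("u1", "u2", "u3"), ("u2", "u3", "u4")]
solution = solve_mo3(sat_vars, sat_clauses)

# Hypothetical unsatisfiable instance: all four triples over four variables.
unsat_clauses = [
    ("u1", "u2", "u3"),
    ("u1", "u2", "u4"),
    ("u1", "u3", "u4"),
    ("u2", "u3", "u4"),
]
no_solution = solve_mo3(sat_vars, unsat_clauses)
```

The second instance has no solution: a single true variable leaves the clause that omits it with no true literal, and any two true variables appear together in some clause.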
For an arbitrary MO3 instance, we can assume that the three literals in a clause are not all the same variable, because such a clause can never have exactly one true literal. Moreover, a clause in which two literals are the same variable can be transformed into six clauses with pairwise different variables. The approach is as follows.
Suppose that is a clause of an MO3 instance. We create four new variables , , , and . Then, we construct six clauses over , , and the four new variables: , , , , , and . Clearly, the clause is satisfied if and only if and . Moreover, a truth assignment for the variables , , , and exists such that each clause in [1~6] is satisfied if and only if and .
Thus, an arbitrary MO3 instance can be transformed into an MO3 instance with pairwise different variables in each clause in polynomial time. Therefore, we have Theorem 1.
Theorem 1. MO3 with pairwise different variables in each clause is NP-complete.
Throughout this paper, we assume without loss of generality that, for an instance of MO3, the three literals of each clause are pairwise different. Next, we discuss the complexity of 3PBB; that is, we prove Theorem 2.
Theorem 2. 3PBB is NP-complete.
The proof of Theorem 2 consists of two steps. First, let a variable set and a clause collection be an instance of MO3; then we build a bipartite graph that is an instance of 3PBB. Second, we show that is satisfied if and only if there exists a 3biclique partition for .
3.1.1. The Construction of a Bipartite Graph from an MO3 Instance
Given an instance of MO3, we build a bipartite graph that is an instance of 3PBB in three steps. In the first step, we construct three components , , and from the clause . In the second step, we merge , , and into a bipartite graph . In the final step, we merge ’s into a bipartite graph .
Step 1. For each clause , we construct three components that are associated with the three literals in . Each of these components is a bipartite graph.
Suppose that . Thus, we construct the components , , and . The three components contain vertices , , and , which correspond to the variables , , and of , respectively. In the following, we will indiscriminately use the notation , , or to represent a vertex or a variable.
The key idea used in this step of construction is that each of the three components contains a bipartite subgraph isomorphic to illustrated in Figure 2. Moreover, for an arbitrary 3biclique partition of (), the structure of ensures that (1), , and are always partitioned into different bicliques,(2), , or only belongs to those bicliques that contain or .
This is our basic way of encoding the idea that can be set to either or ; if belongs to a biclique that contains , we set , and if belongs to a biclique that contains , we set .
contains 13 vertices and 21 edges, as shown in Figure 3(a). Figures 3(b)–3(d) show three 3biclique partitions of . In Figures 3(b) –3(d), the vertices with the same color induce a biclique. In fact, there exist exactly three 3biclique partitions for , as shown in Figures 3(b) –3(d).
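Figure 3: (a) the component with 13 vertices and 21 edges; (b)–(d) its three 3-biclique partitions, in which vertices with the same color induce a biclique.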
Lemma 3. For an arbitrary 3-biclique partition of , , , and are always partitioned into different bicliques. (For readability, the proof is deferred to Appendix A.)
Based on Lemma 3, each vertex in is assigned a value for denoting a 3biclique partition of by the assignment function . According to a 3biclique partition of , the function is defined as
Lemma 4. There exist exactly three 3biclique partitions for . Accordingly, the values of the vertices , , and are (, , . (The proof is in Appendix B.)
is presented in Figure 4(a). contains 12 vertices and 17 edges. Figures 4(b) and 4(c) show two 3biclique partitions of . In Figures 4(b) and 4(c), the vertices with the same color induce a biclique. In fact, there exist exactly two 3biclique partitions for , as shown in Figures 4(b) and 4(c).
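Figure 4: (a) the component with 12 vertices and 17 edges; (b), (c) its two 3-biclique partitions, in which vertices with the same color induce a biclique.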
Lemma 5. For an arbitrary 3biclique partition of , , , and are always partitioned into different bicliques. (The proof is in Appendix C.)
Based on Lemma 5, the same approach that was used for is used to assign values to the vertices of . Again, we suppose that is the assignment function for . The assignment method for is the same as that in Formula (1).
Lemma 6. There exist exactly two 3biclique partitions for . Accordingly, the values of the vertices and are . (The proof is in Appendix D.)
is isomorphic to . To obtain in Figure 5, we only need to rename the vertices , , , , , , , , , , , and of as , , , , , , , , , , , and , respectively. We present Lemmas 7 and 8 on without proof. The proofs are similar to those of Lemmas 5 and 6.
Lemma 7. For an arbitrary 3biclique partition of , , , and are always partitioned into different bicliques.
Again, we assign the vertices of using Formula (1).
Lemma 8. There exist exactly two 3biclique partitions for . Accordingly, the values of the vertices and are .
Step 2. We merge , , and into a bipartite graph () that is associated with the clause .
For the bipartite graphs , , , , , , and () constructed as before, we first merge , , and into before building an instance of 3PBB. Suppose that and .
The left and right vertex sets of are obtained by merging the left and right vertex sets of , , and :
In words, each vertex of belongs to , , or , and vice versa, and vertices with the same vertex label in , , and are merged into one vertex in as follows: the vertices with the same label, including , , , , , and in , , and , are merged into one group of vertices labeled , , , , , and in ; two vertices in and in are merged into one vertex labeled in ; and two vertices in and in are merged into one vertex labeled in . In and , no other vertices exist with the same label except for , , , , , and .
has two portions. Let . The first portion can be obtained by merging , , and :
Clearly, the edges with the same vertex label in , , and are merged into one edge of , respectively, and , , and are bipartite subgraphs of . To ensure that there exists a 3biclique partition for , we require the addition of more edges as the other portion of as follows: the edges of and among the nonpublic vertices are added, as denoted by ; the edges of and among the nonpublic vertices are added, as denoted by ; and the edges of and among the nonpublic vertices are added, as denoted by . For two graphs, if a vertex label occurs exactly one of the two graphs, then the vertex corresponding to this label is called a nonpublic vertex. These three additional edge sets are formally stated as follows:
Hence, the second portion of can be obtained:
For and its bipartite subgraphs , , and , Proposition 9 holds.
Proposition 9. A bipartite subgraph of induced by is isomorphic to , where . (The proof is in Appendix E.)
Figure 6 illustrates the process of building from , , and . The meaning of Figure 6 is as follows.(1)Figure 6(a) shows the public vertices. The white vertex set is a public vertex set of , , and . The gray vertex is a public vertex of and . The blue vertex is a public vertex of and .(2)Figure 6(b) depicts how to obtain and . The white vertices of , , and , the gray vertex of and , and the blue vertex of and are merged together, respectively. Here, , , and cannot be merged because they are pairwise different. As shown in Figure 6(b), the edge set is .(3)Figure 6(c) displays the following additional edge sets: (yellow edge set), (black edge set), and (red edge set). For the sake of clarity, is not illustrated in Figure 6(c). If is added to Figure 6(c), then will be obtained.
Step 3. We merge into that is associated with an instance of MO3.
The steps used to merge are similar to those in merging , , and as above. is obtained by merging ’s :
In words, each vertex of belongs to and vice versa, and vertices with the same vertex label in ’s are merged into one vertex in as follows: the group vertices labeled in are merged into one group in and are still labeled , and if a variable appears times in the clause collection , then in , the vertices labeled in ’s are merged into one vertex . Therefore, each variable corresponds to exactly one vertex in .
has two portions. Let . The first portion can be obtained by merging , ; that is,
Similarly, the edges with the same vertex label in ’s () are merged into one edge of , and ’s are bipartite subgraph of . To ensure that there exists a 3biclique partition for , we require the addition of more edges to be the other portion of : the edges among the nonpublic vertices of and are added as the edge set , where . These additional edge sets are formally stated as follows:
Consequently, the second portion of can be obtained:
This completes the construction of the bipartite graph . obtained by merging ’s has at most vertices and edges. Therefore, can be constructed in polynomial time.
For , , and , Proposition 10 holds.
Proposition 10. A bipartite subgraph of induced by is isomorphic to , and a bipartite subgraph of induced by is isomorphic to , where . (The proof is in Appendix F.)
Next, we show that there does not exist a 2-biclique partition for the constructed bipartite graph; that is, if there exists a $k$-biclique partition for it, then $k \geq 3$.
Lemma 11. If there exists a $k$-biclique partition for the constructed bipartite graph, then $k \geq 3$.
Proof. An arbitrary vertex is adjacent to at most two of , , and in . In the process of building , there is no additional edge whose end vertex is in . Therefore, an arbitrary vertex is adjacent to, at most, two of , , and , such that , , and belong to at least two bicliques. If , , and are partitioned into two bicliques, then suppose that and are partitioned into different bicliques, where , , , . Based on the process of building , , and . Thus, , , , and of belong to at least three bicliques, and the lemma follows.
In the following, we prove that if there exists a 3biclique partition for , then Lemmas 12 and 13 hold.
Lemma 12. If there exists at least one 3biclique partition for , then , , and will always be partitioned into three different bicliques for an arbitrary 3biclique partition of .
Proof. There are only three edges , , and between and in . Therefore, if , and are partitioned into three bicliques, then , , and must be partitioned into three bicliques. Moreover, because an arbitrary vertex is adjacent to at most two vertices of , , , and belong to at least two bicliques in a 3biclique partition of . We next show that , , and do not belong to two bicliques using proof by contradiction.
Suppose that , , and belong to two bicliques. We can assume without loss of generality that is a 3biclique partition of , , , where , , , . Because , , , , we have . Thus, there exists , , , such that , . Because , the vertices in are partitioned into three bicliques in a 3biclique partition of . By Proposition 10, the edge subset of induced by is exactly . We next show that the vertices in also belong to three bicliques. Consider the following three cases: , , and . (1)If , then . As shown in Figure 7(a), if is , then there are no edges between and . Moreover, , , and cannot simultaneously belong to either or . Therefore, the vertices in belong to three bicliques. As shown in Figures 7(b)–7(d), if , we distinguish three cases. For an arbitrary , is not adjacent to two of , , and (the brown vertices), and these two vertices cannot simultaneously belong to or . Therefore, the vertices of belong to three bicliques.(2)If , then . As shown in Figures 8(a)–8(c), we distinguish three cases. For an arbitrary , is not adjacent to two of , , and (the brown vertices), and these two vertices cannot simultaneously belong to or . Therefore, the vertices of belong to three bicliques.(3)If , then because and are isomorphic, the vertices of also belong to three bicliques.
By (1), (2), and (3), either the left or right vertices of are always partitioned into three bicliques in a 3biclique partition of . Thus, , , or induces a biclique in a 3biclique partition of , respectively. The three bicliques are a 3biclique partition of . From Lemmas 3, 5, and 7, , , and must belong to three different bicliques, which contradicts the supposition that , , and belong to two bicliques. The lemma follows.
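Figure 7: (a)–(d) the cases considered in case (1) of the proof of Lemma 12.
Figure 8: (a)–(c) the three cases considered in case (2) of the proof of Lemma 12.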
Lemma 13. Let be a 3biclique partition of . Then, is a 3biclique partition of .
Proof. From Lemma 12, , , and are always partitioned into three different bicliques in a 3biclique partition of . Thus, for in , the vertices of either its or all belong to three bicliques. By Proposition 10, the bipartite subgraph of induced by is . Therefore, the edges between and must belong to in a 3biclique partition of . From the definition of a biclique, the lemma follows.
3.1.2. Completing the NP-Completeness Proof of 3PBB
It is easy to see that 3PBB $\in$ NP because, for a given bipartite graph, a nondeterministic algorithm need only guess a partition of the vertex set into three groups and check in polynomial time whether the bipartite subgraph induced by each group is a biclique.
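The guess-and-check argument can be mimicked deterministically by enumerating every assignment of vertices to three groups and running the polynomial-time check on each. The sketch below is only an illustration under stated assumptions: the function name `find_biclique_partition` is hypothetical, edges are given as (left-vertex, right-vertex) pairs, and each biclique is assumed to contain at least one vertex on each side. The enumeration takes exponential time; only the inner check is polynomial.

```python
from itertools import product


def find_biclique_partition(left, right, edges, k=3):
    """Brute-force search for a k-biclique partition; returns groups or None.

    Each returned group is a pair (left_vertices, right_vertices).
    Assumption: every biclique has at least one vertex on each side.
    """
    left, right = list(left), list(right)
    edge_set = set(edges)
    n_left = len(left)
    # "Guess": try every labeling of the vertices with group ids 0..k-1.
    for labels in product(range(k), repeat=n_left + len(right)):
        groups = []
        for g in range(k):
            xs = [v for v, lab in zip(left, labels[:n_left]) if lab == g]
            ys = [v for v, lab in zip(right, labels[n_left:]) if lab == g]
            groups.append((xs, ys))
        # "Check": each group must induce a biclique (polynomial time).
        if all(
            xs and ys and all((x, y) in edge_set for x in xs for y in ys)
            for xs, ys in groups
        ):
            return groups
    return None


# Toy graphs (assumptions, not from the paper).
matching_edges = [("x1", "y1"), ("x2", "y2"), ("x3", "y3")]
found = find_biclique_partition(["x1", "x2", "x3"], ["y1", "y2", "y3"], matching_edges)

path_edges = [("x1", "y1"), ("x2", "y1"), ("x2", "y2")]
not_found = find_biclique_partition(["x1", "x2"], ["y1", "y2"], path_edges)
```

The three disjoint edges admit a 3-biclique partition (one edge per group). The four-vertex path does not under the stated assumption, since three bicliques with a vertex on each side would need at least three left-vertices.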
Previously, we constructed a bipartite graph from a variable set and a clause collection . All that remains to be shown is that there exists a truth assignment for such that is satisfied if and only if there exists a 3biclique partition for .
Assume that is a truth assignment that satisfies . We first assign each vertex of in three steps and then show that there exists a 3biclique partition for .(1)Let ; then the value of is . The 3biclique partitions of , , and are given from the values of , , and , as presented in Table 1. Based on Lemmas 3, 5, and 7, we set each vertex of , , and to “0,” “1,” or “2” by Formula (1) and Table 1.(2)We assign a value to each vertex of as follows: if a vertex has the same label with a vertex (), then set equal to the value of . As shown in Table 1, a key observation is that vertices with the same label in , , and are assigned an identical value by a 3biclique partitions of , , or and the true assignment of . This ensures that each vertex of cannot be assigned different values.(3)Similarly as step (2), we assign a value to each vertex of as follows: if a vertex has the same label with a vertex (), then set equal to the value of . Clearly, by the truth assignment, even if a variable occurs in more than one clause of , the variable has exactly one value; therefore, even if a variable corresponds to more than one vertex in different ’s, these vertices corresponding to this variable are assigned an identical value, and it is not hard to see that each vertex of has an identical value in different ’s by Formula (1). In addition, except for , , , , , , , , and , there do not exist other vertices with the same label in different ’s. It follows that vertices with the same label in different ’s have an identical value. This ensures that each vertex of cannot be assigned different values.

Next, to prove that there exists a 3biclique partition for , it suffices to show that vertices with an identical value form a biclique of . In other words, we only need to show that if and belong to the left and right vertex sets, respectively, and their values are identical, then . If and belong to the same , and their values are identical, then and certainly belong to a biclique, and must hold. If and belong to different ’s, then the edge must be added in the process of merging ’s into or merging ’s into ; that is, . Therefore, the vertices of with an identical value certainly form a biclique of .
Suppose that is a 3biclique partition of . Based on Lemma 12, a 3biclique partition of always partitions , , and into three different bicliques. By Formula (1), each vertex of is set to “0,” “1,” or “2.” We next show that the vertices that correspond to a clause are assigned .
Based on Lemma 13, is a 3biclique partition of . Therefore, we can directly consider obtaining the assignment of , , and from a 3biclique partition of .
When is , based on Lemma 4, we have (. Because of and of are of the same vertex, and of and of are of the same vertex in , then the assignment of in is the same as that of in , and the assignment of in is the same as that of in . Therefore, the assignments of in and in must satisfy . When is or , based on Lemmas 6 and 8, we have . Therefore, to ensure that holds, we must have hold. It follows that if there is a 3biclique partition for , then holds.
Because each variable corresponds to exactly one vertex in , it is easy to obtain a truth assignment for all the variables: from the vertex values of . We merely set if the assignment of is in and set if the assignment of is in . After this assignment is made, an arbitrary clause of an MO3 instance is set to , which satisfies the clause collection of the MO3 instance.
3.2. The NP-Completeness of $k$PBB ($k > 3$)
To prove the NP-completeness of $k$PBB for any $k > 3$, we provide a reduction from 3PBB as follows.
Theorem 14. $k$PBB ($k > 3$) is NP-complete, where $k$ is a constant positive integer.
Proof. It is easy to see that $k$PBB $\in$ NP because, for a given bipartite graph, a nondeterministic algorithm need only guess a partition of the vertex set into $k$ groups and check in polynomial time whether the bipartite subgraph induced by each group is a biclique.
We provide a reduction from 3PBB. Given an input instance of 3PBB, we form an instance of PBB () as follows: ; ; . That is, we add vertices and () independent edges to for building . Then becomes an instance of