Partition of a Binary Matrix into $k$ ($k \geq 3$) Exclusive Row and Column Submatrices Is Difficult

Liu, Peiqiang; Zhu, Daming; Xiao, Jinjie; Xie, Qingsong; Mao, Yanyan

doi:https://doi.org/10.1155/2014/934630

Mathematical Problems in Engineering


Research Article | Open Access

Volume 2014 | Article ID 934630 | https://doi.org/10.1155/2014/934630


Peiqiang Liu,^{1,2,3} Daming Zhu,^{3} Jinjie Xiao,^{1,2} Qingsong Xie,^{1,2} and Yanyan Mao^{1}

Academic Editor: Anders Eriksson

Received 25 Mar 2014

Revised 26 May 2014

Accepted 27 May 2014

Published 03 Jul 2014

Abstract

A biclustering problem consists of objects and an attribute vector for each object. Biclustering aims at finding a bicluster, that is, a subset of objects that exhibit similar behavior across a subset of attributes, or vice versa. Biclustering in matrices with binary entries (“0”/“1”) can be simplified into the problem of finding submatrices with entries of “1.” In this paper, we consider a variant of the biclustering problem: the $k$-submatrix partition of binary matrices problem. The input of the problem contains an $n \times m$ matrix with entries (“0”/“1”) and a constant positive integer $k$. The $k$-submatrix partition of binary matrices problem is to find exactly $k$ submatrices with entries of “1” such that these $k$ submatrices are pairwise row and column exclusive and each row (column) in the matrix occurs in exactly one of the $k$ submatrices. We discuss the complexity of the $k$-submatrix partition of binary matrices problem and show that the problem is NP-hard for any $k \geq 3$ by reduction from a biclustering problem in bipartite graphs.

1. Introduction

The problems considered in this paper are biclustering problems. Biclustering is an important optimization problem with applications in many fields including bioinformatics (especially in gene expression data analysis), identifying web communities, network information security analysis, and many more [1–3]. Biclustering is also known as block clustering, coclustering, or two-way clustering. The earliest biclustering algorithm that can be found in the literature is the so-called direct clustering by Hartigan in the 1970s [4, 5]. Since then, many approaches to biclustering have been proposed, such as the direct clustering algorithm [4], the node-deletion algorithm [6], the FLOC algorithm [7], the biclustering via spectral bipartite graph partitioning algorithm [8], the biclustering via GIBBS sampling algorithm [9], and the algorithm for finding an order-preserving submatrix [10]. For more on biclustering, see [3, 11, 12].

The basic model for biclustering is as follows. Let a dataset of $n$ objects and $m$ attributes be given as an $n \times m$ matrix $A$, where the value of $a_{ij}$ is the value of the $j$th attribute of the $i$th object; the simplest aim of biclustering is to find a subset of rows (objects) that exhibit similar behavior across a subset of columns (attributes), or vice versa. In this case, the combination of the subset of objects and the subset of attributes is called a bicluster. A bicluster forms a contiguous rectangle after an appropriate reordering of rows and columns; that is, a bicluster is a submatrix of $A$.

In some applications, the main goal of biclustering is to simultaneously find many submatrices (biclusters) in a matrix. Madeira and Oliveira discussed this issue and summarized eight biclustering patterns [11]. Five of these patterns are presented in Figure 1: (1) exclusive row and column biclusters (Figure 1(a)), with each row (column) occurring in exactly one bicluster; (2) exclusive row biclusters (Figure 1(b)), with each row occurring in exactly one bicluster and each column occurring in at least one bicluster; (3) exclusive column biclusters (Figure 1(c)), with each column occurring in exactly one bicluster and each row occurring in at least one bicluster; (4) checkerboard structure (Figure 1(d)), with each entry of the matrix occurring in exactly one bicluster; and (5) arbitrarily positioned overlapping biclusters (Figure 1(e)), with no limiting condition of rows (columns) overlapping or entries overlapping.

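Figure 1: Five biclustering patterns: (a) exclusive row and column biclusters; (b) exclusive row biclusters; (c) exclusive column biclusters; (d) checkerboard structure; (e) arbitrarily positioned overlapping biclusters.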

In many applications, a biclustering problem consists of a matrix that has entries of “1” or “0,” which is also called a binary matrix. The goal of biclustering in binary matrices is to find submatrices with entries of “1.” For example, when applying biclustering to text mining, a dataset of documents and words is arranged in a binary matrix $A$, where rows correspond to documents and columns correspond to words. If the entry $a_{ij}$ of the matrix is “1,” then word $j$ is present in document $i$. If the entry is “0,” then the word is not present. The question is whether we can find $k$ submatrices with entries of “1” such that these $k$ submatrices are pairwise row and column exclusive, and each row (column) occurs in exactly one submatrix. Clearly, if the answer is “yes,” then these documents can be partitioned into $k$ groups, and documents in the same group have a good chance of belonging to the same domain.

The text mining problem described above can be abstracted as the $k$-submatrix partition of binary matrices problem ($k$-SPBM). Given an $n \times m$ binary matrix and a constant positive integer $k$, $k$-SPBM is to find $k$ submatrices with entries “1” such that these $k$ submatrices are pairwise row and column exclusive and each row (column) of the matrix occurs in exactly one of these submatrices. The bicluster pattern of $k$-SPBM belongs to pattern (a) in Figure 1. To the best of our knowledge, the hardness of $k$-SPBM remains an open problem for each $k \geq 3$.

We will show that $k$-SPBM is NP-complete by reduction from the partition of a bipartite graph into $k$ bicliques problem ($k$-PBB), which is a variant of biclustering problems in bipartite graphs; that is, an instance of $k$-PBB is a bipartite graph. A bipartite graph is a graph whose vertex set can be partitioned into two disjoint sets such that no two vertices within the same set are adjacent. For a biclustering problem in bipartite graphs, the goal is to find bicliques according to some scoring criterion. A biclique, which is also called a complete bipartite graph, is a bipartite graph in which every vertex of one set is adjacent to every vertex of the other set.

In recent years, much study has focused on algorithms and complexity of biclustering problems in bipartite graphs. Peeters, Dawande et al., and Amit proved that the maximum edge biclique problem [13], the maximum edge weight biclique problem [14], the bicluster graph editing problem [15], the exact cardinality biclique problem [16], and the minimum edge deletion biclique problem [16], among others, are NP-complete.

When Heydari et al. studied the biclustering of an attack graph problem in information security, they first proposed the partition of a bipartite graph into bicliques problem (PBB). Heydari et al. showed that PBB is NP-complete [17]. Furthermore, Bein et al. discussed the $k$-PBB problem, where $k$ is a constant positive integer. Here, $k$-PBB is a parameterized version of PBB; it aims at partitioning the vertex set of a bipartite graph into $k$ subsets such that each vertex subset induces a biclique. $k$-PBB defines a family of problems, one for each value of $k$. Bein et al. first proposed the $k$-PBB problem and indicated that the question of whether $k$-PBB is NP-complete for $k \geq 3$ remains open [18].

The contribution of this paper concerns the complexity of several biclustering problems. The main result shows that 3-PBB, $k$-PBB ($k > 3$), and $k$-SPBM ($k \geq 3$) are all NP-complete.

The paper is organized as follows. In Section 2, we introduce the $k$-PBB and $k$-SPBM problems. In Section 3, we first show that 3-PBB is NP-complete by reduction from a variant of the monotone one-in-three 3SAT problem (MO3), which is a well-known NP-complete problem [19, 20], and then we show that $k$-PBB ($k > 3$) is NP-complete by reduction from 3-PBB. In Section 4, we prove that $k$-SPBM ($k \geq 3$) is NP-complete by reduction from $k$-PBB. Section 5 describes applications. Finally, in Section 6, we present our conclusions.

2. Preliminaries

In this paper, we study two problems: the $k$-SPBM problem and the $k$-PBB problem. Next, we present the formal descriptions of $k$-SPBM and $k$-PBB.

(1) The $k$-submatrix partition of binary matrices problem ($k$-SPBM).

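The input to the $k$-SPBM problem is a binary matrix. Let $A$ be an $n \times m$ binary matrix. Denote the set of row vectors and the set of column vectors of $A$ by $R$ and $C$, respectively. Suppose $R' \subseteq R$ and $C' \subseteq C$; then the common entries of the row vectors in $R'$ and the column vectors in $C'$ form a matrix that is called the submatrix of $A$ induced by $R'$ and $C'$, which is denoted by $A[R', C']$. Clearly, $A[R, C] = A$. Let $A_1 = A[R_1, C_1]$ and $A_2 = A[R_2, C_2]$ be submatrices of $A$. If $R_1 \cap R_2 = \emptyset$, then $A_1$ and $A_2$ are row exclusive; if $C_1 \cap C_2 = \emptyset$, then $A_1$ and $A_2$ are column exclusive. $k$-SPBM is to find exactly $k$ exclusive row and column submatrices with entries of “1” in a binary matrix. The $k$-SPBM problem can be stated formally as follows.

Instance: an $n \times m$ binary matrix $A$ and a constant positive integer $k$.

Question: are there $k$ submatrices $A_1 = A[R_1, C_1], \ldots, A_k = A[R_k, C_k]$ of $A$ with entries “1” such that the $k$ submatrices are pairwise row and column exclusive, and $\bigcup_{i=1}^{k} R_i = R$, $\bigcup_{i=1}^{k} C_i = C$?

Such submatrices $A_1, \ldots, A_k$ are called a $k$-submatrix partition of $A$.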
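The definition above can be turned into a simple polynomial-time check. The following Python sketch is an illustration only and is not taken from the paper: the function name `is_k_submatrix_partition`, the representation of a submatrix as a pair of row-index and column-index sets, and the requirement that every submatrix has at least one row and one column are all assumptions made for this example.

```python
from typing import Sequence, Set, Tuple

# Hypothetical representation: a submatrix is (row indices, column indices).
Submatrix = Tuple[Set[int], Set[int]]


def is_k_submatrix_partition(
    matrix: Sequence[Sequence[int]],
    parts: Sequence[Submatrix],
    k: int,
) -> bool:
    """Check whether `parts` is a k-submatrix partition of a binary matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    all_rows = set(range(n_rows))
    all_cols = set(range(n_cols))
    if len(parts) != k:
        return False
    seen_rows: Set[int] = set()
    seen_cols: Set[int] = set()
    for rows, cols in parts:
        rows, cols = set(rows), set(cols)
        # Assumption: each submatrix has at least one row and one column.
        if not rows or not cols:
            return False
        # Indices must refer to existing rows and columns.
        if not rows <= all_rows or not cols <= all_cols:
            return False
        # Pairwise row exclusive and column exclusive.
        if rows & seen_rows or cols & seen_cols:
            return False
        # Every entry of the induced submatrix must be 1.
        if any(matrix[i][j] != 1 for i in rows for j in cols):
            return False
        seen_rows |= rows
        seen_cols |= cols
    # Each row and each column occurs in exactly one submatrix.
    return seen_rows == all_rows and seen_cols == all_cols


if __name__ == "__main__":
    example = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 1],
    ]
    candidate = [({0, 1}, {0, 1}), ({2}, {2}), ({3}, {3, 4})]
    print(is_k_submatrix_partition(example, candidate, 3))
```

Verifying a candidate is easy; the hardness discussed in this paper concerns finding one.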

(2) The partition of a bipartite graph into $k$ bicliques problem ($k$-PBB).

An instance of $k$-PBB is a bipartite graph. All bipartite graphs in the paper are simple bipartite graphs; that is, they contain no parallel edges or self-loops. Let $G = (U, V, E)$ be a bipartite graph. For convenience, vertices in $U$ are called left-vertices, and vertices in $V$ are called right-vertices of $G$. In other words, $U$ and $V$ are the left-vertex set and right-vertex set of $G$, respectively. We denote by $E(G)$ and $V(G)$ its set of edges and its set of vertices, respectively. For a vertex $v \in V(G)$, we denote the set of neighbors of vertex $v$ by $N(v)$. A biclique in $G$ corresponds to a subset of $V(G)$, say, $U' \cup V'$, such that $U' \subseteq U$, $V' \subseteq V$, and for each $u \in U'$, $v \in V'$, the edge $(u, v) \in E$.

We say that there exists a $k$-biclique partition for a bipartite graph $G$ if $V(G)$ can be partitioned into exactly $k$ disjoint sets $V_1, V_2, \ldots, V_k$ such that, for $1 \leq i \leq k$, the subgraph induced by $V_i$ is a biclique. The $k$-PBB problem is the problem of determining whether there is a $k$-biclique partition for a bipartite graph $G$, where $k$ is a constant positive integer. The $k$-PBB problem can be stated formally as follows.

Instance: a finite bipartite graph $G$ and a constant positive integer $k$.

Question: does there exist a $k$-biclique partition for $G$?
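As a concrete reading of this definition, the Python sketch below checks whether a proposed grouping of vertices is a $k$-biclique partition. It is a hedged illustration, not the authors' code: the function name `is_k_biclique_partition`, the string vertex labels, and the assumption that every biclique contains at least one left-vertex and one right-vertex are choices made for this example.

```python
from typing import Iterable, Sequence, Set, Tuple

Edge = Tuple[str, str]  # (left-vertex label, right-vertex label)


def is_k_biclique_partition(
    left: Set[str],
    right: Set[str],
    edges: Iterable[Edge],
    groups: Sequence[Set[str]],
    k: int,
) -> bool:
    """Check whether `groups` partitions the vertices into k bicliques."""
    if len(groups) != k:
        return False
    edge_set = set(edges)
    all_vertices = set(left) | set(right)
    seen: Set[str] = set()
    for group in groups:
        group = set(group)
        # Groups must be non-empty, disjoint, and made of known vertices.
        if not group or group & seen or not group <= all_vertices:
            return False
        seen |= group
        group_left = group & left
        group_right = group & right
        # Assumption: a biclique has vertices on both sides.
        if not group_left or not group_right:
            return False
        # Every left-vertex must be adjacent to every right-vertex.
        if any((u, v) not in edge_set for u in group_left for v in group_right):
            return False
    # Every vertex must belong to some group.
    return seen == all_vertices


if __name__ == "__main__":
    left = {"u1", "u2", "u3"}
    right = {"v1", "v2", "v3"}
    edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v3"), ("u3", "v3")]
    groups = [{"u1", "v1", "v2"}, {"u2", "u3", "v3"}]
    print(is_k_biclique_partition(left, right, edges, groups, 2))
```

The check runs in time polynomial in the size of the graph, which is the verification step used when arguing membership in NP.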

3. The Complexity of $k$-PBB

In this section, we first show the NP-completeness of $k$-PBB when $k = 3$ (i.e., 3-PBB). We then show that $k$-PBB is NP-complete for any constant integer $k > 3$ by reduction from 3-PBB. Finally, we conclude that $k$-PBB is NP-complete for any constant integer $k \geq 3$.

3.1. The NP-Completeness of 3-PBB

In order to prove the hardness of 3-PBB, we first introduce the monotone one-in-three 3SAT problem (MO3), which was proved to be NP-complete by Schaefer in 1978 [19]. Then, we show that a variant of MO3 is NP-complete. Finally, we prove that 3-PBB is NP-complete by reduction from MO3.

Below we define the terms we will use in describing MO3. Let be a set of Boolean variables. If , then and are literals over . is called a positive variable, and is called a negative variable. A truth assignment for is a function . For , if , we say that is “TRUE” under ; if , we say that is “FALSE.”

The MO3 problem, which is a variant of 3SAT, is specified as follows.

Instance: set $X$ of Boolean variables, collection $C$ of clauses over $X$, where each clause $c \in C$ has $|c| = 3$ and does not contain a negative variable; that is, every literal of $c$ is a positive variable.

Question: is there a truth assignment for $X$ such that each clause in $C$ has exactly one true literal?

In the MO3 problem, a clause over $X$ contains only positive variables. For an MO3 instance, a clause over $X$ is satisfied by a truth assignment if and only if it has exactly one “TRUE” literal (and thus exactly two “FALSE” literals) under the assignment. A collection $C$ of clauses over $X$ is satisfiable if and only if there exists a truth assignment for $X$ that simultaneously satisfies all the clauses in $C$.

For example, we are given Boolean variable set , and a clause collection , where , and . Let ; then, the values of the variables in , , and are , , and , which means that , , and are satisfied. Therefore, is a feasible solution of this MO3 instance.
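The MO3 satisfaction condition (exactly one true literal per clause) can be sketched in Python as follows. This is an illustration under stated assumptions rather than anything from the paper: the variable names `x1` to `x4`, the example clauses, and the function names are hypothetical, and the exhaustive search is exponential in the number of variables, so it is only meant for tiny instances.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

Clause = Tuple[str, str, str]  # three positive variables


def satisfies_one_in_three(
    clauses: Sequence[Clause], assignment: Dict[str, bool]
) -> bool:
    """True if every clause has exactly one variable set to True."""
    return all(sum(assignment[x] for x in clause) == 1 for clause in clauses)


def solve_mo3_bruteforce(
    variables: Sequence[str], clauses: Sequence[Clause]
) -> Optional[Dict[str, bool]]:
    """Try all 2^n truth assignments; return a satisfying one or None."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies_one_in_three(clauses, assignment):
            return assignment
    return None


if __name__ == "__main__":
    variables = ["x1", "x2", "x3", "x4"]  # hypothetical instance
    clauses = [("x1", "x2", "x3"), ("x2", "x3", "x4"), ("x1", "x2", "x4")]
    print(solve_mo3_bruteforce(variables, clauses))
```

Note that a clause with two or three true variables is rejected, which is what distinguishes one-in-three satisfiability from ordinary 3SAT.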

For an arbitrary MO3 instance, we can assume that the three literals in each clause are not all from the same variable, because such a clause can never have exactly one true literal and is therefore never satisfied. Moreover, a clause in which two literals are from the same variable can be transformed into six clauses with pairwise different variables. The approach is as follows.

Suppose that is a clause of an MO3 instance. We create four new variables , , , and . Then, we construct six clauses over , , and the four new variables: , , , , , and . Clearly, the clause is satisfied if and only if and . Moreover, a truth assignment for the variables , , , and exists such that each clause in [1~6] is satisfied if and only if and .

Thus, an arbitrary MO3 instance can be transformed into an MO3 instance with pairwise different variables in each clause in polynomial time. Therefore, we have Theorem 1.

Theorem 1. MO3 with pairwise different variables in each clause is NP-complete.

Throughout this paper, we assume without loss of generality that, for an instance of MO3, the three literals of each clause are pairwise different. Next, we discuss the complexity of 3-PBB; that is, we prove Theorem 2.

Theorem 2. 3-PBB is NP-complete.

The proof of Theorem 2 consists of two steps. First, let a variable set and a clause collection be an instance of MO3; then we build a bipartite graph that is an instance of 3-PBB. Second, we show that is satisfied if and only if there exists a 3-biclique partition for .

3.1.1. The Construction of a Bipartite Graph from an MO3 Instance

Given an instance of MO3, we build a bipartite graph that is an instance of 3-PBB in three steps. In the first step, we construct three components , , and from the clause . In the second step, we merge , , and into a bipartite graph . In the final step, we merge ’s into a bipartite graph .

Step 1. For each clause , we construct three components that are associated with the three literals in . Each of these components is a bipartite graph.
Suppose that . Thus, we construct the components , , and . The three components contain vertices , , and , which correspond to the variables , , and of , respectively. In the following, we will indiscriminately use the notation , , or to represent a vertex or a variable.
The key idea used in this step of construction is that each of the three components contains a bipartite subgraph isomorphic to illustrated in Figure 2. Moreover, for an arbitrary 3-biclique partition of (), the structure of ensures that (1), , and are always partitioned into different bicliques,(2), , or only belongs to those bicliques that contain or .

This is our basic way of encoding the idea that can be set to either or ; if belongs to a biclique that contains , we set , and if belongs to a biclique that contains , we set .

contains 13 vertices and 21 edges, as shown in Figure 3(a). Figures 3(b)–3(d) show three 3-biclique partitions of . In Figures 3(b) –3(d), the vertices with the same color induce a biclique. In fact, there exist exactly three 3-biclique partitions for , as shown in Figures 3(b) –3(d).

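Figure 3: (a) The component with 13 vertices and 21 edges described above; (b)–(d) its three 3-biclique partitions, in which vertices with the same color induce a biclique.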

Lemma 3. For an arbitrary 3-biclique partition of , , , and are always partitioned into different bicliques. (For the sake of readability, we defer the proof to the Appendix. The complete proof is in Appendix A.)

Based on Lemma 3, each vertex in is assigned a value for denoting a 3-biclique partition of by the assignment function . According to a 3-biclique partition of , the function is defined as

Lemma 4. There exist exactly three 3-biclique partitions for . Accordingly, the values of the vertices , , and are (, , . (The proof is in Appendix B.)

is presented in Figure 4(a). contains 12 vertices and 17 edges. Figures 4(b) and 4(c) show two 3-biclique partitions of . In Figures 4(b) and 4(c), the vertices with the same color induce a biclique. In fact, there exist exactly two 3-biclique partitions for , as shown in Figures 4(b) and 4(c).

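Figure 4: (a) The component with 12 vertices and 17 edges described above; (b), (c) its two 3-biclique partitions, in which vertices with the same color induce a biclique.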

Lemma 5. For an arbitrary 3-biclique partition of , , , and are always partitioned into different bicliques. (The proof is in Appendix C.)

Based on Lemma 5, the same approach that was used for is used to assign values to the vertices of . Again, we suppose that is the assignment function for . The assignment method for is the same as that in Formula (1).

Lemma 6. There exist exactly two 3-biclique partitions for . Accordingly, the values of the vertices and are . (The proof is in Appendix D.)

is isomorphic to . To obtain in Figure 5, we only need to rename the vertices , , , , , , , , , , , and of as , , , , , , , , , , , and , respectively. We present Lemmas 7 and 8 on without proof. The proofs are similar to those of Lemmas 5 and 6.

Lemma 7. For an arbitrary 3-biclique partition of , , , and are always partitioned into different bicliques.

Again, we assign the vertices of using Formula (1).

Lemma 8. There exist exactly two 3-biclique partitions for . Accordingly, the values of the vertices and are .

Step 2. We merge , , and into a bipartite graph () that is associated with the clause .

For the bipartite graphs , , , , , , and () constructed as before, we first merge , , and into before building an instance of 3-PBB. Suppose that and .

The left and right vertex sets of are obtained by merging the left and right vertex sets of , , and :

In words, each vertex of belongs to , , or , and vice versa, and vertices with the same vertex label in , , and are merged into one vertex in as follows: the vertices with the same label, including , , , , , and in , , and , are merged into one group of vertices labeled , , , , , and in ; two vertices in and in are merged into one vertex labeled in ; and two vertices in and in are merged into one vertex labeled in . In and , no other vertices exist with the same label except for , , , , , and .

has two portions. Let . The first portion can be obtained by merging , , and :

Clearly, the edges with the same vertex label in , , and are merged into one edge of , respectively, and , , and are bipartite subgraphs of . To ensure that there exists a 3-biclique partition for , we require the addition of more edges as the other portion of as follows: the edges of and among the nonpublic vertices are added, as denoted by ; the edges of and among the nonpublic vertices are added, as denoted by ; and the edges of and among the nonpublic vertices are added, as denoted by . For two graphs, if a vertex label occurs in exactly one of the two graphs, then the vertex corresponding to this label is called a nonpublic vertex. These three additional edge sets are formally stated as follows:

Hence, the second portion of can be obtained:

For and its bipartite subgraphs , , and , Proposition 9 holds.

Proposition 9. A bipartite subgraph of induced by is isomorphic to , where . (The proof is in Appendix E.)

Figure 6 illustrates the process of building from , , and . The meaning of Figure 6 is as follows.(1)Figure 6(a) shows the public vertices. The white vertex set is a public vertex set of , , and . The gray vertex is a public vertex of and . The blue vertex is a public vertex of and .(2)Figure 6(b) depicts how to obtain and . The white vertices of , , and , the gray vertex of and , and the blue vertex of and are merged together, respectively. Here, , , and cannot be merged because they are pairwise different. As shown in Figure 6(b), the edge set is .(3)Figure 6(c) displays the following additional edge sets: (yellow edge set), (black edge set), and (red edge set). For the sake of clarity, is not illustrated in Figure 6(c). If is added to Figure 6(c), then will be obtained.

Step 3. We merge into that is associated with an instance of MO3.

The steps used to merge are similar to those in merging , , and as above. is obtained by merging ’s :

In words, each vertex of belongs to and vice versa, and vertices with the same vertex label in ’s are merged into one vertex in as follows: the group vertices labeled in are merged into one group in and are still labeled , and if a variable appears times in the clause collection , then in , the vertices labeled in ’s are merged into one vertex . Therefore, each variable corresponds to exactly one vertex in .

has two portions. Let . The first portion can be obtained by merging , ; that is,

Similarly, the edges with the same vertex label in ’s () are merged into one edge of , and ’s are bipartite subgraph of . To ensure that there exists a 3-biclique partition for , we require the addition of more edges to be the other portion of : the edges among the nonpublic vertices of and are added as the edge set , where . These additional edge sets are formally stated as follows:

Consequently, the second portion of can be obtained:

This completes the construction of the bipartite graph . obtained by merging ’s has at most vertices and edges. Therefore, can be constructed in polynomial time.

For , , and , Proposition 10 holds.

Proposition 10. A bipartite subgraph of induced by is isomorphic to , and a bipartite subgraph of induced by is isomorphic to , where . (The proof is in Appendix F.)

Next, we show that there does not exist a 2-biclique partition for ; that is, if there exists a -biclique partition for , then .

Lemma 11. If there exists a -biclique partition for , then .

Proof. An arbitrary vertex is adjacent to at most two of , , and in . In the process of building , there is no additional edge whose end vertex is in . Therefore, an arbitrary vertex is adjacent to, at most, two of , , and , such that , , and belong to at least two bicliques. If , , and are partitioned into two bicliques, then suppose that and are partitioned into different bicliques, where , , , . Based on the process of building , , and . Thus, , , , and of belong to at least three bicliques, and the lemma follows.

In the following, we prove that if there exists a 3-biclique partition for , then Lemmas 12 and 13 hold.

Lemma 12. If there exists at least one 3-biclique partition for , then , , and will always be partitioned into three different bicliques for an arbitrary 3-biclique partition of .

Proof. There are only three edges , , and between and in . Therefore, if , and are partitioned into three bicliques, then , , and must be partitioned into three bicliques. Moreover, because an arbitrary vertex is adjacent to at most two vertices of , , , and belong to at least two bicliques in a 3-biclique partition of . We next show that , , and do not belong to two bicliques using proof by contradiction.
Suppose that , , and belong to two bicliques. We can assume without loss of generality that is a 3-biclique partition of , , , where , , , . Because , , , , we have . Thus, there exists , , , such that , . Because , the vertices in are partitioned into three bicliques in a 3-biclique partition of . By Proposition 10, the edge subset of induced by is exactly . We next show that the vertices in also belong to three bicliques. Consider the following three cases: , , and . (1)If , then . As shown in Figure 7(a), if is , then there are no edges between and . Moreover, , , and cannot simultaneously belong to either or . Therefore, the vertices in belong to three bicliques. As shown in Figures 7(b)–7(d), if , we distinguish three cases. For an arbitrary , is not adjacent to two of , , and (the brown vertices), and these two vertices cannot simultaneously belong to or . Therefore, the vertices of belong to three bicliques.(2)If , then . As shown in Figures 8(a)–8(c), we distinguish three cases. For an arbitrary , is not adjacent to two of , , and (the brown vertices), and these two vertices cannot simultaneously belong to or . Therefore, the vertices of belong to three bicliques.(3)If , then because and are isomorphic, the vertices of also belong to three bicliques.
By (1), (2), and (3), either the left or right vertices of are always partitioned into three bicliques in a 3-biclique partition of . Thus, , , or induces a biclique in a 3-biclique partition of , respectively. The three bicliques are a 3-biclique partition of . From Lemmas 3, 5, and 7, , , and must belong to three different bicliques, which contradicts the supposition that , , and belong to two bicliques. The lemma follows.

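Figure 7: (a)–(d) The cases considered in item (1) of the proof of Lemma 12.

Figure 8: (a)–(c) The cases considered in item (2) of the proof of Lemma 12.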

Lemma 13. Let be a 3-biclique partition of . Then, is a 3-biclique partition of .

Proof. From Lemma 12, , , and are always partitioned into three different bicliques in a 3-biclique partition of . Thus, for in , the vertices of either its or all belong to three bicliques. By Proposition 10, the bipartite subgraph of induced by is . Therefore, the edges between and must belong to in a 3-biclique partition of . From the definition of a biclique, the lemma follows.

3.1.2. Completing the NP-Completeness Proof of 3-PBB

It is easy to see that 3-PBB ∈ NP because, for a given bipartite graph, a nondeterministic algorithm need only guess a partition of the vertex set into three groups and check in polynomial time whether the bipartite subgraph induced by each vertex group is a biclique.

Previously, we constructed a bipartite graph from a variable set and a clause collection . All that remains to be shown is that there exists a truth assignment for such that is satisfied if and only if there exists a 3-biclique partition for .

Assume that is a truth assignment that satisfies . We first assign each vertex of in three steps and then show that there exists a 3-biclique partition for .
(1) Let ; then the value of is . The 3-biclique partitions of , , and are given from the values of , , and , as presented in Table 1. Based on Lemmas 3, 5, and 7, we set each vertex of , , and to “0,” “1,” or “2” by Formula (1) and Table 1.
(2) We assign a value to each vertex of as follows: if a vertex has the same label as a vertex (), then set equal to the value of . As shown in Table 1, a key observation is that vertices with the same label in , , and are assigned an identical value by a 3-biclique partition of , , or and the truth assignment of . This ensures that each vertex of cannot be assigned different values.
(3) As in step (2), we assign a value to each vertex of as follows: if a vertex has the same label as a vertex (), then set equal to the value of . Clearly, by the truth assignment, even if a variable occurs in more than one clause of , the variable has exactly one value; therefore, even if a variable corresponds to more than one vertex in different ’s, these vertices corresponding to this variable are assigned an identical value, and it is not hard to see that each vertex of has an identical value in different ’s by Formula (1). In addition, except for , , , , , , , , and , there do not exist other vertices with the same label in different ’s. It follows that vertices with the same label in different ’s have an identical value. This ensures that each vertex of cannot be assigned different values.

Next, to prove that there exists a 3-biclique partition for , it suffices to show that vertices with an identical value form a biclique of . In other words, we only need to show that if and belong to the left and right vertex sets, respectively, and their values are identical, then . If and belong to the same , and their values are identical, then and certainly belong to a biclique, and must hold. If and belong to different ’s, then the edge must be added in the process of merging ’s into or merging ’s into ; that is, . Therefore, the vertices of with an identical value certainly form a biclique of .

Suppose that is a 3-biclique partition of . Based on Lemma 12, a 3-biclique partition of always partitions , , and into three different bicliques. By Formula (1), each vertex of is set to “0,” “1,” or “2.” We next show that the vertices that correspond to a clause are assigned .

Based on Lemma 13, is a 3-biclique partition of . Therefore, we can directly consider obtaining the assignment of , , and from a 3-biclique partition of .

When is , based on Lemma 4, we have (. Because of and of are of the same vertex, and of and of are of the same vertex in , then the assignment of in is the same as that of in , and the assignment of in is the same as that of in . Therefore, the assignments of in and in must satisfy . When is or , based on Lemmas 6 and 8, we have . Therefore, to ensure that holds, we must have hold. It follows that if there is a 3-biclique partition for , then holds.

Because each variable corresponds to exactly one vertex in , it is easy to obtain a truth assignment for all the variables: from the vertex values of . We merely set if the assignment of is in and set if the assignment of is in . After this assignment is made, an arbitrary clause of an MO3 instance is set to , which satisfies the clause collection of the MO3 instance.

3.2. The NP-Completeness of $k$-PBB ($k > 3$)

To prove the NP-completeness of $k$-PBB for any $k > 3$, we provide a reduction from 3-PBB as follows.

Theorem 14. $k$-PBB ($k > 3$) is NP-complete, where $k$ is a constant positive integer.

Proof. It is easy to see that $k$-PBB ∈ NP because, for a given bipartite graph, a nondeterministic algorithm need only guess a partition of the vertex set into $k$ groups and check in polynomial time whether the bipartite subgraph that is induced by each vertex group is a biclique.
We provide a reduction from 3-PBB. Given an input instance $G = (U, V, E)$ of 3-PBB, we form an instance $G' = (U', V', E')$ of $k$-PBB ($k > 3$) as follows: $U' = U \cup \{u'_1, \ldots, u'_{k-3}\}$; $V' = V \cup \{v'_1, \ldots, v'_{k-3}\}$; $E' = E \cup \{(u'_i, v'_i) : 1 \leq i \leq k-3\}$. That is, we add $2(k-3)$ vertices and ($k-3$) independent edges to $G$ for building $G'$. Then $G'$ becomes an instance of $k$-PBB ($k > 3$). The subgraph formed by these additional vertices and edges consists of $k-3$ disjoint bicliques, and each biclique contains only one edge.
We have that there exists a 3-biclique partition for $G$ if and only if there exists a $k$-biclique partition for $G'$, by the construction of $G'$ from $G$. The theorem follows.
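The padding construction used in this proof can be sketched in a few lines of Python. The sketch is illustrative only: the function name `pad_3pbb_to_kpbb` and the labels `pad_u1`, `pad_v1`, and so on for the added vertices are hypothetical and are assumed not to clash with labels already present in the input graph.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (left-vertex label, right-vertex label)


def pad_3pbb_to_kpbb(
    left: Set[str], right: Set[str], edges: Iterable[Edge], k: int
) -> Tuple[Set[str], Set[str], Set[Edge]]:
    """Add k-3 independent edges (2(k-3) new vertices) to a 3-PBB instance."""
    if k < 3:
        raise ValueError("k must be at least 3")
    new_left, new_right, new_edges = set(left), set(right), set(edges)
    for i in range(1, k - 2):  # i = 1, ..., k-3
        u, v = f"pad_u{i}", f"pad_v{i}"  # hypothetical fresh labels
        existing = new_left | new_right
        if u in existing or v in existing:
            raise ValueError("padding label already used in the graph")
        # Each added edge is a one-edge biclique disjoint from the rest.
        new_left.add(u)
        new_right.add(v)
        new_edges.add((u, v))
    return new_left, new_right, new_edges


if __name__ == "__main__":
    left = {"u1", "u2"}
    right = {"v1", "v2"}
    edges = {("u1", "v1"), ("u2", "v2"), ("u1", "v2")}
    padded = pad_3pbb_to_kpbb(left, right, edges, 5)
    print(sorted(padded[2]))
```

The construction adds a number of vertices and edges linear in $k$, so it clearly runs in polynomial time.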

By Theorems 2 and 14, Corollary 15 holds.

Corollary 15. $k$-PBB is NP-complete for $k \geq 3$, where $k$ is a constant positive integer.

4. The Complexity of $k$-SPBM

Next, we discuss the complexity of $k$-SPBM. We show that $k$-SPBM is NP-complete for any $k \geq 3$.

Theorem 16. $k$-SPBM is NP-complete for an arbitrary $k \geq 3$, where $k$ is a constant positive integer.

Proof. It is easy to see that $k$-SPBM belongs to NP, given a binary matrix $A$, because a nondeterministic algorithm need only guess $k$ submatrices with entries “1” of $A$ and check in polynomial time whether these $k$ submatrices are a $k$-submatrix partition of $A$.
In what follows, we reduce $k$-PBB to $k$-SPBM. Assume that $G = (U, V, E)$ is an instance of $k$-PBB, where $U = \{u_1, \ldots, u_n\}$, $V = \{v_1, \ldots, v_m\}$. Thus, we construct an $n \times m$ binary matrix $A$, and we assign “0” or “1” to each entry $a_{ij}$ of $A$ by the following:
$$a_{ij} = \begin{cases} 1, & (u_i, v_j) \in E, \\ 0, & \text{otherwise.} \end{cases}$$
We next show that there exists a $k$-biclique partition for $G$ if and only if $A$ has a $k$-submatrix partition.
Suppose that is a -biclique partition of . A submatrix of can be obtained from the vertex set as follows. Let , and let and be the left and right vertex sets of , respectively. Then let , . Thus, a submatrix of is selected. Note that, because are a -biclique partition of , , , where , and , . Moreover, for , . Because , ; that is, each entry of is “1.” Thus, are a -submatrix partition of .
Assume that are submatrices of , where (), , , , and each entry of is “1.” Then, for the vertex set obtained from and , where , the bipartite subgraph of induced by is a biclique because each entry of is “1.” Moreover, as are pairwise row and column exclusive and each row (column) of occurs in exactly one of these submatrices, is a 3-biclique partition of .
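The correspondence between biclique partitions and submatrix partitions can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' construction: the function names `biadjacency_matrix` and `is_submatrix_partition`, the list-of-lists matrix, and the toy graph are hypothetical, and the matrix is taken to be the usual biadjacency matrix (rows for left vertices, columns for right vertices, “1” for an edge).

```python
def biadjacency_matrix(left, right, edges):
    """Build the 0/1 matrix with one row per left vertex and one column per right vertex."""
    row = {u: i for i, u in enumerate(left)}
    col = {v: j for j, v in enumerate(right)}
    matrix = [[0] * len(right) for _ in left]
    for u, v in edges:
        matrix[row[u]][col[v]] = 1
    return matrix, row, col


def is_submatrix_partition(matrix, blocks):
    """Check that `blocks` (pairs of row-index and column-index sets) use every
    row and every column exactly once and that every block is non-empty and all ones."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    used_rows = sorted(i for rows, _ in blocks for i in rows)
    used_cols = sorted(j for _, cols in blocks for j in cols)
    if used_rows != list(range(n_rows)) or used_cols != list(range(n_cols)):
        return False
    return all(
        bool(rows) and bool(cols)
        and all(matrix[i][j] == 1 for i in rows for j in cols)
        for rows, cols in blocks
    )


# Toy bipartite graph (made up for illustration) with a 3-biclique partition.
left = ["u1", "u2", "u3", "u4"]
right = ["v1", "v2", "v3", "v4"]
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"),
         ("u3", "v3"), ("u4", "v4"), ("u3", "v1")]
partition = [({"u1", "u2"}, {"v1", "v2"}), ({"u3"}, {"v3"}), ({"u4"}, {"v4"})]

matrix, row, col = biadjacency_matrix(left, right, edges)
# Translate each biclique into the rows and columns of its submatrix.
blocks = [({row[u] for u in lp}, {col[v] for v in rp}) for lp, rp in partition]

print(matrix)
print(is_submatrix_partition(matrix, blocks))
```

Edges that run between different groups, such as the edge between u3 and v1 here, are allowed: only the entries inside each selected submatrix have to be “1.”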

5. Applications

Large binary matrices arise in many applications, for example, market-basket data analysis, text mining, and community detection. In addition, a real matrix can be transformed into a binary matrix in biclustering for convenient analysis [11, 21–24]; the same approach can be used for clustering [25–27]. Because of its prevalence and importance, the biclustering problem in binary matrices has recently been applied in many domains [3, 24, 28], such as the following.
(1) Market-basket analysis: the goal is to find groups of customers who have similar purchasing preferences toward a subset of products. We are given a binary matrix with rows that correspond to customers and columns that correspond to products. If entry (i, j) of the matrix is “1,” then customer i purchased product j; if the entry is “0,” then the customer did not purchase that product. A submatrix with all entries “1,” formed by a subset of rows and a subset of columns, reveals that the corresponding customers have similar purchasing preferences [3].
(2) Gene expression data analysis: this analysis searches for groups of genes that have similar expression levels toward a subset of conditions. We are given a binary matrix with rows that correspond to genes and columns that correspond to conditions. If entry (i, j) of the matrix is “1,” then gene i was switched on under condition j; if the entry is “0,” then the gene was not switched on under the condition. A submatrix with all entries “1,” formed by a subset of rows and a subset of columns, suggests that the genes in the submatrix are highly likely either to perform similar functions or to be involved in the same biological process [11].
(3) There are also many other applications, including community detection and text mining.

The model of k-SPBM can be used to analyze data from different domains and can help extract previously unknown, interesting bicluster patterns.
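As a small illustration of the gene expression setting, the following Python sketch binarizes a real-valued matrix and tests whether a chosen set of rows and columns forms an all-ones submatrix. The thresholding rule, the threshold value, the function names `binarize` and `is_all_ones`, and the numbers are assumptions made for this example; the text above does not prescribe a particular binarization.

```python
def binarize(matrix, threshold):
    """Assumed rule: an entry becomes 1 if it is at least `threshold`, else 0."""
    return [[1 if x >= threshold else 0 for x in row] for row in matrix]


def is_all_ones(binary, rows, cols):
    """Return True if the submatrix on `rows` x `cols` contains only ones."""
    return all(binary[i][j] == 1 for i in rows for j in cols)


# Toy expression levels (made up): rows are genes, columns are conditions.
expression = [
    [2.1, 1.8, 0.2],
    [1.9, 2.4, 0.1],
    [0.3, 0.2, 1.7],
]
binary = binarize(expression, threshold=1.5)

print(binary)
print(is_all_ones(binary, rows=[0, 1], cols=[0, 1]))  # genes 0 and 1 under conditions 0 and 1
print(is_all_ones(binary, rows=[0, 2], cols=[0]))
```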

6. Conclusions and Future Work

We have first proved that 3-PBB is NP-complete by reduction from MO3. Moreover, we have proved that k-PBB (k > 3) is NP-complete by reduction from 3-PBB, thus proving that k-PBB (k ≥ 3) is NP-complete. Finally, we have shown that k-SPBM (k ≥ 3) is NP-complete from the NP-completeness of k-PBB (k ≥ 3).

Because k-SPBM (k ≥ 3) is NP-complete, the problem has no polynomial-time algorithm unless P = NP. Determining an efficient exact algorithm or an approximation algorithm is important and requires further research; we intend to study this problem in the future. Moreover, the complexity of some variants of finding bicliques in bipartite graphs is open, for example, the maximum ±1 edge weight biclique problem [15]. We also plan to study the complexity of, and algorithms for, these problems.
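To make the cost of exact search concrete, here is a naive exhaustive search for a k-submatrix partition, written as a Python sketch. It is not an algorithm from this paper: the function name `find_submatrix_partition` and the labeling scheme are assumptions, the input is assumed to be a non-empty list of equal-length rows, and the running time grows like k^(m+n) for an m × n matrix, so it is only usable on tiny inputs.

```python
from itertools import product


def find_submatrix_partition(matrix, k):
    """Exhaustively label rows and columns with 0..k-1.

    A labeling is accepted if every label is used by at least one row and one
    column and every entry whose row and column share a label equals 1.
    Returns (row_labels, col_labels) or None if no such labeling exists.
    """
    m, n = len(matrix), len(matrix[0])
    for row_labels in product(range(k), repeat=m):
        if len(set(row_labels)) != k:
            continue  # some submatrix would have no rows
        for col_labels in product(range(k), repeat=n):
            if len(set(col_labels)) != k:
                continue  # some submatrix would have no columns
            if all(matrix[i][j] == 1
                   for i in range(m) for j in range(n)
                   if row_labels[i] == col_labels[j]):
                return row_labels, col_labels
    return None


# Toy matrices (made up for illustration).
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
print(find_submatrix_partition(matrix, 3) is not None)

# The all-zero last row can never be placed in an all-ones submatrix.
print(find_submatrix_partition([[1, 1], [1, 1], [0, 0]], 2))
```

Rows and columns that share a label form one submatrix, so the pairwise row and column exclusivity of the submatrices holds by construction.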

Appendices

A. Proof of Lemma 3

Proof. Obviously, for a 3-biclique partition of , and belong to 1 or 2 bicliques. If and belong to 2 bicliques, with and , then , , and belong to three different bicliques. Moreover, ; therefore, either and or and belong to the same biclique. Thus, if there exists a 3-biclique partition for , there are three cases to be considered: (1) and belong to 1 biclique; (2) and belong to 2 bicliques, and and belong to the same biclique; and (3) and belong to 2 bicliques and and belong to the same biclique. Below we discuss the three cases.
(1) In case 1, as shown in Figure 3(b), suppose that is an arbitrary 3-biclique partition of . Because is the unique vertex that is adjacent to and , and , we can assume without loss of generality that and . Because and , we have . Because and , and is the unique vertex that is adjacent to and , we have . Because and , and , and and , we have . Because and , , and , and and , we have . We conclude that, in case 1, each vertex set of , , and induces a biclique. It follows that, in case 1, is the unique 3-biclique partition of .
(2) In case 2, as shown in Figure 3(c), suppose that is an arbitrary 3-biclique partition of . Because , , and belong to different bicliques, we can assume without loss of generality that , , and . Because , , and is the unique vertex that has edges to and , we have . Because and , and and , we have and . Because and , and and , we have and . Because and , and and , we have and . Because and , we have . We conclude that, in case 2, each vertex set of , , and induces a biclique. It follows that, in case 2, is the unique 3-biclique partition of .
(3) In case 3, as shown in Figure 3(d), suppose that is an arbitrary 3-biclique partition of . Because , , and belong to different bicliques, we can assume without loss of generality that , , and . Because , , and is the unique vertex that is adjacent to and , we have . Because and , and , and = , and and , we have and . Because and , and and , we have . Because and , we have . We conclude that, in case 3, each vertex set of , , and induces a biclique. It follows that, in case 3, is the unique 3-biclique partition of .
Thus, there exist exactly three 3-biclique partitions for . The lemma follows.

B. Proof of Lemma 4

Proof. By Lemma 3, there exist exactly three 3-biclique partitions for . Therefore, the lemma follows.

C. Proof of Lemma 5

Proof. For a 3-biclique partition of , there are two cases to be considered: (1) and belong to the same biclique, and (2) and belong to different bicliques. Below we discuss the two cases.
(1) In case 1, as shown in Figure 4(b), suppose that is an arbitrary 3-biclique partition of . Because is the unique vertex that is adjacent to and , and has no edges to and , we can assume without loss of generality that and . Because and , and , and and , we have . Because and , and , and and , we have . Because and , we have . Because and , we have . We conclude that, in case 1, each vertex set of , , and induces a biclique. It follows that, in case 1, is the unique 3-biclique partition of .
(2) In case 2, as shown in Figure 4(c), suppose that is an arbitrary 3-biclique partition of . Because and have no edges to and , and is the unique vertex that is adjacent to and , it follows that , , and must belong to different bicliques. We can assume without loss of generality that and , . Because and , and , and and , we have . Because and , and , and and , we have . Because and , we have . We conclude that, in case 2, each vertex set of , , and induces a biclique. It follows that, in case 2, is the unique 3-biclique partition of .
Thus, there exist exactly two 3-biclique partitions for . The lemma follows.

D. Proof of Lemma 6

Proof. By Lemma 5, there exist exactly two 3-biclique partitions for . Therefore, the lemma follows.

E. Proof of Proposition 9

Proof. Suppose that a bipartite subgraph of induced by is . It suffices to prove that . By Formulae (4) and (5), for an edge , and do not simultaneously belong to . That is, . Therefore, we need only consider whether the edges in can lead to a difference between and . By Formula (3), we have . For , we next show that, if any edge of does not belong to , then it cannot become an edge of . To ensure this result, it suffices to show that the public vertices of and induce isomorphic bipartite subgraphs in and , respectively. In fact, the vertex set induces isomorphic bipartite subgraphs in and ; the vertex set induces isomorphic bipartite subgraphs in and ; the vertex set induces isomorphic bipartite subgraphs in and . Thus, .

F. Proof of Proposition 10

Proof. Suppose that the bipartite subgraph induced by is ; we show that as follows. By Formulae (8) and (9), for an edge , and do not simultaneously belong to the same . In other words, . Thus, we need only consider whether an edge in can lead to a difference between and . From Formula (7), we have . For , we next show that, for an arbitrary edge of , if it does not belong to , then it cannot become an edge of . To ensure this result, it suffices to show that the public vertices of and induce isomorphic bipartite subgraphs in and , respectively. In fact, if , then . Obviously, induces isomorphic bipartite subgraphs in and . If , then . Note that any vertex belonging to is always adjacent to and in or . Hence, induces isomorphic bipartite subgraphs in and . Therefore, the bipartite subgraph of induced by is isomorphic to . From Proposition 9, the bipartite subgraph of induced by is isomorphic to .

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful suggestions and constructive comments. This paper was supported by (1) the PhD Station Foundation of Chinese Education Department (Grant no. 20090131110009), (2) the National Natural Science Foundation of China (Grants nos. 61070019, 61373079, and 61373079), (3) the Natural Science Foundation of Shandong Province of China (Grant no. ZR2011FL004), (4) the Science and Technology Development Planning Item of Yantai (Grant no. 2010167), and (5) the Key Project of Chinese Ministry of Education (Grant no. 212101).

References

[1] J. Abello, P. Pardalos, and M. Resende, Handbook of Massive Data Sets, vol. 4, Springer, Amsterdam, The Netherlands, 2002.
[2] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
[3] S. Busygin, O. Prokopyev, and P. M. Pardalos, “Biclustering in data mining,” Computers & Operations Research, vol. 35, no. 9, pp. 2964–2987, 2008.
[4] J. Hartigan, “Direct clustering of a data matrix,” Journal of the American Statistical Association, pp. 123–129, 1972.
[5] B. Mirkin, Mathematical Classification and Clustering, vol. 11, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1996.
[6] Y. Cheng and G. M. Church, “Biclustering of expression data,” in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '00), pp. 93–103, AAAI Press, 2000.
[7] J. Yang, W. Wang, H. Wang, and P. Yu, “δ-clusters: capturing subspace correlation in a large data set,” in Proceedings of the 18th International Conference on Data Engineering, pp. 517–528, March 2002.
[8] I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 269–274, August 2001.
[9] Q. Sheng, Y. Moreau, and B. De Moor, “Biclustering microarray data by Gibbs sampling,” Bioinformatics, vol. 19, no. 2, pp. 196–205, 2003.
[10] A. Ben-Dor, B. Chor, R. Karp, and Z. Yakhini, “Discovering local structure in gene expression data: the order-preserving submatrix problem,” in Proceedings of the 6th Annual International Conference on Computational Biology, pp. 49–57, Washington, DC, USA, April 2002.
[11] S. C. Madeira and A. L. Oliveira, “Biclustering algorithms for biological data analysis: a survey,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no. 1, pp. 24–45, 2004.
[12] A. Tanay, R. Sharan, and R. Shamir, “Biclustering algorithms: a survey,” in Handbook of Computational Molecular Biology, S. Aluru, Ed., pp. 261–274, Chapman & Hall/CRC, London, UK, 2006.
[13] R. Peeters, “The maximum edge biclique problem is NP-complete,” Discrete Applied Mathematics, vol. 131, no. 3, pp. 651–654, 2003.
[14] M. Dawande, P. Keskinocak, J. M. Swaminathan, and S. Tayur, “On bipartite and multipartite clique problems,” Journal of Algorithms. Cognition, Informatics and Logic, vol. 41, no. 2, pp. 388–403, 2001.
[15] N. Amit, The bicluster graph editing problem [M.S. thesis], Tel Aviv University, 2004.
[16] M. Dawande, P. Keskinocak, and S. Tayur, “On the biclique problem in bipartite graphs,” Gsia Working Paper, Carnegie Mellon University, 1996.
[17] M. H. Heydari, C. O. Shields Jr., L. Morales, and I. H. Sudborough, “Computing cross associations for attack graphs and other applications,” in Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS '08), p. 270b, IEEE Computer Society, Waikoloa, Hawaii, USA, January 2007.
[18] D. Bein, L. Morales, W. Bein, J. C. O. Shields, Z. Meng, and I. H. Sudborough, “Clustering and the biclique partition problem,” in Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS '08), p. 475, IEEE Computer Society Press, Waikoloa, Hawaii, USA, 2008.
[19] T. J. Schaefer, “The complexity of satisfiability problems,” in Proceedings of the 10th Annual ACM Symposium on Theory of Computing, pp. 216–226, ACM, San Diego, Calif, USA, 1978.
[20] M. R. Garey and D. S. Johnson, Computers and Intractability. A Guide to the Theory of NP-Completeness, W. H. Freeman, San Francisco, Calif, USA, 1979.
[21] A. Prelić, S. Bleuler, P. Zimmermann et al., “A systematic comparison and evaluation of biclustering methods for gene expression data,” Bioinformatics, vol. 22, no. 9, pp. 1122–1129, 2006.
[22] A. Tanay, R. Sharan, and R. Shamir, “Discovering statistically significant biclusters in gene expression data,” Bioinformatics, vol. 18, supplement 1, pp. S136–S144, 2002.
[23] L. Wang, Y. Lin, and X. Liu, “Approximation algorithms for biclustering problems,” SIAM Journal on Computing, vol. 38, no. 4, pp. 1504–1518, 2008.
[24] Z. Y. Zhang, T. Li, C. Ding, X. W. Ren, and X. S. Zhang, “Binary matrix factorization for analyzing gene expression data,” Data Mining and Knowledge Discovery, vol. 20, no. 1, pp. 28–52, 2010.
[25] A. Figueroa, J. Borneman, and T. Jiang, “Clustering binary fingerprint vectors with missing values for DNA array data analysis,” Journal of Computational Biology, vol. 11, no. 5, pp. 887–901, 2004.
[26] R. Gelbard, O. Goldman, and I. Spiegler, “Investigating diversity of clustering methods: an empirical comparison,” Data and Knowledge Engineering, vol. 63, no. 1, pp. 155–166, 2007.
[27] P. Q. Liu, D. M. Zhu, Q. S. Xie, H. Fan, and S. H. Ma, “Complexity and improved heuristic algorithms for binary fingerprints clustering,” Journal of Software, vol. 19, no. 3, pp. 500–510, 2008.
[28] D. Chakrabarti, D. S. Modha, S. Papadimitriou, and C. Faloutsos, “Fully automatic cross-associations,” in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 79–88, ACM, August 2004.

Copyright

Copyright © 2014 Peiqiang Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
