Abstract

The purpose of this paper is to propose a new kind of P system on simplicial complexes. We present the basic discrete Morse structure, membrane structures on complexes, and communication rules. A new grid-based clustering technique built on these P systems is described. Examples are given to show the effect of the algorithm. The new P systems provide an alternative to traditional membrane computing.

1. Introduction

Membrane computing is a branch of natural computing initiated by Păun et al. at the end of 1998 as an attempt to formulate computing models from the functioning of living cells [1], just as DNA computing is derived from genes [2–4]. The advantage of these methods lies in their huge inherent parallelism, which has drawn great attention from the scientific community. In recent years, many different models of P systems have been proposed, such as cell-like P systems, tissue-like P systems, and spiking neural P systems [5–9]. The resulting computing systems are powerful: they are equivalent to Turing machines [10], even when using restricted combinations of features, and they are also computationally efficient. A number of applications have been reported in areas such as biology, biomedicine, linguistics, computer graphics, economics, approximate optimization, cryptography, and so forth.

Traditionally, Morse theory is a subject of differential topology and differential geometry in which the topological spaces in question are smooth manifolds. To study discrete problems, combinatorial complexes are used instead of manifolds. Along this line, discrete Morse theory has been developed [11, 12]. Recently, discrete Morse theory has attracted many researchers because it has found applications in triangulations and graphics. In fact, the simplicial complex, the basic data structure in discrete Morse theory, is likely to prove an important data structure alongside trees and graphs.

Spatial cluster analysis is a traditional problem in knowledge discovery from databases [13]. It has wide applications since increasingly large amounts of data obtained from satellite images, X-ray crystallography, or other automatic equipment are stored in spatial databases. The most classical spatial clustering technique is due to Han and Kamber [13], who developed a PAM variant called CLARANS, while new techniques are continuously proposed in the literature, aiming to reduce the time complexity or to fit more complicated cluster shapes. Other clustering-like problems include impulsive cluster anticonsensus of discrete multiagent linear dynamic systems [14] and driving general complex networks into prescribed cluster synchronization patterns by using pinning control [15]. In other work, a cooperative artificial bee colony algorithm is introduced for solving clustering problems [16], and a DNA-based clustering method built on the Adleman-Lipton model is proposed [17]. For related research, one can also refer to [18–20].

Data clustering problems of various types often appear in medical analysis. The Wisconsin Breast Cancer Showhouse was founded in 1998 as an all-volunteer 501(c)(3) charitable organization by Nance Kinney, a breast cancer survivor (http://www.breastcancershowhouse.org/wbcs2012/index.html). Its mission is to support breast cancer and prostate cancer research at the Medical College of Wisconsin. The breast cancer database was obtained from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison. Many computing methods have been applied to this data set and, in fact, the Wisconsin Breast Cancer data set is becoming an important benchmark for soft computing.

Inspired by the above research, this paper focuses on the joint study of discrete Morse theory and membrane computing. Our purpose is to propose a P system on a simplicial complex. To our knowledge, this is the first paper to extend membrane computing to complexes. We then use membrane computing in cluster analysis, providing a new approach to data mining. We first propose a discrete Morse structure as a candidate for a class of new P systems. Then we describe, for the first time, a communication P system on simplices. Next we propose a new method for cluster analysis by simplicial P systems. Finally, we present an analysis of the Wisconsin Breast Cancer data set.

2. Discrete Morse Structure

In this section we present some general discrete models which will form the basis of membrane structures. The main idea comes from [11, 21]. In order to do this, we need to present some basic topological concepts. For simplicity we always assume that we are working in a Euclidean space $\mathbb{R}^n$.

2.1. Simplex without Orientation

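A $k$-simplex (cell) is the convex hull of $k+1$ affinely independent points. More precisely, suppose $v_0, v_1, \ldots, v_k$ are affinely independent, that is, $v_1 - v_0, \ldots, v_k - v_0$ are linearly independent. Then $\sigma$ is defined as the set of points of the form $\sum_{i=0}^{k} \lambda_i v_i$, where $\lambda_i \ge 0$ and $\sum_{i=0}^{k} \lambda_i = 1$. We will call $k$ the dimension of the simplex and write $\dim \sigma = k$, while $v_0, \ldots, v_k$ are called the vertices of the simplex. A simplex is uniquely indicated by its vertices and hence is expressed as $\sigma = [v_0, v_1, \ldots, v_k]$, or simply $v_0 v_1 \cdots v_k$, and will be called a cell in this paper.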

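A face of a simplex $\sigma$ is defined as a simplex generated by a nonempty subset of its vertices. We write $\tau \le \sigma$. A face $\tau$ is called a hyperface of $\sigma$ if $\dim \tau = \dim \sigma - 1$, and this is denoted by $\tau \prec \sigma$. In this case, $\sigma$ is called the parent of $\tau$. Two cells $\sigma_1$ and $\sigma_2$ are called incident if $\sigma_1 \prec \sigma$ and $\sigma_2 \prec \sigma$ for some cell $\sigma$, and $\sigma$ is called the coface of $\sigma_1$ and $\sigma_2$. Two cells are called neighbors if they share a common hyperface. The cone from a vertex $v$ to a $k$-simplex $\sigma$ is the convex hull of $v$ and $\sigma$, which yields a $(k+1)$-simplex provided $v$ is not an affine combination of the vertices of $\sigma$. A simplicial complex $K$ is a finite collection of nonempty simplices for which $\sigma \in K$ and $\tau \le \sigma$ implies $\tau \in K$, and $\sigma_1, \sigma_2 \in K$ implies that $\sigma_1 \cap \sigma_2$ is either empty or a face of both. The underlying space of $K$ is the union of its simplices: $|K| = \bigcup_{\sigma \in K} \sigma$. $K_p$ is the subset of $K$ containing the simplices of dimension $p$.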

For a $k$-simplex $\sigma$, define $\bar{\sigma}$ to be the collection of $\sigma$ and all its faces. Then it is clear that $\bar{\sigma}$ is a simplicial complex. We will call this simplicial complex a simple complex or, simply, a complex.
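
As a small illustration of the simple complex just defined, the following Python sketch enumerates all faces of a simplex given by its vertex labels. The function names `closure` and `hyperfaces` and the representation of a cell as a `frozenset` of vertex labels are assumptions made for this example only.

```python
from itertools import combinations

def closure(vertices):
    """All nonempty faces of the simplex spanned by `vertices` (the simple complex)."""
    vs = tuple(sorted(vertices))
    return [frozenset(c)
            for size in range(1, len(vs) + 1)
            for c in combinations(vs, size)]

def hyperfaces(simplex):
    """Faces of dimension dim(simplex) - 1; a vertex has none."""
    if len(simplex) == 1:
        return []
    return [simplex - {v} for v in simplex]

tetrahedron = closure("abcd")           # a 3-simplex with vertices a, b, c, d
print(len(tetrahedron))                 # number of cells in the simple complex
print(hyperfaces(frozenset("abc")))     # the three edges of the triangle abc
```

For a 3-simplex the enumeration gives 4 vertices, 6 edges, 4 triangles, and the simplex itself.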

Now we consider some properties of incident and neighborhood relations as described above. First suppose are incident and . Then are faces of a simplex. Define and we will show that . This is evident because if , then . By removing one vertex we obtain and with vertices remained. Since we know that there exists at least one common vertex among , . By the definition of we get and consequently , are neighbors.

Conversely, if are neighbors, they need not be incident as shown in Figure 1. In the case when they are in the same simplex, however, this is true. In fact, suppose . Then all the vertices belong to and . Then we can assume that and . Since both cells are in the same simplex, define . Then . Consequently , are incident. Putting everything together we get a theorem.

Theorem 2.1. Two incident cells , are neighbors if they are at least one-dimensional. Two neighboring cells are incident provided they are located in the same simplex.

By the definition of neighborhood, if , are neighbors and their common hyperface is , then clearly and, consequently, this common hyperface is unique. However, there exist cells with nonempty intersection but they are not neighbors. Now we consider incident cells . We will show that their coface is also unique. First if , then . Therefore is the edge joining the two vertices , and hence is unique. Next if , then as described previously we define . Then and . Therefore and is unique.

Theorem 2.2. Neighboring cells can only share a unique common hyperface. Incident cells have a unique coface (parent).
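
The two relations can be checked mechanically when cells are stored as vertex sets. The sketch below is an illustration under that assumed representation: the common hyperface of two neighbors is their intersection, and the coface of two incident cells is their union, which is why both are unique. The function names are hypothetical.

```python
def are_neighbors(s, t):
    """Distinct cells of equal dimension >= 1 whose intersection is a common hyperface."""
    return s != t and len(s) == len(t) >= 2 and len(s & t) == len(s) - 1

def are_incident(s, t, cells):
    """Distinct cells of equal dimension whose union is a cell having both as hyperfaces."""
    union = s | t
    return s != t and len(s) == len(t) and len(union) == len(s) + 1 and union in cells

ab, bc = frozenset("ab"), frozenset("bc")
a, b, c = frozenset("a"), frozenset("b"), frozenset("c")

path = {a, b, c, ab, bc}                                   # two edges, no triangle
triangle = path | {frozenset("ac"), frozenset("abc")}      # full simple complex

print(are_neighbors(ab, bc), are_incident(ab, bc, path))      # neighbors, not incident
print(are_neighbors(ab, bc), are_incident(ab, bc, triangle))  # both, inside one simplex
print(are_neighbors(a, b), are_incident(a, b, path))          # vertices: incident only
```

The first case mirrors the situation where neighbors are not incident because no common parent exists in the complex.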

2.2. Simplex with Orientation

Simplicial complexes with orientation are important tools in the study of topological properties of discrete data structure. Concepts about discrete Morse functions are listed as follows (Robin [11]). For a simplex , there are two orientations and the opposite orientation is denoted by . Figure 2 shows the orientation of a three-dimensional complex.

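In the following we will use $[v_0, \ldots, \hat{v}_i, \ldots, v_k]$ to denote the face of $\sigma = [v_0, \ldots, v_k]$ in which the vertex $v_i$ is eliminated. The following chain is defined as the boundary of the simplex:

$$\partial \sigma = \sum_{i=0}^{k} (-1)^i \, [v_0, \ldots, \hat{v}_i, \ldots, v_k].$$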
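
The boundary operator can be sketched in a few lines of Python. This is an illustration only: an oriented simplex is assumed to be a tuple of vertices in increasing order, a chain is assumed to be a dictionary from simplices to integer coefficients, and the boundary of a vertex is taken to be zero. The second call checks the standard fact that applying the boundary twice gives the zero chain.

```python
from collections import defaultdict

def boundary(chain):
    """Boundary of a chain {oriented simplex (tuple): integer coefficient}."""
    out = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) == 1:
            continue                      # the boundary of a vertex is zero
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]   # drop the i-th vertex
            out[face] += (-1) ** i * coeff
    return {s: c for s, c in out.items() if c != 0}

triangle = {(0, 1, 2): 1}
print(boundary(triangle))             # signed sum of the three edges
print(boundary(boundary(triangle)))   # empty dict: the zero chain
```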

If $a_i$ are integers, then $\sum_i a_i \sigma_i$ is called a chain, where the dimension of the $\sigma_i$ remains the same within a chain. The boundary operator extends to chains naturally. An important property of the boundary operator is that $\partial \circ \partial = 0$. For a -dimensional simplex and a -dimensional simplex , define its relationship operator as follows:

On a simplicial complex we can define Morse functions, which are a tool for optimization.

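Definition 2.3. Let $K$ be a simplicial complex. A function $f : K \to \mathbb{R}$ is a discrete Morse function if for every $p$-dimensional simplex $\sigma \in K$ the following two statements are true:
(1) $\#\{\tau \supset \sigma : \dim \tau = p + 1,\ f(\tau) \le f(\sigma)\} \le 1$;
(2) $\#\{\upsilon \subset \sigma : \dim \upsilon = p - 1,\ f(\upsilon) \ge f(\sigma)\} \le 1$.
A simplex $\sigma$ is critical with index $p$ if
(1) $\#\{\tau \supset \sigma : \dim \tau = p + 1,\ f(\tau) \le f(\sigma)\} = 0$;
(2) $\#\{\upsilon \subset \sigma : \dim \upsilon = p - 1,\ f(\upsilon) \ge f(\sigma)\} = 0$.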
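
The definition can be tested directly on small complexes. The sketch below assumes the standard form of the two conditions: each simplex has at most one coface of one dimension higher whose value is not larger, and at most one hyperface whose value is not smaller. Cells are assumed to be `frozenset`s of vertices, and the function name is hypothetical.

```python
def is_discrete_morse(cells, f):
    """Return (True, critical cells) if f is a discrete Morse function, else (False, [])."""
    cells = set(cells)
    critical = []
    for s in cells:
        # cofaces one dimension higher whose value does not exceed f(s)
        up = [t for t in cells if len(t) == len(s) + 1 and s < t and f[t] <= f[s]]
        # hyperfaces whose value is not below f(s)
        down = [u for u in cells if len(u) == len(s) - 1 and u < s and f[u] >= f[s]]
        if len(up) > 1 or len(down) > 1:
            return False, []
        if not up and not down:
            critical.append(s)
    return True, critical

v0, v1, e = frozenset({0}), frozenset({1}), frozenset({0, 1})
ok, critical = is_discrete_morse([v0, v1, e], {v0: 0, v1: 2, e: 1})
print(ok, critical)   # the edge is paired with v1, so only v0 is critical
```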

Discrete gradient can also be defined on the complex . If is a critical point, then define . Otherwise if is not critical and the edge satisfies , , then define where the sign is determined so that . Here the inner product is the obvious inner product on oriented chains with respect to which the oriented simplices are orthonormal. It is easy to see that, if the edge , then . Generally speaking, discrete gradient is a mapping

A natural gradient flow is .

3. Communication P Systems on Simplices

3.1. Traditional P Systems

A membrane is a structure serving as a protected reactor. We will identify a membrane with the space it delimits. When we speak of inclusion for membranes, it is always strict inclusion. Now we list some elementary concepts concerning the basic operations of membranes:
(i) $M_1$, $M_2$ are vicinal if $M_1 \subset M_2$ and there is no $M$ such that $M_1 \subset M \subset M_2$,
(ii) elementary membrane: a membrane with no lower vicinal membranes; skin membrane: a membrane with no upper vicinal membranes; we assume there is always a unique skin,
(iii) degree: the number of membranes,
(iv) sibling membranes $M_1$, $M_2$: there is an $M$ which is upper vicinal for both $M_1$ and $M_2$.
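
These concepts map naturally onto a tree. The following sketch stores a membrane structure as a child-to-parent dictionary, where the parent plays the role of the upper vicinal membrane; the labels and the particular structure are hypothetical and are not taken from the figures of this paper.

```python
# Hypothetical structure: skin 1 contains 2 and 3, and membrane 2 contains 4.
parent = {"2": "1", "3": "1", "4": "2"}   # child -> upper vicinal membrane

membranes = set(parent) | set(parent.values())

def children(m):
    """Lower vicinal membranes of m."""
    return sorted(c for c, p in parent.items() if p == m)

skin = [m for m in membranes if m not in parent]         # no upper vicinal membrane
elementary = sorted(m for m in membranes if not children(m))
degree = len(membranes)

def siblings(m1, m2):
    """True if some membrane is upper vicinal for both m1 and m2."""
    return m1 != m2 and m1 in parent and parent.get(m1) == parent.get(m2)

print(skin, elementary, degree, siblings("2", "3"))
```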

Parentheses expression is often used to describe membrane structures. For example, the membrane structure as shown in Figure 3 has a parentheses expression for membranes as follows:

For a set , a multiset over is a mapping , where is the set of nonnegative integers. For , is the multiplicity of in . Suppose the set of objects is with a subset such that objects from are available in the environment in arbitrary multiplicities, that is, its multiset is . A P system with symport/antiport rules of degree is a construct where is the alphabet, is the alphabet of terminal objects, is a membrane structure of degree , are the multisets of objects associated with the regions of , and are finite sets of symport and antiport rules associated with the membranes of , and is the input/output region.
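
A multiset and a single antiport step can be sketched with Python's `collections.Counter`. This is a minimal illustration, not the construct defined above: the function names are hypothetical, a region is assumed to be a `Counter`, and the unlimited supply of environment objects is not modelled. A symport rule corresponds to the case where one of the two multisets is empty.

```python
from collections import Counter

def contains(region, u):
    """True if the multiset `region` includes the multiset `u`."""
    return all(region[obj] >= n for obj, n in u.items())

def antiport(inside, outside, u, v):
    """Apply (u, out; v, in) once: u leaves `inside` while v enters from `outside`.

    Returns True if the rule was applicable and has been applied.
    """
    if not (contains(inside, u) and contains(outside, v)):
        return False
    inside.subtract(u)
    outside.update(u)
    outside.subtract(v)
    inside.update(v)
    return True

inside, outside = Counter("aab"), Counter("c")
applied = antiport(inside, outside, Counter("aa"), Counter("c"))
print(applied, inside["a"], inside["b"], inside["c"], outside["a"], outside["c"])
```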

A P system is called stable if, even if some rules are still applicable, their application does not change the string/object content of the membrane structure, nor the membrane structure itself. For a subalphabet , we call a system stable over if the projection over of the string/object remains unchanged, even if some rules are still applicable. If , we will say stable over . If , with for , is a subset of rules, we call a P system stable with respect to the rules if the P system with rules is stable (i.e., applications of rules in do not change the string/object content of the system’s membranes) [19].

3.2. Membrane Structures on Simplices

Now we describe membrane structures on simplicial complexes. First we assume that the complex is a simple complex, that is, a simplex with all its faces including vertices. In Figure 4 a simple three-dimensional example is presented with 15 membranes. In the general case of an $n$-dimensional simple complex, the number of simplices is

$$\sum_{k=0}^{n} \binom{n+1}{k+1} = 2^{n+1} - 1.$$

The boundary relations of simplices are shown in Figure 5 where the arrows point to the boundary cells.

Now we consider the general cases where is a simplicial complex in . For an example of in as shown in Figure 6, the total number of cells is . For example, the three dimensional simplices are listed as

Generally speaking, a simplicial complex $K$ is denoted by a set of vertices. A simplex is called a membrane. A membrane is called a max-simplex if it is not a face of another simplex in $K$. A simplex is denoted by its vertices. Evidently, vertices are zero-dimensional cells and hence are elementary membranes. If $\tau$ is a hyperface of $\sigma$, then we say that $\sigma$ is the parent of $\tau$. Figure 5 shows the parents and neighborhood relations of the complex in Figure 4. For the more general simplicial complex as in Figure 6, the network model is shown in Figure 7.

If , are incident, we say there is an upper link (channel) between , . If , are neighbors, we say there is a lower link between them. An upper link is denoted by , where is their (common) parent, while a lower link is written as , where is their common hyperface. An upper link is also written as or, simply, , while a lower link is denoted by . Links have no direction. Thus and are identical. We will specify one of these two links, and only one is allowed.

Definition 3.1. A P system on a simplicial complex , called a simplicial P system, with antiport and symport rules is a construct where is the number of cells labeled with , is the alphabet, is the set of objects with unlimited multiplicity in the environment, are initial strings over of multiset, are symport and antiport rules associated with the membranes, is the set of links, and is a finite set of antiport and/or symport rules associated with the link .

Ceterchi and Martin-Vide [19] proposed a new type of communication P system with priority relations. They introduced a promoter for a rule to be active and an inhibitor for it to be inactive. Inspired by their idea, we will present an ordered system in this section.

An antiport rule in exchanges the multiset inside with outside it. A symport rule (or ) sends out (takes in) the multiset with respect to membrane . For a specific membrane , rules are totally ordered as . The rule is applicable if and only if the system has reached a stable configuration with respect to rules . We can use a queue structure to represent this process.
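
The ordering can be illustrated with a short Python sketch. It assumes that a rule is a function from a state to a new state which returns the state unchanged when it has no effect, and that the rules are listed from highest to lowest priority. A plain list scanned in order stands in for the queue; the two toy rules are hypothetical.

```python
def run_ordered(state, rules, max_steps=1000):
    """Apply totally ordered rules until the state is stable with respect to all of them."""
    for _ in range(max_steps):
        for rule in rules:               # highest priority first
            new_state = rule(state)
            if new_state != state:       # earlier rules are stable, so this rule fires
                state = new_state
                break
        else:
            return state                 # no rule changes the state any more
    raise RuntimeError("no stable configuration within max_steps")

# Toy state (number of a's, number of b's).
r1 = lambda s: (s[0] - 1, s[1] + 1) if s[0] > 0 else s    # turn one a into one b
r2 = lambda s: (s[0], s[1] - 2) if s[1] >= 2 else s       # remove two b's

print(run_ordered((3, 0), [r1, r2]))
```

Here `r2` only fires once every `a` has been consumed, because `r1` has priority and is rechecked after every step.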

3.3. Rules in Simplicial P Systems

Now we describe the communication rules in simplicial P systems. For our purpose in this paper, there are mainly four types of communication rules in a simplicial P system. Each type of rule may have operators such as .

First suppose is an upper link. A rule like means that the multiset and from and transform into and go up to their parent . For two cells , the antiport rule in means exchanging multiset inside membrane with the multiset outside it (in ). The symport rule sends the multiset outside . Another symport rule works similarly.

An upper link rule in may have the following forms:

A lower link rule in may have the following forms:

Equation (3.6) means moving from cell and from to to parent and simultaneously moving from to into and into . Equation (3.8) means moving from cell and from to to hyperface and simultaneously moving from to up to and up to .

3.4. Configuration and Computation

Now we describe the configuration and computation of simplicial P systems. For our purpose, change of membrane structure is not involved. A configuration of a simplicial P system is the state of the system described by specifying the objects and rules associated to each membrane. The initial state is called initial configuration. Therefore, the multisets represented by in , constitute the initial configuration of the system.

The system evolves by applying rules in membranes and this evolution is called computation. The computation starts with the multisets specified by in the cells. In each time unit, rules are used in a cell. If no rule is applicable for a cell, then no object changes in it. The system is synchronously evolving for all cells.

When the system has reached a configuration in which no rule is any longer applicable, we say that the computation halts. A configuration is stable if, even if some rules are still applicable, their application does not change the object content of the membranes. The computation is successful if and only if it halts, or it is stable. The result of a halting/stable computation is the number described by the multiplicity of objects present in the cell in the halting/stable configuration.
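
A synchronous computation of this kind can be sketched as follows. This is a simplified illustration with several assumptions: a rule is a triple `(source, target, object)` that moves one object along a link, each rule is used at most once per time unit, and the cell names are hypothetical. All moves in a step are chosen from the configuration at the start of the step and then applied together.

```python
from collections import Counter

def step(config, rules):
    """One synchronous step; returns True if at least one rule was applied."""
    available = {cell: Counter(ms) for cell, ms in config.items()}  # start-of-step snapshot
    moves = []
    for src, dst, obj in rules:
        if available[src][obj] > 0:
            available[src][obj] -= 1      # reserve the object so it is used only once
            moves.append((src, dst, obj))
    for src, dst, obj in moves:           # apply all selected moves together
        config[src][obj] -= 1
        config[dst][obj] += 1
    return bool(moves)

def compute(config, rules, output, max_steps=100):
    """Run until no rule is applicable; the result is the object count in `output`."""
    for _ in range(max_steps):
        if not step(config, rules):
            return sum(config[output].values())
    raise RuntimeError("computation did not halt within max_steps")

config = {"v0": Counter(a=2), "v1": Counter(a=1), "e": Counter()}
rules = [("v0", "e", "a"), ("v1", "e", "a")]   # both vertices send objects to the edge
print(compute(config, rules, "e"))
```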

4. Cluster Analysis by Simplicial P Systems

4.1. Problem Setting and Algorithm

Now suppose the data set to be clustered is . We now construct a uniform grid in as follows. Choose integers and let , and divide the interval into subintervals . Therefore is contained in the following grid with cells, where is the number of cells that do not contain data:

Then we can define a new data set , where is each grid point :

Define a weight function on by the cell corresponding to the data . In this way, the original data set is transformed into a new data set with weights equal to the original number of points. Next we will always work on the data set . For simplicity, we will consider a density-based clustering technique. For two points , the similarity is defined by where is the topological distance of the two points. That is, if , are incident, then . Else is the minimal number of edges which form a path connecting and . For two subsets , the similarity is defined by
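
The transformation from raw data to weighted grid points can be sketched as below. This is an illustration under assumptions: the grid is taken to be uniform between the minimum and maximum of each coordinate, each grid cell is identified by a tuple of integer indices, and the function name `grid_weights` is hypothetical.

```python
from collections import Counter

def grid_weights(points, bins):
    """Count the data points falling into each cell of a uniform grid.

    points: list of equal-length tuples; bins: number of subintervals per coordinate.
    Returns a Counter mapping cell index tuples to weights.
    """
    dims = range(len(bins))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    weights = Counter()
    for p in points:
        index = []
        for d in dims:
            width = (hi[d] - lo[d]) / bins[d] or 1.0       # guard against zero width
            cell = int((p[d] - lo[d]) / width)
            index.append(min(cell, bins[d] - 1))           # the maximum goes in the last cell
        weights[tuple(index)] += 1
    return weights

data = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 0.8)]
print(grid_weights(data, (2, 2)))
```

Cells that receive no data simply do not appear in the result, so they carry weight zero.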

The clustering is implemented by a hierarchical method as follows. At first, each point of the data set forms a cluster which contains a singleton. Then each data point tries to connect another data point in its neighborhood. After this step, a cluster can be found as connected points. Now we construct a simplicial complex corresponding to the data set . On the basis of the rectangles as in (4.1), we add some hyperplanes to form a triangulation. Then each rectangle is decomposed as several simplices. Hyperplanes can be chosen such that the set of simplices satisfy the definition of simplicial complex. And then is defined as the union of such simplices. In the three dimensional case, this is shown as in Figure 8.

Now we will show that the triangulation as above exists. In fact, we need only to consider an inner cell , where . There are vertices for this cell

There are totally surrounding cells

First we consider the triangulation of the cell . Now we choose one vertex from the cell where can be or . Now we denote

Consider the vertex set

Then . Clearly can be decomposed as disjoint cones . Notice that is an -dimensional rectangle and this means that, for any triangulation of , we can join them with to form a triangulation of . Therefore by induction we get the construction of triangulation.

Lemma 4.1. The decomposition is valid with the property that each pair of is disjoint, where is the set of interior points in .

Proof. Suppose and . Consider the point + where . Let be the first zero point of the following function:
Clearly . Let . Then .

By the above discussion, we also have another lemma.

Lemma 4.2. Suppose the triangulation of in is . Then the set forms a triangulation of .

Finally we obtain a theorem.

Theorem 4.3. A simplicial complex exists corresponding to the grids .
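
The inductive cone construction can be sketched for the unit cube. The code below is one possible realization under assumptions that are not taken from the proof: a face of the cube is a tuple whose entries are 0, 1, or `None` for a free coordinate, and the apex of every face is the vertex obtained by setting its free coordinates to 0. Because this choice depends only on the face itself, the triangulations of shared faces agree.

```python
def triangulate(face):
    """Triangulate a cube face by coning from its apex over the facets that miss the apex.

    face: tuple with entries 0, 1 or None (None marks a free coordinate).
    Returns a list of simplices, each a list of vertex tuples.
    """
    free = [i for i, x in enumerate(face) if x is None]
    apex = tuple(0 if x is None else x for x in face)
    if not free:
        return [[apex]]                       # a single vertex
    simplices = []
    for i in free:
        facet = face[:i] + (1,) + face[i + 1:]    # facet not containing the apex
        simplices += [[apex] + s for s in triangulate(facet)]
    return simplices

square = triangulate((None, None))
cube = triangulate((None, None, None))
print(square)
print(len(cube))
```

With this choice the $n$-dimensional cube is split into $n!$ simplices, all of which contain the main diagonal.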

For each node , define its neighborhood as is incident with in . For a subset , define its neighborhood as .

Now we propose an algorithm for our clustering problem. This is a self-joining cluster technique. Initially, one data point in forms a cluster of singleton. Then each node searches for its neighborhood. If there exists a neighboring node which is similar enough, then activate a link between the two nodes. The final cluster is linked nodes. To define the meaning of similar enough, we need a parameter such that . Putting everything together, we get the algorithm as shown in Algorithm 1.

Inputs:   , : similarity threshold value, are incident}
Outputs:   : set of clusters, : number of clusters, : outliers
Begin
Set , , .
for   to
  If , then add into set . Otherwise add into set . For each , calculate
  the similarity measure . If , then set the edge as active. In this case, set
   if is still an empty set.
end
while     do
   (1) For each if there exists an active edge with , , then add into and
  remove from .
   (2) .
end
End
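
The self-joining procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than a transcription of Algorithm 1: grid points are integer tuples, two occupied grid points are treated as adjacent when they differ by at most 1 in every coordinate, an edge becomes active when the similarity is at least the threshold, occupied points without any active edge are reported as outliers, and the similarity function is supplied by the caller and must be symmetric.

```python
def cluster(weights, similarity, threshold):
    """weights: {grid point: weight}. Returns (list of clusters, list of outliers)."""
    nodes = [p for p, w in weights.items() if w > 0]

    def neighbors(p):
        return [q for q in nodes
                if q != p and all(abs(a - b) <= 1 for a, b in zip(p, q))]

    # Activate an edge when the two weights are similar enough.
    active = {p: [q for q in neighbors(p)
                  if similarity(weights[p], weights[q]) >= threshold]
              for p in nodes}

    clusters, outliers, seen = [], [], set()
    for p in nodes:
        if p in seen:
            continue
        if not active[p]:
            outliers.append(p)
            seen.add(p)
            continue
        component, stack = set(), [p]
        while stack:                       # collect everything linked to p
            x = stack.pop()
            if x not in component:
                component.add(x)
                stack.extend(active[x])
        seen |= component
        clusters.append(component)
    return clusters, outliers

weights = {(0, 0): 5, (0, 1): 4, (3, 3): 6, (3, 4): 5, (9, 9): 1}
ratio = lambda a, b: min(a, b) / max(a, b)     # hypothetical similarity measure
print(cluster(weights, ratio, 0.7))
```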

4.2. Design of a P System

Now we have already defined the simplicial complex as the membrane structure with total membranes. Next we need to specify the alphabet and rules to be used. First we design a binary coding scheme for the weight function and the distance function . Suppose the length of the coding is . In this way, the weight function and the distance function take binary strings as values. Now we suppose , where . We also need an integer such that if the coding of is , then

Suppose is a subset of links. Then we can define a P system as follows:

The working alphabet is

Let , be the vertices. We will use to denote a -dimensional simplex. Therefore is a vertex or for short. Define . is the set of edges in .

The initial multiset stands for the multiset in the membrane :

For , rules are with the order as follows:

Rules on edges are

5. Example and Discussion

The Wisconsin breast cancer data set consists of 699 samples with 9 attributes. The samples are classified into two categories, malignant and benign. The nine attributes are Clump-Thickness, Cell-Size-Uniformity, Cell-Shape-Uniformity, Marginal-Adhesion, Single-Epi-Cell-Size, Bare-Nuclei, Bland-Chromatin, Normal-Nucleoli, and Mitoses. The data is shown in Figure 9 with an added noise of .

Now we choose attributes 2 and 6 and accumulate the data samples and use the grid technique proposed in this paper. Then the data set is shown in the following matrix:

Next we construct a simplicial complex as shown in Figure 10 in where each vertex corresponds to the data points as in (5.1).

We now define . Then and . Now we choose . Then the final cluster is shown in Figure 11. We find five clusters and 31 outliers.

Next we choose attributes 1 and 2 and then the data matrix is

Now we analyze the effect of the parameter . First we choose . Then the clustering result is one cluster containing all the data, and the number of ill-clustered points is 699. Next let . Then we get two clusters and some outliers. However, in this case the red data are all ill-clustered and hence the number of errors is 241. Now choose . In this case we find four clusters. The number of outliers is 46 while the number of ill-clustered points is . Then the total number is 185. Next choose . This time we find 6 clusters. The number of outliers is 87, the number of ill-clustered points is 91, and the number of other error points is 40; hence the total number is 218. Again we choose . This time we obtain 4 clusters with a large number of outliers (outliers: 157, error clusters: 65, other errors: 13, total: 235). The error line is shown in Figure 12.

As a result we see that the best parameter is .

Acknowledgments

This research is supported by the Natural Science Foundation of China (nos. 61170038, 71071090, 60873058), the Natural Science Foundation of Shandong Province (no. ZR2011FM001), and the Shandong Soft Science Major Project (no. 2010RKMA2005).