The Scientific World Journal

Volume 2015 (2015), Article ID 929471, 8 pages

http://dx.doi.org/10.1155/2015/929471

## A Novel Clustering Algorithm Inspired by Membrane Computing

^{1}Center for Radio Administration and Technology Development, Xihua University, Chengdu 610039, China

^{2}School of Mathematics and Computer Engineering, Xihua University, Chengdu 610039, China

^{3}School of Electrical and Information Engineering, Xihua University, Chengdu 610039, China

Received 10 June 2014; Accepted 7 September 2014

Academic Editor: Shifei Ding

Copyright © 2015 Hong Peng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

P systems are a class of distributed parallel computing models. This paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior to or competitive with the *k*-means algorithm and several evolutionary clustering algorithms recently reported in the literature.

#### 1. Introduction

Data clustering is a fundamental conceptual problem in data mining, which describes the process of grouping data into classes or clusters such that the data in each cluster share a high degree of similarity while being very dissimilar to data from other clusters [1]. Over the past years, a large number of clustering algorithms have been proposed [2–4], which can be divided roughly into two categories: hierarchical and partitional. Hierarchical clustering proceeds successively by either merging smaller clusters into larger ones or splitting larger clusters. Partitional clustering attempts to directly decompose a data set into several disjoint clusters based on a similarity measure, for example, mean square error (MSE). Clustering algorithms have been used in a wide variety of areas, such as pattern recognition, machine learning, image processing, and web mining [5, 6]. Among these algorithms, the *k*-means algorithm [7, 8] has received wide attention for the following two reasons: (i) *k*-means has recently been listed among the most influential data mining algorithms [9] and (ii) it is at the same time very simple and quite scalable, as it has linear asymptotic running time with respect to any variable of the problem. However, *k*-means is sensitive to the initial centers and easily gets stuck in locally optimal solutions. Moreover, *k*-means incurs a large time cost to find the globally optimal solution when the number of data points is large.

In recent years, some evolutionary algorithms have been introduced to overcome the shortcomings of the *k*-means algorithm because of their global optimization capability. Several genetic algorithm (GA)-based clustering algorithms have been proposed in the literature [10–14]. However, most GA-based clustering algorithms can suffer from degeneracy when numerous chromosomes represent the same solution. Degeneracy can lead to inefficient coverage of the search space, as the same configurations of clusters are repeatedly explored. To overcome this shortcoming, particle swarm optimization (PSO)-based and ant colony optimization (ACO)-based clustering algorithms have been proposed. Kao et al. have proposed a hybrid technique that combines *k*-means and PSO for cluster analysis [15]. Shelokar et al. have introduced an evolutionary algorithm based on ACO for the clustering problem [16]. Niknam and Amiri have presented a hybrid evolutionary optimization algorithm based on the combination of PSO and ACO for solving the clustering problem [17].

The aim of membrane computing is to abstract computing ideas (data structures, operations with data, ways to control operations, computing models, etc.) from the structure and the functioning of a single cell and from complexes of cells, such as tissues and organs including the brain. There are three main classes of P systems investigated: cell-like P systems (based on a cell-like (hence hierarchical) arrangement of membranes delimiting compartments where multisets of chemicals evolve according to given evolution rules) [18], tissue-like P systems (instead of hierarchical arrangement of membranes, consider arbitrary graphs as underlying structures, with membranes placed in the nodes while edges correspond to communication channels) [19], and neural-like P systems [20]. Many variants of all these systems have been considered, for example, [21, 22] for cell-like P systems, [23, 24] for tissue-like P systems, and [25–30] for neural-like P systems. An overview of the field can be found in [31], with up-to-date information available at the membrane computing website (http://ppage.psystems.eu/). These efforts have addressed the parallel computing advantage of P systems as well as the high effectiveness of solving a variety of difficult problems; especially, P systems can solve a number of NP-hard problems in linear or polynomial time complexity [32] and even solve PSPACE problems in a feasible time [33, 34]. Moreover, membrane algorithms have demonstrated a powerful global optimization performance [35–37].

This paper focuses on the application of membrane computing to data clustering. Our motivation is to apply the specially designed elements and inherent mechanisms of P systems to realize a novel clustering algorithm, called the membrane clustering algorithm.

#### 2. Data Clustering Problem

Clustering is the process of recognizing natural groups or clusters in a data set based on some similarity measure. Suppose that a data set $X$ has $n$ sample points, $x_1, x_2, \ldots, x_n$ ($x_i \in \mathbb{R}^d$), and is partitioned into $k$ clusters, $C_1, C_2, \ldots, C_k$. Denote by $z_1, z_2, \ldots, z_k$ the corresponding centers. Usually, a partitional clustering algorithm searches for the optimal centers in the solution space according to some clustering measure in order to solve the data clustering problem. A commonly used clustering measure is

$$J(z_1, z_2, \ldots, z_k) = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij} \, \lVert x_i - z_j \rVert^2, \tag{1}$$

where $w_{ij}$ is the associated weight of point $x_i$ with cluster *j*, which is either 1 or 0 (1 if point $x_i$ is allocated to cluster *j*, and 0 otherwise).

The clustering process, separating the objects into the clusters, is realized as an optimization problem. The goal of the optimization problem is to find the optimal centers by minimizing objective function (1):

$$\min_{z_1, z_2, \ldots, z_k} J(z_1, z_2, \ldots, z_k). \tag{2}$$

In addition, the $J$ value will be used to evaluate objects in the proposed clustering algorithm: the smaller the $J$ value of an object, the better the object.
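As an illustration of how such a measure can be evaluated for one set of candidate centers, the following is a minimal Python sketch. It assumes that the measure is the sum of squared Euclidean distances from each point to its center and that each point is allocated to its nearest center (so its weight is 1 for that cluster and 0 for all others). The function names `squared_distance` and `clustering_measure` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clustering_measure(
    points: Sequence[Point], centers: Sequence[Point]
) -> Tuple[float, List[int]]:
    """Evaluate one set of candidate centers.

    Assumption: each point is allocated to its nearest center, which
    corresponds to a 0/1 weight per (point, cluster) pair.
    Returns the measure value and the cluster index chosen for each point.
    """
    total = 0.0
    labels: List[int] = []
    for p in points:
        dists = [squared_distance(p, z) for z in centers]
        # Index of the nearest center; this is the only nonzero weight for p.
        j = min(range(len(centers)), key=dists.__getitem__)
        labels.append(j)
        total += dists[j]
    return total, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    good = [(0.0, 1.0), (10.0, 1.0)]
    bad = [(0.0, 0.0), (1.0, 0.0)]
    print(clustering_measure(data, good))
    print(clustering_measure(data, bad))
```

Under these assumptions, a smaller returned value indicates a better set of candidate centers, which matches the way the measure is used to rank objects.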

#### 3. Proposed Membrane Clustering Algorithm

In this section, the proposed membrane clustering algorithm, which is inspired by the mechanism of membrane computing, is discussed in detail. A tissue-like P system with a loop structure of cells is designed as its optimization framework. The tissue-like P system with a loop structure of $q$ cells can be described as the following construct:

$$\Pi = (O_1, \ldots, O_q, R_1, \ldots, R_q, R', i_o),$$

where $O_i$ ($1 \le i \le q$) is the set of objects in cell $i$; $R_i$ ($1 \le i \le q$) is the set of evolution rules in cell $i$, which contains three evolution rules: selection, crossover, and mutation rules; $R'$ is a finite set of communication rules of the following forms:

(i) antiport rule: $(i, Z/Z', j)$, $i, j = 1, 2, \ldots, q$, $i \ne j$. The rule is used to communicate the objects between a cell and its two adjacent cells;

(ii) symport rule: $(i, Z/\lambda, 0)$, $i = 1, 2, \ldots, q$. The rule is used to communicate the objects between cell $i$ and the environment.

Finally, $i_o = 0$ indicates the output region of the system.

Figure 1 shows the membrane structure of the tissue-like P system, which consists of $q$ cells. The cells are labeled $1, 2, \ldots, q$, respectively. The region labeled 0 is the environment and is also the output region of the system. The directed lines in Figure 1 indicate the communication of objects between the cells. Moreover, the cells are arranged in a loop topology based on the communication rules described below. As usual in P systems, the cells, as parallel computing units, run independently. In addition, the environment always stores the best object found so far in the system. When the system halts, the object in the environment is regarded as the output of the whole system.
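The loop topology and the role of the environment can be sketched in a few lines of Python. This is only an illustrative data-structure sketch, not the paper's rule set: it assumes the cells are labeled 1 to `q` (the symbol `q` for the number of cells is an assumption), that each cell's two adjacent cells are its predecessor and successor on the loop, and that an object is better when its evaluation value is smaller. The names `ring_neighbors`, `Environment`, and `receive` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical encoding: an object is a flat tuple of candidate center coordinates.
Candidate = Tuple[float, ...]


def ring_neighbors(i: int, q: int) -> Tuple[int, int]:
    """Labels of the two cells adjacent to cell i when cells 1..q form a loop."""
    if not 1 <= i <= q:
        raise ValueError("cell label must be in 1..q")
    left = q if i == 1 else i - 1
    right = 1 if i == q else i + 1
    return left, right


@dataclass
class Environment:
    """Region 0: keeps only the best object seen so far (smaller value is better)."""

    best: Optional[Candidate] = None
    best_value: float = float("inf")

    def receive(self, candidate: Candidate, value: float) -> bool:
        """Store the candidate if it improves on the current best; report whether it did."""
        if value < self.best_value:
            self.best, self.best_value = candidate, value
            return True
        return False


if __name__ == "__main__":
    q = 4
    for cell in range(1, q + 1):
        print(cell, ring_neighbors(cell, q))
    env = Environment()
    env.receive((0.0, 1.0, 10.0, 1.0), 4.0)
    env.receive((0.0, 0.0, 1.0, 0.0), 170.0)  # worse, so it is ignored
    print(env.best, env.best_value)
```

When the system halts, reading `env.best` corresponds to taking the object in the environment as the output of the whole system.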