Computational Intelligence and Neuroscience

Volume 2017 (2017), Article ID 4367342, 11 pages

https://doi.org/10.1155/2017/4367342

## GA-Based Membrane Evolutionary Algorithm for Ensemble Clustering

School of Management Science and Engineering, Shandong Normal University, Jinan 250014, China

Correspondence should be addressed to Yanhua Wang, Xiyu Liu, and Laisheng Xiang

Received 21 June 2017; Revised 25 September 2017; Accepted 24 October 2017; Published 16 November 2017

Academic Editor: Leonardo Franco

Copyright © 2017 Yanhua Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Ensemble clustering can improve the generalization ability of a single clustering algorithm and generate a more robust clustering result by integrating multiple base clusterings, so it has become a focus of current clustering research. Ensemble clustering aims to find a consensus partition that agrees as much as possible with the base clusterings. The genetic algorithm is a highly parallel, stochastic, and adaptive search algorithm developed from the natural selection and evolutionary mechanisms of biology. In this paper, an improved genetic algorithm is designed by improving the coding of the chromosome. A new membrane evolutionary algorithm is then constructed by using genetic mechanisms as evolution rules and combining them with the communication mechanism of the cell-like P system. The proposed algorithm is used to optimize the base clusterings and find the optimal chromosome as the final ensemble clustering result. The global optimization ability of the genetic algorithm and the rapid convergence of the membrane system make the membrane evolutionary algorithm perform better than several state-of-the-art techniques on six real-world UCI data sets.

#### 1. Introduction

Cluster analysis, also known as clustering, is a core technique in machine learning and artificial intelligence [1]. It divides a set of data objects into subsets, each of which is called a cluster, so that objects in the same cluster are as similar as possible and objects in different clusters are as different as possible.

Ensemble clustering, also known as consensus clustering or cluster aggregation, reconciles clustering results obtained from different clustering algorithms [2] or from the same algorithm run with different initialization parameters [3]. The purpose of ensemble clustering is to find a consensus result that is as similar as possible to multiple existing base clusterings [4]. Compared with a single clustering algorithm, a clustering ensemble algorithm has higher robustness and stability, and its results are insensitive to noise, isolated points, and sampling changes, so ensemble clustering has become a hotspot of clustering research in recent years. Existing ensemble clustering methods can be divided into three categories: the median partition based methods [5, 6], the pairwise similarity based methods [7–10], and the graph partitioning based methods [4, 11–13]. Among them, the median partition based methods aim to find a clustering that maximizes its similarity to all of the base clusterings; this clustering, the median partition, can be viewed as the median point of the base clusterings [5, 6, 14].
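The median partition idea can be illustrated with a minimal sketch. It assumes the Rand index as the similarity measure (the text does not fix one at this point) and, for brevity, searches only among the base clusterings themselves rather than over all possible partitions. The names `rand_index` and `consensus_score` are hypothetical.

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def consensus_score(candidate, base_clusterings):
    """Average similarity of a candidate partition to all base clusterings."""
    total = sum(rand_index(candidate, b) for b in base_clusterings)
    return total / len(base_clusterings)

# Three toy base clusterings of six objects (assumed data).
base = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first, labels permuted
    [0, 0, 0, 1, 2, 2],
]

# Restricted search: pick the base clustering closest to all the others.
best = max(base, key=lambda c: consensus_score(c, base))
```

Because the Rand index compares pairs of objects rather than label values, the first two base clusterings count as identical even though their labels differ.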

Finding the optimal solution among many base clusterings turns the clustering problem into an optimization problem. Because the space of all possible clusterings is very large, finding the optimal solution exactly is generally infeasible, and the genetic algorithm, as a classic method for solving optimization problems, has attracted our attention. The genetic algorithm is a randomized search method that simulates the laws of biological evolution [15]. It has inherent parallelism and global optimization ability. Using a probabilistic optimization method, it can automatically acquire and guide the search space and adaptively adjust the search direction [16–18]. The ensemble clustering problem is generally regarded as the median partition problem, which is NP-complete [5]. Genetic algorithms have been proposed to find an approximate solution, with the base clusterings represented as chromosomes [5, 19]. In those studies, a chromosome is defined by the class labels of a base clustering, so when the number of data objects is large, the evolutionary efficiency is very low. In this paper, we improve the coding of chromosomes and then combine the improved genetic algorithm with the membrane computing model for ensemble clustering.
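To make the label-based encoding concrete, the following sketch shows a chromosome with one gene per data object, together with generic one-point crossover and random-reset mutation. This illustrates the earlier label-vector representation only, not the improved coding proposed in this paper. The operator choices and the names `one_point_crossover` and `mutate` are assumptions.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two label vectors at a random cut point."""
    cut = rng.randrange(1, len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

def mutate(chromosome, n_clusters, rate, rng):
    """Reassign each gene to a random cluster label with probability `rate`."""
    return [rng.randrange(n_clusters) if rng.random() < rate else gene
            for gene in chromosome]

rng = random.Random(0)        # fixed seed so the run is repeatable
a = [0, 0, 1, 1, 2, 2]        # chromosome = class labels of one base clustering
b = [1, 1, 0, 0, 2, 2]
child_a, child_b = one_point_crossover(a, b, rng)
mutant = mutate(child_a, n_clusters=3, rate=0.1, rng=rng)
```

The chromosome length equals the number of data objects, which is why this encoding becomes expensive to evolve on large data sets.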

The P system, also known as the membrane computing model, is a computational model inspired by the study of living cells and initiated by Păun in 1998. It aims to carry out computation by simulating the functions of living cells, tissues, and organs. Objects in this model, which has complete computing capability, can evolve in a maximally parallel and distributed manner [20]. It is precisely this maximal parallelism that allows multiple cell objects to evolve concurrently in search of the optimal solution, an effect similar to multipopulation evolution, which improves the performance of ensemble clustering. Membrane systems have the same computing power as Turing machines and can even do what Turing machines do more efficiently [21, 22]. According to the organizational structure of the system, P systems are divided into three categories: cell-like P systems [23], tissue-like P systems [24], and neural-like P systems [25]. Among them, the cell-like P system was the first membrane model to be proposed, and research on it is the most complete [26–28]. Its basic components are the membrane structure, objects, and membrane rules. In the cell-like P system, membranes divide the whole system into regions in which objects and rules exist; the objects are usually represented by characters or strings of symbols; and the rules in each region process the objects in the corresponding membrane. Rules operate on the objects in the membranes in a highly parallel manner [29–31], so the system can make ensemble clustering more efficient.
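The three components of a cell-like P system (membrane structure, objects, and rules) can be pictured with a toy sketch. It assumes a skin membrane containing two elementary membranes, string objects, and a single rewriting rule per region; it also runs the regions one after another, whereas the model itself applies rules in parallel. The `Membrane` class and `step` function are hypothetical illustrations, not the system defined later in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Membrane:
    """One region: holds objects and the rule that rewrites them."""
    name: str
    objects: List[str]
    rule: Optional[Callable[[str], str]] = None
    children: List["Membrane"] = field(default_factory=list)

def step(skin, fitness):
    """Apply each inner membrane's rule to its objects, then
    communicate the best object of each region out to the skin."""
    for m in skin.children:
        m.objects = [m.rule(o) for o in m.objects]      # evolution rule
        skin.objects.append(max(m.objects, key=fitness))  # communication
    return max(skin.objects, key=fitness)

# Toy setup (assumed): fitness counts the symbol "a" in an object.
inner1 = Membrane("m1", ["ab", "b"], rule=lambda s: s + "a")
inner2 = Membrane("m2", ["bb"], rule=lambda s: s + "b")
skin = Membrane("skin", [], children=[inner1, inner2])
best = step(skin, fitness=lambda s: s.count("a"))
```

Collecting the best object of every region in the skin membrane is one simple way to share good solutions between membranes.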

In this paper, we introduce the three genetic operators (selection, crossover, and mutation) of the genetic mechanism to realize the evolution of chromosomes and use the communication mechanism of the cell-like P system to share outstanding objects between membranes, which accelerates the convergence of the algorithm. The proposed algorithm is used to optimize the base clusterings and find the optimal chromosome as the final ensemble clustering result. Section 2 gives the basic concepts of ensemble clustering, the genetic algorithm, and the cell-like P system. Section 3 describes the improved GA-based consensus clustering algorithm. Section 4 presents the proposed algorithm. Section 5 reports the experimental results, and Section 6 summarizes the work and outlines future work.

#### 2. Preliminaries

In this section, we introduce some basic concepts of ensemble clustering, genetic algorithm, and cell-like P system.

##### 2.1. Ensemble Clustering

The ensemble clustering process is divided into two steps: first we generate a set of different base clusterings, and then we use a consensus function to find a consensus clustering that agrees as much as possible with the existing base clusterings. To produce diversified base clusterings from the algorithmic perspective, the same clustering algorithm can be run with different initialization parameters, or different clustering algorithms can be used. From the data preprocessing perspective, we can choose different attributes or different sample subsets of the data set. The ensemble clustering process is shown in Figure 1.
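The first step, generating diverse base clusterings by rerunning one algorithm with different initializations, can be sketched as follows. The sketch assumes plain Lloyd's k-means with randomly chosen initial centers as the base algorithm and a small synthetic data set. The function `kmeans` is a hypothetical stdlib-only helper, not the implementation used in the experiments.

```python
import random

def kmeans(points, k, seed, n_iter=20):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # the seed controls the initial centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Assumed toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]

# One base clustering per seed; the set is the input to the consensus function.
base_clusterings = [kmeans(points, k=2, seed=s) for s in range(5)]
```

Different seeds can yield the same partition under different label names, so a consensus function has to compare partitions rather than raw labels.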