Mathematical Problems in Engineering

Volume 2018 (2018), Article ID 8724084, 8 pages

https://doi.org/10.1155/2018/8724084

## Research on Clustering Method of Improved Glowworm Algorithm Based on Good-Point Set

^{1}School of Management, Hefei University of Technology, Hefei, Anhui 230009, China

^{2}Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education, Hefei, Anhui 230009, China

^{3}Anhui Economic Management Institute, Hefei, Anhui 230059, China

Correspondence should be addressed to Zhiwei Ni

Received 26 October 2017; Accepted 8 January 2018; Published 5 March 2018

Academic Editor: Mohammed Nouari

Copyright © 2018 Yaping Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

As an important data analysis method in data mining, clustering analysis has been researched extensively and in depth. To address the sensitivity of the $k$-means clustering algorithm to the distribution of its initial clustering centers, the Glowworm Swarm Optimization (GSO) algorithm is introduced to solve clustering problems. Firstly, this paper introduces the basic ideas of the GSO algorithm, the $k$-means algorithm, and the good-point set and analyzes the feasibility of combining them for clustering optimization. Next, it designs a clustering method based on an improved GSO algorithm with a good-point set: the improved GSO algorithm searches the data object space and provides initial clustering centers for the classical $k$-means algorithm, which thus obtains better clustering results. The major improvement to the GSO algorithm is to optimize the initial distribution of the glowworm swarm by introducing the theory and method of the good-point set. Finally, the new clustering algorithm is applied to UCI data sets of different categories and numbers for clustering tests. The advantages of the improved clustering algorithm in terms of sum of squared errors (SSE), clustering accuracy, and robustness are explained through comparison and analysis.

#### 1. Introduction

As an unsupervised data analysis method, clustering analysis is widely applied in such fields as data mining, pattern recognition, machine learning, and artificial intelligence [1]. Unlike classification, a clustering algorithm categorizes data objects by grouping them according to a similarity metric and a clustering criterion, without any prior knowledge. As a branch of statistics, clustering analysis has been studied extensively. Clustering methods can be mainly classified into partitioning methods, hierarchical methods, and density-based methods. The $k$-means algorithm proposed by James MacQueen is a typical partitioning-based clustering algorithm [2]. However, the clustering result of the $k$-means algorithm is greatly affected by the initial clustering center points and is very sensitive to outliers. Reference [3] optimizes the $k$-means algorithm by integrating the coding, crossover, and mutation ideas of the genetic algorithm (GA) with the local optimizing ability of the $k$-means clustering algorithm and proposes a $k$-means clustering algorithm based on GA. Hierarchical clustering methods mainly include the CURE algorithm [4] and the Chameleon algorithm [5]; in the CURE algorithm, each cluster is represented by multiple points, which improves the processing of nonspherical data sets. Representative density-based clustering methods include the DBSCAN algorithm [6], which is able to effectively identify clusters of any shape but is very sensitive to the setting of manually chosen parameters (e.g., radius). Rodriguez and Laio put forward a new density-based density peaks clustering (DPC) algorithm [7] in 2014. In this algorithm, density peaks (i.e., clustering centers) are first selected manually through a "decision graph", and then the remaining data points are allocated to each clustering center to obtain the corresponding clustering result.

It is noteworthy that, in recent years, some scholars have started introducing heuristic swarm optimization algorithms into clustering analysis in different fields, improving the clustering effect by virtue of the global searching ability of swarm optimization algorithms. A clustering analysis method combining PSO and $k$-means, which exploits the global searching ability of the particle swarm algorithm, is proposed in [8]. In addition, the cuckoo algorithm, artificial bee colony algorithm, artificial fish swarm algorithm, and so forth [9–12] have also been introduced into the research of clustering algorithms.

The GSO algorithm [13] proposed by Krishnanand and Ghose is a new swarm intelligence optimization algorithm, which is more efficient in solving multimodal problems than traditional swarm intelligence optimization algorithms [14]. Aljarah and Ludwig put forward a new clustering-based GSO algorithm in 2013. In this algorithm, the GSO algorithm is adjusted to solve the data clustering problem by locating multiple optimal centroids [15]. A new approach for cluster analysis based on the GSO algorithm and $k$-means has been proposed by Onan and Korukoglu [16]. Due to the multimodal nature of multimedia data, Pushpalatha and Ananthanarayana proposed the GSO-based Multimedia Document Clustering (GSOMDC) algorithm in 2015 to group multimedia documents into topics [17]. A fuzzy clustering algorithm based on the GSO algorithm (GSO-KFCM) was proposed by Cheng and Bao in 2017. In this algorithm, the optimal solution obtained by the GSO algorithm serves as the initial clustering center of the kernelized fuzzy $c$-means clustering algorithm [18].

This paper introduces the GSO algorithm into clustering analysis, regards each glowworm as a feasible clustering center in the data object space, searches the data object space through the optimization process of the glowworms, and solves for clustering centers by obtaining multiple extreme points. In this way, it combines the GSO algorithm and the $k$-means algorithm: the GSO algorithm provides initial clustering centers for the $k$-means algorithm, which addresses the sensitivity of the $k$-means algorithm to initial clustering centers and thus obtains better clustering effects. Meanwhile, considering the effect of the initial distribution of the glowworm swarm on the clustering center search, this paper optimizes the initial glowworm swarm distribution in the GSO algorithm by introducing the theory and method of good-point set [19, 20], which improves the global searching performance of the GSO algorithm. The research in this paper mainly includes 3 parts. Section 2 explains the relevant algorithms and theories and puts forward the optimization idea for a clustering-oriented GSO algorithm. Section 3 introduces the improved GSO algorithm based on good-point set, combines the improved GSO algorithm with the $k$-means algorithm, and designs the algorithm framework and implementation steps for the new clustering method (GSOK_GP algorithm). Section 4 selects UCI data sets of different categories and numbers for clustering experiments and analysis of the GSOK_GP algorithm designed in this paper.

#### 2. Description of Relevant Algorithms

##### 2.1. $k$-Means Clustering Algorithm

###### 2.1.1. Basic Ideas of $k$-Means Clustering Algorithm

The basic ideas of the $k$-means clustering algorithm are as follows: select $k$ data points at random from the data objects to be clustered to act as initial clustering center points, and allocate the other data points to the corresponding clustering center points based on their similarity to these initial clustering center points. After one round of allocation, recalculate the clustering center of each category based on the clustering result of the round, and then reallocate the data points to obtain the clustering result of the new round. Repeat this process for a given number of times or until the clustering center points converge.

###### 2.1.2. Steps of $k$-Means Clustering Algorithm

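*(1) Problem Description.* $X = \{x_1, x_2, \ldots, x_n\}$ represents a given data object, where $x_i$ represents a data vector point. Divide $X$ into several disjoint clusters $C = \{C_1, C_2, \ldots, C_k\}$, where $C_i \cap C_j = \emptyset$ for $i \neq j$ and $\bigcup_{j=1}^{k} C_j = X$.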

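*(2) Related Definitions*

*Definition 1.* Euclidean distance between data points $x_i$ and $x_j$, each with $m$ attributes:

$$d(x_i, x_j) = \sqrt{\sum_{p=1}^{m} (x_{ip} - x_{jp})^2}.$$

*Definition 2.* SSE of clustering results over the $k$ clusters $C_1, \ldots, C_k$:

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2,$$

where $c_j$ is the cluster center of $C_j$. SSE is generally taken as an important indicator for evaluating clustering results.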
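The two definitions above can be illustrated with a short sketch. The function names (`euclidean`, `centroid`, `sse`) and the toy data are illustrative assumptions, not part of the original paper; the cluster center is taken here to be the component-wise mean of the cluster's points.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (Definition 1)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def sse(clusters):
    """Sum of squared distances from each point to its cluster center (Definition 2)."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(euclidean(p, c) ** 2 for p in points)
    return total


# Hypothetical toy data: two clusters of two points each.
clusters = [[[0.0, 0.0], [0.0, 2.0]], [[5.0, 5.0], [7.0, 5.0]]]
print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(sse(clusters))                      # 4.0
```

In the toy data each point lies at distance 1 from its cluster center, so the four squared distances sum to 4.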

*(3) Implementation Steps of $k$-Means Algorithm*

*Step 1.* Randomly select $k$ samples as initial clustering centers.

*Step 2.* Allocate the other data points in the data object to the existing clustering centers as per a given principle (e.g., shortest Euclidean distance).

*Step 3.* Recalculate each clustering center as per the clustering result, $c_j' = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$, where $x_i$ is a data point allocated to clustering center point $c_j$ and $|C_j|$ is the number of such points.

*Step 4.* If $c_j' \neq c_j$ for any $j$, that is, a new clustering center is different from the original one, return to Step 2 and iterate again until the clustering center points converge or the maximum number of iterations is reached.

It can be seen from the steps above that the initial clustering centers have a significant effect on the clustering result and operating efficiency of the $k$-means clustering algorithm. A poor choice may cause the algorithm to converge prematurely to a local optimum, so that different runs produce clustering results that differ considerably.
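The four steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `kmeans`, the optional `init_centers` argument (which lets a caller supply centers instead of the random selection in Step 1), and the handling of empty clusters are assumptions made for the example.

```python
import math
import random


def kmeans(data, k, init_centers=None, max_iter=100, seed=0):
    """Plain k-means; returns (centers, labels, sse)."""
    rng = random.Random(seed)
    # Step 1: random initial centers unless the caller supplies them.
    centers = [list(c) for c in (init_centers or rng.sample(data, k))]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(x, centers[j]))
                  for x in data]
        # Step 3: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # assumption: keep an empty cluster's center
        # Step 4: stop once no center has moved.
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(math.dist(x, centers[lab]) ** 2 for x, lab in zip(data, labels))
    return centers, labels, sse


# Hypothetical toy data with two well-separated groups.
data = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
        [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]]
centers, labels, err = kmeans(data, 2, init_centers=[[1.0, 1.0], [8.0, 8.0]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Exposing `init_centers` is a design choice for this sketch: it is the hook through which any external search procedure could hand its candidate centers to $k$-means.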

##### 2.2. Main Ideas and Steps of GSO Algorithm

In GSO algorithm, each glowworm is deemed as a feasible solution of target problem in space. Glowworms gather towards high brightness glowworm through mutual attraction and location movement, and multiple extreme points are found out in the solution space of a target problem. In this way, the problem is solved. Its main ideas can be described as follows.

*Step 1.* Initialize the glowworm swarm. The number of glowworms in the swarm, the step size, the initial fluorescein value, the fluorescein volatilization rate, the domain change rate, the initial value of the decision domain, the domain threshold, and other related parameters need to be initialized and assigned during initialization.

*Step 2.* Calculate glowworm fitness based on the objective function. Calculate the fitness of each glowworm at its location based on the specific objective function.

*Step 3.* Calculate the moving direction and step of each glowworm. Each glowworm searches for glowworms with a higher fluorescein value within its own decision radius and determines its next moving direction and step based on fluorescein value and distance.

*Step 4. *Update glowworm locations. Update the location of each glowworm based on determined moving direction and step.

*Step 5. *Update the decision domain radius of glowworm.

*Step 6. *Judge whether the algorithm has converged or reached the maximum iterations (itmax) and determine whether to enter the next round of iteration.

It can be learnt from the steps above that algorithm execution efficiency can be improved and premature local optimum of algorithm can be avoided by optimizing the initial distribution of glowworm swarm.
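The steps above can be sketched in code. The sketch follows the commonly cited formulation of basic GSO by Krishnanand and Ghose for a maximization problem. It is not necessarily the exact variant used in this paper. The parameter names (`rho`, `gamma`, `beta`, `nt`, `r0`, `rs`), their default values, the clamping of positions to `bounds`, and the fixed iteration budget in place of a convergence test are all assumptions made for illustration. "Luciferin" in the code is what the text calls fluorescein.

```python
import math
import random


def gso(objective, bounds, n=50, iters=100, step=0.03, l0=5.0, rho=0.4,
        gamma=0.6, beta=0.08, nt=5, r0=1.0, rs=3.0, seed=0, init=None):
    """Basic GSO for maximization; returns the final glowworm positions."""
    rng = random.Random(seed)
    # Step 1: initialize positions, luciferin values, and decision radii.
    pos = [list(p) for p in init] if init else [
        [rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    n = len(pos)
    luc = [l0] * n
    rad = [r0] * n
    for _ in range(iters):  # Step 6: fixed budget instead of a convergence test
        # Step 2: update luciferin from the fitness at the current position.
        luc = [(1 - rho) * l + gamma * objective(p) for l, p in zip(luc, pos)]
        new_pos = [p[:] for p in pos]
        for i in range(n):
            # Step 3: brighter neighbours inside the decision radius.
            nbrs = [j for j in range(n) if luc[j] > luc[i]
                    and 0.0 < math.dist(pos[i], pos[j]) < rad[i]]
            if nbrs:
                # Choose a neighbour with probability proportional to
                # its luciferin advantage.
                weights = [luc[j] - luc[i] for j in nbrs]
                j = rng.choices(nbrs, weights=weights)[0]
                d = math.dist(pos[i], pos[j])
                # Step 4: move a fixed step toward the chosen neighbour
                # (clamped to the search bounds, an assumption of this sketch).
                new_pos[i] = [min(max(a + step * (b - a) / d, lo), hi)
                              for a, b, (lo, hi) in zip(pos[i], pos[j], bounds)]
            # Step 5: adapt the decision radius to the neighbour count.
            rad[i] = min(rs, max(0.0, rad[i] + beta * (nt - len(nbrs))))
        pos = new_pos
    return pos


def two_peaks(p):
    """Hypothetical test objective with maxima near (1, 1) and (-1, -1)."""
    return (math.exp(-((p[0] - 1) ** 2 + (p[1] - 1) ** 2))
            + math.exp(-((p[0] + 1) ** 2 + (p[1] + 1) ** 2)))


bounds = [(-3.0, 3.0), (-3.0, 3.0)]
final = gso(two_peaks, bounds, n=30, iters=50, seed=1)
print(len(final))
```

The optional `init` argument accepts a precomputed list of starting positions, so a non-random initial distribution can replace the uniform random one without changing the rest of the loop.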

##### 2.3. Basic Theory of Good-Point Set

Basic definition and structure of good-point set are as follows:

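(1) Assume $G_s$ is the unit cube in $s$-dimensional Euclidean space, which is expressed as

$$G_s = \{x = (x_1, x_2, \ldots, x_s) : 0 \le x_i \le 1,\ i = 1, 2, \ldots, s\}.$$

(2) Assume $P_n(k)$ is a point set with $n$ points in $G_s$, which is expressed as

$$P_n(k) = \{(x_1^{(n)}(k), x_2^{(n)}(k), \ldots, x_s^{(n)}(k)) : 1 \le k \le n\}, \quad 0 \le x_i^{(n)}(k) \le 1.$$

(3) Assume $r = (r_1, r_2, \ldots, r_s)$ is a given point in $G_s$ and $N_n(r)$ is the number of points in point set $P_n(k)$ satisfying the inequality below:

$$0 \le x_i^{(n)}(k) \le r_i, \quad i = 1, 2, \ldots, s.$$

Then $\varphi(n) = \sup_{r \in G_s} \left| \frac{N_n(r)}{n} - |r| \right|$, where $|r| = r_1 r_2 \cdots r_s$, and $\varphi(n)$ is known as the deviation of point set $P_n(k)$.

(4) Assume $\varphi(n)$ is the deviation of $P_n(k) = \{(\{r_1 k\}, \{r_2 k\}, \ldots, \{r_s k\}) : 1 \le k \le n\}$, where $\{\cdot\}$ denotes the fractional part, and meets the requirement below:

$$\varphi(n) = C(r, \varepsilon)\, n^{-1+\varepsilon},$$

where $C(r, \varepsilon)$ is a constant related to $r$ and $\varepsilon$, and $\varepsilon$ is an arbitrary positive number.

Then $P_n(k)$ is known as a good-point set and $r$ a good point.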

It has been proved that, with respect to approximate integration, the order of the deviation depends only on the number of points $n$ and is independent of the space dimension of the sample. Therefore, the good-point set can provide better support for calculation in high-dimensional spaces [20]. Meanwhile, for a point set object whose distribution is unknown, the deviation of points obtained by the good-point set is significantly smaller than that of points obtained by the random method. Based on this feature, the good-point set can provide a better initial distribution scheme for the swarm in a swarm intelligence algorithm.
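As an illustration of how such a point set might be generated for swarm initialization, the sketch below uses one construction that is common in the good-point-set literature: the good point has components $\{2\cos(2\pi k/p)\}$, where $p$ is the smallest prime with $(p-3)/2 \ge s$. Whether this paper uses that particular good point is not stated in this section, so treat the choice as an assumption. The helper names and the `scale` function that maps the unit cube onto a search range are also illustrative.

```python
import math


def smallest_prime_at_least(m):
    """Smallest prime p with p >= m (trial division; fine for small m)."""
    p = max(m, 2)
    while any(p % q == 0 for q in range(2, math.isqrt(p) + 1)):
        p += 1
    return p


def good_point_set(n, s):
    """n points in the unit cube [0, 1)^s generated from a single good point."""
    p = smallest_prime_at_least(2 * s + 3)  # guarantees (p - 3) / 2 >= s
    # Fractional part of 2*cos(2*pi*k/p). Reducing r_k modulo 1 first does
    # not change the fractional part of r_k * i for integer i.
    r = [(2 * math.cos(2 * math.pi * k / p)) % 1.0 for k in range(1, s + 1)]
    return [[(r_k * i) % 1.0 for r_k in r] for i in range(1, n + 1)]


def scale(points, bounds):
    """Map unit-cube points onto per-dimension (low, high) ranges."""
    return [[lo + v * (hi - lo) for v, (lo, hi) in zip(pt, bounds)]
            for pt in points]


# Hypothetical use: 50 starting positions in a 2-D search range.
start = scale(good_point_set(50, 2), [(-3.0, 3.0), (-3.0, 3.0)])
print(len(start), len(start[0]))  # 50 2
```

The construction is deterministic, so repeated runs start from the same swarm layout. This differs from uniform random initialization, where the layout changes with the random seed.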

#### 3. Design of GSOK_GP Algorithm

This paper proposes an improved GSO algorithm based on good-point set to solve clustering problems, building on the analysis of the relevant algorithms above and the characteristics of clustering problems. Its main ideas are as follows. Firstly, optimize the initial distribution of the glowworm swarm through the good-point set, so as to optimize the GSO algorithm. Secondly, search the clustering data objects with the optimized GSO algorithm, using its ability to find multiple extreme points to obtain a clustering center point set. Thirdly, select $k$ extreme points from the clustering center point set as the initial clustering centers of the $k$-means algorithm as per the maximum distance principle. Fourthly, execute the $k$-means algorithm with these initial clustering centers to figure out the clustering result. The algorithm framework is shown in Figure 1. In the framework, one condition checks that the number of iterations is no greater than the maximum iterations, and the other checks that the number of extreme points is greater than the number $k$ of initial clustering centers required.
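The third step, selecting $k$ initial centers from the candidate extreme points by the maximum distance principle, can be sketched as below. This section does not define the principle precisely, so the sketch assumes a greedy farthest-point rule: start from one candidate, then repeatedly add the candidate whose distance to its nearest already-chosen center is largest. The function name, the choice of the first candidate as the starting point, and the toy data are illustrative.

```python
import math


def select_by_max_distance(candidates, k):
    """Greedily pick k candidates that are mutually far apart."""
    if k > len(candidates):
        raise ValueError("need at least k candidate points")
    chosen = [candidates[0]]  # assumption: start from the first candidate
    while len(chosen) < k:
        # Already-chosen points score 0, so they are not picked again while
        # a distinct candidate remains.
        nxt = max(candidates,
                  key=lambda c: min(math.dist(c, s) for s in chosen))
        chosen.append(nxt)
    return chosen


# Hypothetical extreme points found by a swarm search; several are
# near-duplicates of the same peak.
extremes = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.9, 8.3], [1.0, 9.0]]
print(select_by_max_distance(extremes, 3))
# [[1.0, 1.0], [7.9, 8.3], [1.0, 9.0]]
```

The selected points would then serve as the initial clustering centers for the fourth step. Spreading them out in this way reduces the chance that two initial centers fall on the same peak.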