Abstract
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into local optima. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
1. Introduction
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced to psychology by Zubin in 1938 and Tryon in 1939, and was famously used by Cattell beginning in 1943 [1] for trait theory classification in personality psychology. Many clustering methods have been proposed; they fall into two main categories: hierarchical and partitional. The k-means clustering method [2] is one of the most commonly used partitional methods. However, the results of k-means depend heavily on the initial solution, and it easily falls into local optima. Zhang et al. proposed an improved k-means clustering algorithm called K-harmonic means [3]. However, the accuracy of the results obtained by this method is still limited.
In order to overcome this problem, many scholars began to solve it using metaheuristic algorithms. In 1991, Colorni et al. presented the ant colony optimization (ACO) algorithm, based on the behavior of ants seeking a path between their colony and a source of food. Shelokar et al. and Kao and Cheng then solved the clustering problem using ACO [4, 5]. Niknam et al. proposed an efficient hybrid evolutionary algorithm combining ACO and SA (simulated annealing, 1989 [6]) for the clustering problem [7, 8]. Kennedy and Eberhart proposed the particle swarm optimizer (PSO) in 1995, which simulates the movement of organisms in a bird flock or fish school [9]. PSO has also been adopted to solve this problem by Omran et al. and by Merwe and Engelbrecht [10, 11]. Kao et al. presented a hybrid approach combining the k-means algorithm, Nelder-Mead simplex search, and PSO for clustering analysis [12]. Niknam et al. presented a hybrid evolutionary algorithm based on PSO and SA to solve the clustering problem [13]. Niknam proposed an efficient hybrid approach based on PSO, ACO, and k-means, called PSO-ACO-K, for cluster analysis [14]. In 2005, the artificial bee colony (ABC) algorithm was described by Karaboga [15], and it was adopted to solve this problem by Karaboga and Ozturk [16]. Zou et al. proposed a cooperative artificial bee colony algorithm for the clustering problem and evaluated its performance on synthetic and real-life datasets [17]. Voges and Pope used an evolutionary-based rough clustering algorithm for the clustering problem [18].
Monkey algorithm (MA) is a new type of swarm intelligence algorithm. It was put forward by Ruiqing and Wansheng [19] in 2008 for solving large-scale, multimodal optimization problems. The method derives from a simulation of the mountain-climbing processes of monkeys. It consists of three processes: the climb process, the watch-jump process, and the somersault process. In the original MA, most of the computing time is spent in the climb process, which searches for local optimal solutions. The essential feature of this process is the calculation of the pseudo-gradient of the objective function, which requires only two measurements of the objective function regardless of the dimension of the optimization problem. The purpose of the somersault process is to make monkeys find new search domains, and this action effectively prevents the search from getting trapped in a local region. MA has been successfully applied to various optimization problems, such as transmission network expansion planning [20], intrusion detection [21], optimal sensor placement in structural health monitoring [22], and gas filling station project scheduling [23]. In view of the characteristics of the clustering problem, this paper proposes a monkey algorithm with the search operator of the artificial bee colony algorithm (ABCMA). The algorithm introduces the ABC search operator before the climb process to strengthen the local search ability, and it improves the somersault process by combining it with the k-means method. The algorithm improves the calculation accuracy to a certain degree. The numerical results show that the proposed algorithm performs better than the basic monkey algorithm for solving the clustering problem.
2. The K-Means Clustering Algorithm
The goal of data clustering is to group data into a number of clusters. K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. It was proposed by MacQueen in 1967 [24]. The procedure follows a simple and easy way to classify a given data set $X = \{x_1, x_2, \ldots, x_n\}$ through a certain number of clusters (assume $k$ clusters $C_1, C_2, \ldots, C_k$) fixed a priori; each data vector is a $d$-dimensional vector, and the clusters satisfy the following conditions [25, 26]:
(1) $C_i \neq \emptyset$, $i = 1, 2, \ldots, k$;
(2) $C_i \cap C_j = \emptyset$, $i, j = 1, 2, \ldots, k$, $i \neq j$;
(3) $\bigcup_{i=1}^{k} C_i = X$.
The k-means clustering algorithm is as follows.
(1) Set the number of clusters $k$ and the data set $X$.
(2) Randomly choose $k$ points from $X$ as the cluster centroids $z_1, z_2, \ldots, z_k$.
(3) Assign each object to the group that has the closest centroid. The principle of division is as follows: if $\|x - z_i\| \le \|x - z_j\|$ for all $j = 1, 2, \ldots, k$ with $j \neq i$, the object $x$ is assigned to cluster $C_i$.
(4) When all objects have been assigned, recalculate the positions of the centroids:
$$z_i = \frac{1}{n_i} \sum_{x \in C_i} x, \quad i = 1, 2, \ldots, k, \qquad (1)$$
where $n_i$ is the number of points in cluster $C_i$.
(5) Repeat steps (3) and (4) until the centroids no longer move.
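As a concrete reference, steps (1) to (5) can be sketched in Python with NumPy as below. This is a minimal illustration, not the authors' Matlab code; the function name `kmeans`, the `max_iter` cap, and the rule that an empty cluster keeps its previous centroid are our own assumptions.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means following steps (1)-(5); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step (2): pick k distinct data points as the initial centroids.
    Z = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step (3): assign each object to the closest centroid (Euclidean distance).
        labels = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2).argmin(axis=1)
        # Step (4): recompute each centroid with formula (1); an empty cluster
        # keeps its previous centroid (an assumption, not specified in the text).
        new_Z = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else Z[i]
                          for i in range(k)])
        # Step (5): stop once the centroids no longer move.
        if np.allclose(new_Z, Z):
            break
        Z = new_Z
    # Final assignment consistent with the returned centroids.
    labels = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2).argmin(axis=1)
    return Z, labels
```

Because the result depends on the random initial centroids chosen in step (2), different seeds can end in different local optima, which is exactly the weakness that motivates the metaheuristic approach of this paper.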
The main idea is to define $k$ centroids, one for each cluster. These centroids should be placed carefully, because different initial locations lead to different results; the better choice is therefore to place them as far away from each other as possible. In this study, we use the Euclidean metric as the distance measure:
$$\|x - z\| = \sqrt{\sum_{j=1}^{d} (x_j - z_j)^2}.$$
Finally, the algorithm aims at minimizing an objective function, in this case the total within-cluster error. The objective function is
$$f = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - z_i\|,$$
that is, the sum of the Euclidean distances between each object and the centroid of its cluster.
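The objective can be evaluated directly from a set of centroids, because each object is implicitly assigned to its nearest centroid. The sketch below assumes the objective is the sum of (unsquared) Euclidean distances, the form that matches the objective values reported in Section 4 (for example, about 96.6555 for Iris); replacing `dist` by `dist ** 2` would give a squared-error variant. The function names are illustrative only.

```python
import numpy as np

def euclidean(x, z):
    """Euclidean distance between two d-dimensional vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def clustering_objective(X, Z):
    """Sum of distances from every object in X (n x d) to its nearest centroid in Z (k x d)."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)  # n x k distance table
    return float(dist.min(axis=1).sum())                           # nearest-centroid assignment
```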
3. Description of the Modified Monkey Algorithm
The MA is a novel evolutionary algorithm that can solve a variety of difficult optimization problems featuring nonlinearity, nondifferentiability, and high dimensionality. Unlike other algorithms, the MA spends most of its computing time in the climb process, which searches for local optimal solutions. Therefore, according to the characteristics of the clustering problem, a new monkey algorithm with the search operator of the artificial bee colony algorithm is proposed. In this section, we describe the main components of the algorithm: the representation of a solution, initialization, the climb process, the watch-jump process, the improved somersault process, and the search operator. The details are as follows.
3.1. Representation of Solution
First, an integer $M$ is defined as the population size of monkeys. Then, for monkey $i$ ($i = 1, 2, \ldots, M$), its position is denoted as a vector $x_i = (z_{i1}, z_{i2}, \ldots, z_{ik})$, where $k$ is equal to the number of cluster centroids and each cluster centroid $z_{ij}$ includes $d$ components, so that the position has $D = k \times d$ components in total. The position $x_i$ expresses a solution of the optimization problem.
3.2. Initial Population
Initialization of the population has a great effect on the precision. In the original MA, the initial population of possible solutions is generated randomly within the solution interval. However, for the clustering problem, each component of the data has a different interval. Therefore, for monkey $i$, we randomly choose $k$ of the samples (each sample includes $d$ components) from the data set as its initial position.
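Sections 3.1 and 3.2 together amount to the following initialization. In this sketch (an illustration, not the authors' code), `init_population` is a hypothetical name, and drawing k *distinct* samples per monkey is our assumption; each monkey's k centroids are flattened into one position vector of D = k × d components.

```python
import numpy as np

def init_population(X, k, M, seed=None):
    """Initial population for the clustering MA (sketch of Sections 3.1-3.2).

    Each of the M monkeys picks k distinct samples of X (n x d) as its centroids;
    the k centroids are flattened into one position vector of length D = k * d.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    pop = np.empty((M, k * d))
    for i in range(M):
        idx = rng.choice(n, size=k, replace=False)  # assumption: no repeated samples
        pop[i] = X[idx].reshape(-1)                 # (z_i1, ..., z_ik) flattened
    return pop
```

Reshaping a row back with `pop[i].reshape(k, d)` recovers the k centroids of monkey i.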
3.3. Climb Process
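The climb process is a step-by-step procedure that changes the monkeys' positions from the initial positions to new ones that improve the objective function. The climb process is designed using the idea of pseudo-gradient-based simultaneous perturbation stochastic approximation (SPSA) [27, 28], a kind of recursive optimization algorithm. For monkey $i$, its position is $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, $i = 1, 2, \ldots, M$, and $f(x_i)$ is the corresponding fitness value. Because the clustering objective is minimized, each move goes against the pseudo-gradient. The improved climb process is given as follows.
(1) Randomly generate a vector $\Delta x_i = (\Delta x_{i1}, \Delta x_{i2}, \ldots, \Delta x_{iD})$, where $\Delta x_{ij} = a$ or $-a$, each with probability $1/2$, $j = 1, 2, \ldots, D$; this gives the two trial points $x_i + \Delta x_i$ and $x_i - \Delta x_i$. The parameter $a$ ($a > 0$), called the step of the climb process, can be determined by the specific situation. The step length plays a crucial role in the precision of the approximation of the local solution in the climb process; usually, the smaller $a$ is, the more precise the solutions are.
(2) Calculate
$$f'_{ij}(x_i) = \frac{f(x_i + \Delta x_i) - f(x_i - \Delta x_i)}{2\,\Delta x_{ij}}, \quad j = 1, 2, \ldots, D.$$
The vector $f'_i(x_i) = (f'_{i1}(x_i), f'_{i2}(x_i), \ldots, f'_{iD}(x_i))$ is called the pseudo-gradient of the objective function $f$ at the point $x_i$.
(3) Set $y_j = x_{ij} - a \cdot \operatorname{sign}(f'_{ij}(x_i))$, $j = 1, 2, \ldots, D$, and let $y = (y_1, y_2, \ldots, y_D)$.
(4) Update $x_i$ with $y$ provided that $y$ is feasible. Otherwise, keep $x_i$ unchanged.
(5) Repeat steps (1) to (4) until the maximum allowable number of iterations (called the climb number, denoted by $N_c$) has been reached.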
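A minimal NumPy sketch of climb steps (1) to (5), written for a minimized objective `f` acting on a flattened position vector. The `feasible` callback (always true by default) and the use of a NumPy random generator are our assumptions; in the clustering setting, `f` would be the clustering objective evaluated on the k centroids encoded in the vector.

```python
import numpy as np

def climb(x, f, a, n_climb, feasible=lambda y: True, rng=None):
    """Climb process (steps (1)-(5)) for a minimised objective f (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).copy()
    for _ in range(n_climb):
        # (1) perturbation: each component is +a or -a with probability 1/2
        dx = np.where(rng.random(x.size) < 0.5, a, -a)
        # (2) pseudo-gradient from only two evaluations of f
        g = (f(x + dx) - f(x - dx)) / (2.0 * dx)
        # (3) step of length a against the sign of the pseudo-gradient (minimisation)
        y = x - a * np.sign(g)
        # (4) move only if the new point is feasible; otherwise keep x
        if feasible(y):
            x = y
    return x
```

As in step (4), a move is accepted whenever it is feasible, even if it does not lower `f`; the pseudo-gradient only guides the direction. Each call costs 2 × `n_climb` objective evaluations, which is the cost discussed in Section 3.6.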
Figure 1 shows the climb process of a monkey seeking the local optimal solution of a test function in 3D space, with climb step 0.001 and climb number 1000. The red point represents the initial position, and the green point represents the end position.
3.4. Watch-Jump Process
After the climb process, each monkey arrives at its own mountaintop. It then looks around to determine whether there are other points nearby that are higher than the current one. If so, it jumps there from the current position and repeats the climb process until it reaches the top of the mountain. For monkey $i$, its position is $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, $i = 1, 2, \ldots, M$. The watch-jump process is given as follows.
(1) Randomly generate real numbers $y_j$ from $(x_{ij} - b, x_{ij} + b)$, $j = 1, 2, \ldots, D$, respectively, and let $y = (y_1, y_2, \ldots, y_D)$. The parameter $b$ is called the eyesight of the monkeys and can be determined by the specific situation. Usually, the bigger the feasible space of the optimization problem is, the bigger the value of $b$ should be.
(2) Update $x_i$ with $y$ provided that $y$ is feasible and at least as good as $x_i$. Otherwise, repeat step (1) until an appropriate point is found. For the clustering problem, we replace $x_i$ with $y$ only if $f(y) \le f(x_i)$.
(3) Repeat the climb process, employing $y$ as the initial position.
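Steps (1) and (2) of the watch-jump process can be sketched as follows for a minimized objective. The `max_tries` cap is our assumption (the text simply repeats step (1) until a suitable point is found), and step (3), restarting the climb from the returned point, is left to the caller.

```python
import numpy as np

def watch_jump(x, f, b, feasible=lambda y: True, rng=None, max_tries=100):
    """Watch-jump steps (1)-(2) for a minimised objective f (sketch).

    Returns a feasible point y with f(y) <= f(x), or x itself if none is
    found within max_tries (an assumed safeguard).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    fx = f(x)
    for _ in range(max_tries):
        # (1) look around: each component drawn uniformly from (x_j - b, x_j + b)
        y = rng.uniform(x - b, x + b)
        # (2) jump only to a feasible point that is no worse than x
        if feasible(y) and f(y) <= fx:
            return y
    return x
```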
3.5. Somersault Process Based on K-Means
After repeated climb and watch-jump processes, each monkey will find a locally maximal mountaintop around its initial point. In order to find a much higher mountaintop, it is natural for each monkey to somersault to a new search domain. In the original MA, the monkeys somersault along the direction pointing to the pivot, which is the barycenter of all monkeys' current positions. Figure 2 shows the somersault process of the original MA [19]. In the figure, labeled points represent the monkeys and the center of all monkeys; because of the somersault interval, a monkey can reach any point within the circle shown.
However, for the clustering problem, monkeys easily leave the solution interval, and after many iterations all monkeys lose population diversity because they all somersault along directions pointing to the same pivot. Here we use the k-means algorithm to choose the centers of the objects belonging to each cluster as the pivot, replacing the center of all monkeys. For monkey $i$, its position is $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$; the improved somersault process is given as follows.
(1) Assign each object to the group that has the closest centroid according to the position of monkey $i$.
(2) Randomly generate real numbers $\alpha_j$, $j = 1, 2, \ldots, D$, from the interval $[c_1, c_2]$ (called the somersault interval, which decides the maximum distance that monkeys can somersault).
(3) Calculate the positions $q_1, q_2, \ldots, q_k$, which are the centers of the objects belonging to each centroid, according to formula (1). These positions form a vector that represents the pivot replacing the center of the monkeys. Let $p = (q_1, q_2, \ldots, q_k) = (p_1, p_2, \ldots, p_D)$.
(4) Set $y_j = x_{ij} + \alpha_j (p_j - x_{ij})$, $j = 1, 2, \ldots, D$, and let $y = (y_1, y_2, \ldots, y_D)$.
(5) Update $x_i$ with $y$ provided that $y$ is feasible and $f(y) \le f(x_i)$. Otherwise, generate a new solution to replace $y$.
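The improved somersault can be sketched for a single monkey as below. This is an illustration under stated assumptions, not the authors' implementation: an empty cluster keeps its current centroid as its pivot component, "generate a new solution" is interpreted as drawing fresh somersault factors, and after `max_tries` failed draws the monkey keeps its position. The name `somersault_kmeans` is hypothetical.

```python
import numpy as np

def somersault_kmeans(x, X, k, interval, f, feasible=lambda y: True, rng=None, max_tries=20):
    """K-means-based somersault (steps (1)-(5)) for one monkey; f is minimised."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = x.reshape(k, X.shape[1])                    # the k centroids encoded in x
    # (1) assign every object to its closest centroid
    labels = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2).argmin(axis=1)
    # (3) pivot = centre of each cluster via formula (1); an empty cluster keeps
    #     its current centroid (assumption)
    p = np.concatenate([X[labels == j].mean(axis=0) if np.any(labels == j) else Z[j]
                        for j in range(k)])
    fx = f(x)
    for _ in range(max_tries):
        # (2) one somersault factor alpha_j per component, drawn from [c1, c2]
        alpha = rng.uniform(interval[0], interval[1], size=x.size)
        # (4) jump relative to the k-means pivot
        y = x + alpha * (p - x)
        # (5) accept a feasible, no-worse point; otherwise draw a new one
        if feasible(y) and f(y) <= fx:
            return y
    return x  # assumption: keep x if no acceptable point is found
```

Because every monkey uses its own k-means pivot rather than the common barycenter, the monkeys do not all converge toward the same point, which is the diversity argument made above.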
3.6. Search Operator
The original MA relies mainly on the climb process to search for local optimal solutions, and the climb step plays a crucial role in the precision of the approximation of the local solution. The smaller the climb step, the larger the climb number must be and the higher the precision of the solution, but the more time is spent computing the objective function. For example, if the climb step is 0.01, the climb number should be set to 100, so each climb process requires 200 evaluations of the objective function (two per iteration); if the climb step is 0.001, the climb number should be set to 1000, and each climb process requires 2000 evaluations. In order to reduce the computing time, this paper introduces the search operator of the artificial bee colony algorithm before the climb process.
The artificial bee colony (ABC) optimization algorithm was described by Karaboga based on the foraging behavior of honey bees [29]. In ABC, the colony consists of three groups of bees: employed bees, onlookers, and scouts. Each employed bee seeks a food source near its current food source according to the search operator (7), evaluates its nectar amount, and decides whether to update the food source by a greedy strategy. After all employed bees complete the search process, they share the position information of the food sources with the onlookers in the dance area. Each onlooker watches the dance of the employed bees and chooses one of their sources with a probability depending on the nectar amounts of the sources. If a food source cannot be improved within a predetermined number of cycles, called the “limit,” it is removed from the population, and the employed bee of that food source becomes a scout. The search operator of the employed bees is as follows:
$$v_{ij} = x_{ij} + \phi_{ij}\,(x_{ij} - x_{rj}), \qquad (7)$$
where $r$ and $j$ are randomly chosen indexes. Although $r$ is determined randomly, it has to be different from $i$, and $\phi_{ij}$ is a random number in $[-1, 1]$. Experimental results show that ABC performs well on complex multimodal problems [29], owing to the strong local exploration ability of its search operator.
In the MA, the local exploration ability of the climb process is weak, whereas the somersault process has strong global search ability. Here we introduce the ABC search operator before the climb process to strengthen the search for local optimal solutions. For each monkey, each component is updated once using the ABC search operator, so each monkey moves $D = k \times d$ times. The local search process performed before the climb process is shown in Algorithm 1.
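Since Algorithm 1 itself is not reproduced here, the following is only a sketch of the local search described in the text: every component of every monkey is perturbed once with operator (7), and the move is kept only if it lowers the objective (greedy selection, as in the employed-bee phase). The function name, the strict-improvement test, and the requirement of at least two monkeys are our assumptions.

```python
import numpy as np

def abc_local_search(pop, f, rng=None):
    """ABC-operator local search applied before the climb process (sketch).

    pop: (M, D) array of monkey positions (M >= 2); f: objective to minimise.
    Each component j of each monkey i is perturbed once with
    v_ij = x_ij + phi * (x_ij - x_rj), r != i, phi ~ U[-1, 1],
    and the move is kept only if it improves f (greedy selection).
    """
    rng = np.random.default_rng() if rng is None else rng
    pop = np.array(pop, dtype=float)             # work on a copy
    M, D = pop.shape
    fit = np.array([f(x) for x in pop])
    for i in range(M):
        for j in range(D):
            r = rng.choice([m for m in range(M) if m != i])  # random partner, r != i
            phi = rng.uniform(-1.0, 1.0)
            v = pop[i].copy()
            v[j] = pop[i, j] + phi * (pop[i, j] - pop[r, j])  # operator (7)
            fv = f(v)
            if fv < fit[i]:                                   # greedy selection
                pop[i], fit[i] = v, fv
    return pop, fit
```

With D = k × d components per monkey, this step costs D objective evaluations per monkey, independent of the climb step.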

To sum up, the flowchart of ABCMA for finding the optimal solution of the clustering problem is shown in Figure 3.
4. Simulation Experiment
In this section, the experiments were run on a desktop computer with a 3.01 GHz AMD Athlon(tm) II X4 640 processor and 3 GB of RAM, running a minimal installation of Windows XP. The application software was Matlab 2012a.
The experimental results comparing the ABCMA clustering algorithm with six typical stochastic algorithms, namely MA [19], PSO [30], CPSO [1, 17], ABC [16, 17], CABC [17], and k-means, are provided for two artificial data sets and ten real-life data sets (Iris, Teaching Assistant Evaluation (TAE), wine, seeds, Ripley’s glass, Statlog (heart), Haberman’s survival, balance scale, Contraceptive Method Choice (CMC), and Wisconsin breast cancer), which are selected from the UCI machine learning repository [31].
Artificial data set one ($n = 250$, $d = 3$, $k = 5$, where $n$, $d$, and $k$ are the numbers of objects, features, and clusters): this is a three-featured problem with five classes, where every feature of the classes was distributed according to Class 1: Uniform(85, 100), Class 2: Uniform(70, 85), Class 3: Uniform(55, 70), Class 4: Uniform(40, 55), and Class 5: Uniform(25, 40) [12, 14]. The data set is illustrated in Figure 4.
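A data set of this kind can be regenerated from the class intervals above. In this sketch, equal class sizes of 50 (250 objects in total) and the function name `make_art1` are our assumptions. Because the paper's random draws are not available, a regenerated set will not reproduce the reported objective values exactly.

```python
import numpy as np

def make_art1(n_per_class=50, seed=0):
    """Synthetic 'artificial data set one' (sketch): 5 classes, 3 features,
    every feature drawn uniformly from the class interval given in the text."""
    rng = np.random.default_rng(seed)
    intervals = [(85, 100), (70, 85), (55, 70), (40, 55), (25, 40)]  # Class 1..5
    X = np.vstack([rng.uniform(lo, hi, size=(n_per_class, 3)) for lo, hi in intervals])
    y = np.repeat(np.arange(1, 6), n_per_class)                      # labels 1..5
    return X, y
```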
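Artificial data set two ($n = 600$, $d = 2$, $k = 4$): this is a two-featured problem with four unique classes. A total of 600 patterns were drawn from four independent bivariate normal distributions, where classes were distributed according to $N_2\left(\mu = (m_i, 0)^T, \Sigma = \begin{pmatrix} 0.5 & 0.05 \\ 0.05 & 0.5 \end{pmatrix}\right)$, $i = 1, 2, 3, 4$, $m_1 = -3$, $m_2 = 0$, $m_3 = 3$, $m_4 = 6$, with $\mu$ and $\Sigma$ being the mean vector and covariance matrix, respectively [12, 14]. The data set is illustrated in Figure 5.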
Iris data ($n = 150$, $d = 4$, $k = 3$): this data set of 150 random samples of flowers from the Iris species setosa, versicolor, and virginica was collected by Anderson (1935). For each species there are 50 observations of sepal length, sepal width, petal length, and petal width in cm. This data set was used by Fisher (1936) in his initiation of the linear-discriminant-function technique [14, 17, 31].
Teaching Assistant Evaluation ($n = 151$, $d = 5$, $k = 3$): the data consist of evaluations of teaching performance over three regular semesters and two summer semesters of 151 teaching assistant (TA) assignments at the Statistics Department of the University of Wisconsin-Madison. The scores were divided into 3 roughly equal-sized categories (“low,” “medium,” and “high”) to form the class variable [31].
Wine data ($n = 178$, $d = 13$, $k = 3$): this is the wine data set, which is also taken from the UCI machine learning repository. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines. There are 178 instances with 13 numeric attributes in the wine data set. All attributes are continuous, and there are no missing attribute values [14, 17, 31].
Seeds data ($n = 210$, $d = 7$, $k = 3$): this data set consists of 210 patterns belonging to three different varieties of wheat: Kama, Rosa, and Canadian. For each variety there are 70 observations of area $A$, perimeter $P$, compactness $C = 4\pi A / P^2$, length of kernel, width of kernel, asymmetry coefficient, and length of kernel groove [31].
Ripley’s glass ($n = 214$, $d = 9$, $k = 6$): the data were sampled from six different types of glass: building windows float processed (70 objects), building windows non-float processed (76 objects), vehicle windows float processed (17 objects), containers (13 objects), tableware (9 objects), and headlamps (29 objects), each with nine features: refractive index, sodium, magnesium, aluminum, silicon, potassium, calcium, barium, and iron [14, 17, 31].
Statlog (heart) data ($n = 270$, $d = 13$, $k = 2$): this data set is a heart disease database similar to a database already present in the repository (heart disease databases) but in a slightly different form [31].
Haberman’s survival ($n = 306$, $d = 3$, $k = 2$): the data set contains cases from a study conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer. It records the survival status (two classes) of each patient together with the age of the patient at the time of operation, the patient’s year of operation, and the number of positive axillary nodes detected [31].
Balance scale data ($n = 625$, $d = 4$, $k = 3$): this data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are the left weight, the left distance, the right weight, and the right distance. The class is found by comparing (left distance × left weight) with (right distance × right weight): the scale tips toward the side with the greater product, and if the two products are equal, it is balanced [31].
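The class rule of the balance scale data is a simple torque comparison; the short function below restates it. The function name and the one-letter labels "L", "R", and "B" are chosen here for illustration.

```python
def balance_class(left_weight, left_distance, right_weight, right_distance):
    """Class of a balance-scale example: compare the two weight x distance products."""
    left = left_distance * left_weight
    right = right_distance * right_weight
    if left > right:
        return "L"   # tips to the left
    if right > left:
        return "R"   # tips to the right
    return "B"       # products equal: balanced
```

For example, weights 2 and 1 at distances 3 and 5 give products 6 and 5, so the scale tips to the left.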
Wisconsin breast cancer ($n = 683$, $d = 9$, $k = 2$): this data set consists of 683 objects characterized by nine features: clump thickness, cell size uniformity, cell shape uniformity, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. There are two categories in the data: benign (444 objects) and malignant (239 objects) [14, 17, 31].
Contraceptive Method Choice ($n = 1473$, $d = 9$, $k = 3$): this data set is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who were either not pregnant or did not know whether they were pregnant at the time of the interview. The problem is to predict the current contraceptive method choice (no use, long-term methods, or short-term methods) of a woman based on her demographic and socioeconomic characteristics [14, 17, 31].
Here we set the parameters of ABCMA and MA as follows: the climb number of ABCMA and the climb number of MA is set to 200, climb step , watch-jump number , the eyesight , somersault interval , and the population size . For PSO, inertia weight , acceleration coefficients , , and population size . The population size of CPSO is set to 20. The population sizes of ABC and CABC are set to 50 and 10, respectively. For comparison with the other algorithms, the maximum number of generations of all algorithms is set to 100.
4.1. Algorithm Comparison
For every data set, each algorithm is run 20 times independently with random initial solutions. For the art1 and art2 data sets, once the randomly generated parameters are determined, the same parameters are used to test the performance of three algorithms. The best value, the worst value, the mean value, and the standard deviation are recorded in Tables 1–12. The results are rounded to four decimal places.
The simulation results given in Tables 1–12 show that ABCMA is very precise. As seen from the results, the ABCMA algorithm provides the optimum value and a small standard deviation in comparison with those obtained by the other methods. For the Iris data set, the optimum value, the worst value, the average value, and the standard deviation of ABCMA are 96.6555, 96.6563, 96.6558, and , respectively. CABC also finds the optimum solution 96.6555, but its standard deviation is bigger than that of ABCMA, while the best solutions of MA, ABC, CPSO, PSO, and k-means are 96.6614, 96.6566, 96.6580, 96.6556, and 97.1901, respectively. Table 4 shows the results of the algorithms on the TAE data set. The optimum value 1490.9258 is obtained only by ABCMA; notably, none of the other algorithms attains this value even once in 20 runs. The mean value of ABCMA is 1490.9456, which is smaller than those of MA, CABC, ABC, CPSO, PSO, and k-means. Table 5 provides the results of the algorithms on the wine data set. As seen from the results, the results of ABCMA are far superior to those obtained by the others. For the seeds data set, the best value, the worst value, the mean value, and the standard deviation of ABCMA are 311.7978, 311.7981, 311.7979, and . This means that ABCMA converges to the global optimum value 311.79 in all runs. Their standard deviations are , , , 2.8999, and , respectively. From the standard deviation, we can see that the ABCMA algorithm is better than the other methods. For Ripley’s glass data set, the optimum value of ABCMA is 210.0222, which is much better than those of the other algorithms. The standard deviations of ABCMA, MA, and ABC are , , and . On the Statlog (heart) data set, with results given in Table 8, the best value, the worst value, the mean value, and the standard deviation of ABCMA are 10622.9824, 10622.9826, 10622.9824, and , respectively. This means that the ABCMA algorithm is able to converge to the global optimum 10622.982 in all runs, while k-means, PSO, and CPSO may be trapped at local optimal solutions. For Haberman’s survival data set, the optimum value 2566.9888 can be obtained by both ABCMA and ABC, but the standard deviation of ABC is , which is a little smaller than that of ABCMA. The standard deviation of PSO is a little smaller than that of CPSO. Table 10 shows the results of the algorithms on the balance scale data set. As seen from the results, the best value, the worst value, and the mean value of the ABCMA algorithm are much better than those obtained by the others. For the Wisconsin breast cancer data set, the best value and the worst value are 2964.3870 and 2964.9883; they are very close, so the standard deviation is very small. The global optimal value can also be obtained by the CABC algorithm, but its standard deviation is larger than those of ABCMA and MA. On the Contraceptive Method Choice data set, the optimum value, the worst value, the average value, and the standard deviation of ABCMA are 5693.7240, 5693.7418, 5693.7264, and , respectively. The best global solution can also be obtained by the CABC algorithm. The best value and the worst value of PSO are 5766.6412 and 6059.5781, which indicates that PSO may fall into local optimal solutions.
From Tables 1 to 12, we can conclude that the results obtained by ABCMA are clearly better than those of the other algorithms on most data sets; CABC is a little better than ABC, CPSO is a little better than PSO, and k-means is the worst on most data sets.
Figures 6–17 show the convergence curves of the various algorithms on the different data sets. As seen from the figures, the convergence rate of MA is the fastest. Figures 18–21 show the original data distributions of the Iris and Haberman’s survival data sets and the clustering results obtained by the ABCMA algorithm.
4.2. Algorithm Evaluation
In the original MA, the climb step plays a crucial role in the precision of the approximation of the local solution in the climb process. For example, for the wine data set, when the climb step is 0.01, the optimum value, the worst value, the average value, and the standard deviation of MA are 16302.7254, 16467.6147, 16366.5331, and 52.4132, respectively. The reason is that the climb step is so small that the monkeys sometimes cannot reach their mountaintops at all before the maximal climb number is reached: with a climb number of 200, a step of 0.01 lets each coordinate move at most 200 × 0.01 = 2 units in one climb process. When we replace 0.01 with 0.1 and keep the climb number unchanged, the optimum value, the worst value, the average value, and the standard deviation become 16293.9147, 16296.2676, 16295.2160, and , respectively, which are better results. For the ABCMA algorithm, the result is not affected by the climb step.
In the original MA, most of the computing time is spent in the climb process searching for local optimal solutions. When we set the climb number to 200, the climb process requires 400 function evaluations for every monkey, and each iteration requires about 2000 function evaluations. For ABCMA, the computing time is determined by the number of clusters and the dimension of the objects. For example, for the Iris data set, the number of clusters is 3 and the dimension of the objects is 4; each iteration requires about 160 objective evaluations, which is far fewer than for MA. For PSO and ABC, the number of function evaluations is 100 per iteration, but the results are poor. Because of the cooperative strategy, CPSO [32] and CABC [17] require much more computation time than PSO and ABC with the same population size. For example, for the Iris data set, when the population size is 100, the numbers of function evaluations of CABC, CPSO, ABC, and PSO are about 1400, 1300, 200, and 100, respectively. However, CABC and CPSO converge with difficulty, and the results of CPSO are not good.
In order to compare the performance of the three improved algorithms, ABCMA, CABC, and CPSO are each run 20 times independently with 10000 function evaluations. The results are recorded in Table 13. As seen from the results, ABCMA performs better than CABC and CPSO, obtaining better solutions and smaller standard deviations on most of the data sets.
The results of CPSO and CABC differ noticeably between the 100-iteration and the 10000-evaluation settings, whereas those of ABCMA differ little. We can conclude that ABCMA converges faster than CABC and CPSO. The simulation results in the tables demonstrate that the proposed hybrid evolutionary algorithm converges to the global optimum with a smaller standard deviation and a better global value, which leads naturally to the conclusion that the ABCMA algorithm is a viable and robust technique for data clustering. Figure 22 shows the boxplots of the distribution of the objective values obtained by CPSO, CABC, and ABCMA over 20 independent executions. We can see that ABCMA obtains a smaller upper bound, a smaller average, and a smaller lower bound of the objective values.
Figure 22 panels: (a) Iris, (b) TAE, (c) Wine, (d) Seeds, (e) Glass, (f) Heart, (g) Haberman’s Survival, (h) Balance scale, (i) Cancer, (j) CMC data sets.
5. Conclusions
Monkey algorithm is a new swarm intelligence algorithm; its outstanding advantage is that it can effectively avoid falling into local optimal solutions through the somersault process. In the original MA, the precision of the solution is determined by the climb step and the climb number of the climb process. Because the climb number is large, much of the running time is consumed in the climb process. In this paper, an improved MA is proposed in which the artificial bee colony algorithm search operator is introduced on the basis of the original MA; the local optimal solution can be found by the climb process combined with this search operator, so the climb number is reduced and the running time is far less than that of the original MA. In view of the clustering problem, we use the k-means algorithm to choose the centers of the objects belonging to each cluster as the pivot, replacing the center of all monkeys in the somersault process. In this paper, 10 real instances are tested and compared with other algorithms under 100 iterations and under 10000 function evaluations. The numerical results show that the improved MA obtains better results than the k-means method, PSO, ABC, CPSO, CABC, and MA, especially under 10000 function evaluations, and its running time is far lower than that of the original algorithm. Therefore, the improved MA performs better than the basic monkey algorithm for clustering analysis.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work is supported by National Science Foundation of China under Grant no. 61165015, Key Project of Guangxi Science Foundation under Grant no. 2012GXNSFDA053028, Key Project of Guangxi High School Science Foundation under Grant no. 20121ZD008 and funded by Open Research Fund Program of Key Lab of Intelligent Perception and Image Understanding of Ministry of Education of China under Grant no. IPIU01201100 and the Innovation Project of Guangxi Graduate Education under Grant no. YCSZ2012063.