Abstract

We propose a combinatorial clustering algorithm based on the cloud model and quantum-behaved particle swarm optimization (COCQPSO) to solve the stochastic problem. The algorithm employs a novel probability model as well as a permutation-based local search method. The parameters of COCQPSO are set on the basis of design of experiments. In a comprehensive computational study, we examine the performance of COCQPSO on a set of widely used benchmark instances. Benchmarking against state-of-the-art algorithms shows that its performance compares very favorably. The fuzzy combinatorial optimization algorithm of cloud model and quantum-behaved particle swarm optimization (FCOCQPSO), formulated in vague sets (IVSs), is more expressive than approaches based on other fuzzy sets. Finally, numerical examples demonstrate the clustering effectiveness of the COCQPSO and FCOCQPSO algorithms.

1. Introduction

Clustering is a popular data analysis method that plays an important role in data mining, and it is an important class of unsupervised learning techniques. Obtaining the best or a satisfactory solution quickly is of great significance for improving productivity and for the development of society. So far, clustering has been widely applied in many fields, such as web mining, pattern recognition, resource allocation, machine learning, spatial database analysis, and artificial intelligence. The existing clustering algorithms can be broadly classified into two categories: hierarchical clustering and partitional clustering [1]. Although the $k$-means algorithm is widely used in many areas, it is very sensitive to initialization: the better the initial centers, the better the results. It is also easily trapped in local optima [2]. Recently, much work has been done to overcome these problems. Traditional cluster analysis requires every point of the data set to be assigned to exactly one cluster, so it is called hard clustering. In fact, however, most things have ambiguous attributes: there are no explicit boundaries between them, and they are not of an either-or nature. The theory of fuzzy clustering is therefore better suited to the nature of things and reflects reality more objectively. To date, the development of clustering has focused on data processing speed and efficiency. The choice of the number of clusters also affects the clustering results. For example, a smaller number of clusters can help to identify the original data structure clearly, but key hidden information may not be obtained.

Data clustering is an exploratory or descriptive process for data analysis in which similar objects or data points are identified and grouped into a set of objects called a cluster [3, 4]. The clustering problem is defined as the problem of classifying a collection of objects into a set of natural clusters without any prior knowledge. The main goal of cluster analysis, an unsupervised classification technique, is to analyze the similarity of data and divide the data into several cluster sets of similar data [5]. In terms of internal homogeneity and external separation, data within a cluster have high similarity, whereas there is less similarity between clusters. As a result, clustering is used to divide the data into several cluster sets for analysis and to reduce data complexity. Generally, clustering has several steps: (1) proper features are selected from the data as the basis for clustering; (2) a suitable clustering algorithm is selected based on the data type; (3) an evaluation principle determines the clustering results, which are provided for experts to interpret [6]. Basically, clustering algorithms can be hierarchical or partitional. Among the unsupervised clustering algorithms, the hierarchical method is the most typical [7, 8]. The problem with partitional methods such as $k$-means or $k$-medoids is that the number of clusters must be set before clustering [9, 10]. In addition, Gath and Geva [11] proposed an unsupervised clustering algorithm based on the combination of fuzzy C-means and fuzzy maximum likelihood estimation. Lorette et al. [12] proposed an algorithm based on fuzzy clustering to dynamically determine the number of clusters in a data set. Furthermore, SYNERACT [13], an alternative approach to ISODATA [14] that combines $k$-means with hierarchical descending approaches, does not require specifying the number of clusters. Other unsupervised clustering algorithms based on $k$-means have also been developed [15, 16]. Omran et al. proposed a new dynamic clustering approach based on the binary-PSO (DCPSO) algorithm. Paterlini and Krink (2006) applied PSO to partitional clustering, in which the data set is partitioned into a specified number of clusters. These algorithms try to minimize certain criteria and can therefore be treated as optimization problems. In addition, Ouadfel et al. (2010) presented a modified PSO algorithm for automatic image clustering, in which each cluster groups together similar patterns. According to the experiments conducted, the proposed approach outperforms the $k$-means, FCM, and KHM algorithms [17, 18]. Such an algorithm can directly generate a suitable number of clusters. In addition, it is critical to have a suitable measure for the effectiveness of a clustering method. Although some measures have been proposed in the area of classification, such as Wallace's information measure, few have been proposed for clustering algorithms [19]. The most frequently applied measures are cohesion, within-cluster variance, separation, and between-cluster variance [20]. The DBSCAN clustering algorithm has also been used in four applications involving 2D points, 3D points, 5D points, and 2D polygons from real-world problems [21]. These clustering algorithms have been applied in cluster analysis for many years, but each of them is one-sided.

Many improved velocity update rules have since been developed, including the inertia weight method [22], the constriction factor method [23], the guaranteed convergence method [24], the improved social model [25], the global-local best value based PSO method, and the GLbest-PSO method [26]. In addition, Montes de Oca et al. [27] proposed a novel PSO algorithm combining a number of algorithmic components that showed distinct advantages in optimization speed and reliability in their experimental study.

In addition to these methods, Kennedy and Eberhart [28] proposed the binary-PSO (BPSO), which is applied to optimization problems requiring discrete or binary variables for solutions. A novel optimization algorithm based on a modified binary-PSO with mutation (MBPSOM) combined with a support vector machine (SVM) was proposed to select the fault feature variables for fault diagnosis [29]. Chuang et al. [30] presented an improved BPSO for feature selection using gene expression data. In addition, Xie et al. [31] suggested the distributive-PSO (DPSO) algorithm, which addresses problems in which the PSO easily falls into a local optimum from which it cannot escape; during the particle search, the DPSO algorithm randomly selects particles for mutation. Liu et al. [32] proposed PSO with mutation based on similarity, in which similarity and collectivity are introduced. Most recently, Nanni and Lumini [33] used PSO for prototype reduction with the nearest neighbor approach and found the nearest neighbor approach to perform well. To achieve dynamic clustering, in which the optimum number of clusters is also determined within the process, the so-called multidimensional PSO (MDPSO) method extends the native structure of PSO particles so that they can make interdimensional passes with a dedicated dimensional PSO process [34]. In addition, Ouadfel et al. [35] presented an automatic clustering algorithm using a modified PSO (ACMPSO). Martinez et al. [36] proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of the particles in the swarm. Compared with many other population-based approaches, such as GA, ACO, and DE, the convergence rate of the population was much slower for the PSO algorithm [37]. PSO algorithms are inspired by social behavior among individuals, for instance, in bird flocks: intelligent particles, each representing a potential solution to the problem, move through the search space.

2. Combinatorial Optimization Algorithm of Cloud Model and Quantum-Behaved Particle Swarm Optimization

2.1. Quantum-Behaved Particle Swarm Optimization

Genetic clustering algorithms are randomized search and optimization techniques guided by the principles of evolution and natural genetics, and they have a large amount of implicit parallelism. When such an algorithm is terminated within a limited time, however, it tends to exhibit rather poor search precision, which to some extent makes the global optimality of the solution given by GA dubious. Ant colony clustering algorithms require a longer running time. At this stage, the ants consider only similarity, not density, when picking up and dropping data objects, and as a result the data objects end up dispersed. Moreover, because the ant colony clustering algorithm is stochastic, a reselected cluster center may separate from the cluster centers of the other classes.

Particle swarm optimization (PSO) clustering is a newer evolutionary computation technique based on swarm intelligence. PSO possesses better convergence speed and computational precision than traditional algorithms such as GA, ACO, and DE, and it can effectively search out the global optimal solution in the solution space [38-40]. Let $M$ denote the swarm size and $D$ the dimensionality of the search space. PSO is a global optimization algorithm, and its convergence rate is accelerated further by modified PSO algorithms.

From the point of view of quantum physics, the state of a particle with energy and momentum is described by a wave function $\psi(x, t)$. The squared modulus of the wave function, $|\psi(x, t)|^2$, is the probability density of the particle appearing at a point in space. Therefore, in the QPSO algorithm, each particle is represented not by a velocity and a position but by a quantum state, and the probability distribution of the particle's position is obtained from the probability density function $|\psi(x, t)|^2$. Once this probability density function is determined, the Monte Carlo method gives the position update equation

$$x_{id}(t+1) = p_{id} \pm \frac{L_{id}}{2}\ln\frac{1}{u},$$

where $u$ is a random number uniformly distributed on the interval $(0, 1)$ and $L_{id}$ is the characteristic length that sets the search scope of the particle. The QPSO algorithm employs the local-version constriction factor method and the global-version inertia weight method simultaneously to achieve relatively high performance. Here $p_{id}$ is the local attractor of particle $i$ in dimension $d$, a random point between the individual best position $P_{id}$ and the global best position $P_{gd}$:

$$p_{id} = \varphi P_{id} + (1 - \varphi) P_{gd}, \qquad \varphi \sim U(0, 1).$$

The method also introduces the average best position $mbest$, the mean of the best historical positions of all particles in the swarm at the current moment:

$$mbest = \frac{1}{M}\sum_{i=1}^{M} P_i.$$

The particle position update formula then becomes

$$x_{id}(t+1) = p_{id} \pm \beta\,\lvert mbest_d - x_{id}(t)\rvert \ln\frac{1}{u}.$$

Setting the parameter $\beta$ to decrease linearly from 1.0 to 0.5 generally gives good results.

With $\varphi$ and $u$ taken as uniformly distributed random numbers on $(0, 1)$, the equations for $p_{id}$, $mbest$, and $x_{id}(t+1)$ above form the basic expressions of the QPSO algorithm.

Convergence of each particle to a point is not a necessary and sufficient condition for global convergence of the PSO, but it can give the algorithm better convergence speed and accuracy. The reasons involve not only the convergence analysis of the algorithm but also its search mechanism and complexity. Controlling the search mechanism of the PSO by design therefore means controlling $L$. In the first method, $L$ is evaluated as

$$L_{id} = 2\beta\,\lvert p_{id} - x_{id}(t)\rvert,$$

and substituting it into the evolution equation of the particle gives

$$x_{id}(t+1) = p_{id} \pm \beta\,\lvert p_{id} - x_{id}(t)\rvert \ln\frac{1}{u}.$$

The second method introduces the average best position into the algorithm:

$$L_{id} = 2\beta\,\lvert mbest_d - x_{id}(t)\rvert,$$

so that the evolution equation of the particle becomes

$$x_{id}(t+1) = p_{id} \pm \beta\,\lvert mbest_d - x_{id}(t)\rvert \ln\frac{1}{u}.$$

The process and the concrete steps of the QPSO algorithm are described as follows:

Step 1. Set $t = 0$ and initialize the position $x_i(0)$ of each particle in the swarm and its individual best position $P_i(0) = x_i(0)$.

Step 2. Calculate the average best position $mbest$ of the particle swarm.

Step 3. For each particle $i$ in the swarm, perform Steps 4 to 7.

Step 4. Calculate the fitness $f$ of particle $i$ at its current position $x_i(t)$ and update its individual best position (for minimization): if $f(x_i(t)) < f(P_i(t-1))$, then $P_i(t) = x_i(t)$; if not, $P_i(t) = P_i(t-1)$.

Step 5. Compare the fitness of $P_i(t)$ with that of the global best position $G(t-1)$: if $f(P_i(t)) < f(G(t-1))$, then $G(t) = P_i(t)$; if not, $G(t) = G(t-1)$.

Step 6. For each dimension of the particle, calculate the random point $p_{id}$ between the individual best and global best positions.

Step 7. Calculate the new position of the particle.

Step 8. If the end condition is not satisfied, set $t = t + 1$ and return to Step 2; otherwise, the algorithm terminates.
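The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes minimization, the mean-best form of the update (new position = attractor plus or minus `beta * |mbest - x| * ln(1/u)`), and a coefficient `beta` that decreases linearly from 1.0 to 0.5. The function names `qpso` and `sphere`, the search bounds, and the default swarm size and iteration count are illustrative assumptions.

```python
import math
import random


def sphere(x):
    """Sphere benchmark: sum of squares, with minimum 0 at the origin."""
    return sum(v * v for v in x)


def qpso(fitness, dim, swarm_size=20, iters=200, lo=-10.0, hi=10.0, seed=0):
    """Minimize `fitness` with a basic QPSO loop (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: random initial positions; personal bests start at those positions.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    pbest = [p[:] for p in x]
    pbest_val = [fitness(p) for p in x]
    g = min(range(swarm_size), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(iters):
        # Assumed schedule: beta decreases linearly from 1.0 to 0.5.
        beta = 1.0 - 0.5 * t / max(iters - 1, 1)
        # Step 2: mean of all personal best positions.
        mbest = [sum(p[d] for p in pbest) / swarm_size for d in range(dim)]
        for i in range(swarm_size):
            # Steps 4-5: update the personal best and the global best.
            val = fitness(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
            # Steps 6-7: local attractor, then the quantum position update.
            for d in range(dim):
                phi = rng.random()
                p = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()  # lies in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - x[i][d]) * math.log(1.0 / u)
                x[i][d] = p + step if rng.random() < 0.5 else p - step
    return gbest, gbest_val


if __name__ == "__main__":
    best, best_val = qpso(sphere, dim=5)
    print(best_val)
```

The sign of the step is chosen with equal probability, which is one common way to realize the plus-or-minus in the update; positions are not clipped to the bounds in this sketch.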

2.2. Cloud Model

The cloud model, which builds on traditional fuzzy mathematics and probability statistics theory, is a conversion model between qualitative concepts and quantitative values. Proposed by Li et al. (1998), it is a novel uncertainty reasoning technology and an effective tool for the uncertain transformation between qualitative concepts and their quantitative expressions. There are various implementation approaches, which result in different kinds of cloud models. The normal cloud model, which is based on the normal distribution and the Gaussian membership function, is the most commonly used. It integrates fuzziness and randomness using three digital characteristics: the expected value $Ex$, the entropy $En$, and the hyper entropy $He$.

Among the cloud droplets in the domain, $Ex$ is the point most representative of the qualitative concept; it is the expectation of the droplets, that is, the central value of the domain. The entropy $En$ is jointly determined by the randomness and the fuzziness of the qualitative concept and measures the granularity of the concept. $En$ is also a measure of the randomness of the qualitative concept and reflects the dispersion of the cloud droplets. The hyper entropy $He$ is the uncertainty measure of the entropy, that is, the entropy of the entropy. The digital characteristics of the cloud are shown in Figure 1.
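To make the roles of the three digital characteristics concrete, the following sketch generates cloud droplets with the commonly used forward normal cloud generator: a per-droplet entropy is drawn from a normal distribution with mean En and standard deviation He, the droplet value is drawn from a normal distribution with mean Ex and that per-droplet entropy as standard deviation, and the membership degree is a Gaussian function of the distance from Ex. The function name `normal_cloud_drops` and the parameter values in the usage lines are illustrative assumptions, not taken from the paper.

```python
import math
import random


def normal_cloud_drops(ex, en, he, n, seed=0):
    """Generate n (value, membership) droplets of a normal cloud (sketch)."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        en_i = rng.gauss(en, he)       # per-droplet entropy: En' ~ N(En, He^2)
        x = rng.gauss(ex, abs(en_i))   # droplet value: x ~ N(Ex, En'^2)
        if en_i == 0.0:
            mu = 1.0                   # degenerate case: the droplet sits at Ex
        else:
            mu = math.exp(-((x - ex) ** 2) / (2.0 * en_i ** 2))
        drops.append((x, mu))
    return drops


if __name__ == "__main__":
    # A larger He spreads the membership degrees and makes the cloud thicker.
    thin = normal_cloud_drops(ex=0.0, en=1.0, he=0.05, n=1000)
    thick = normal_cloud_drops(ex=0.0, en=1.0, he=0.5, n=1000)
    print(len(thin), len(thick))
```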

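The hybrid sets, which contain both fuzziness and probabilities, are defined over a continuous universe of discourse. Over the whole domain $(-\infty, +\infty)$, the total contribution $C$ of all elements to the concept can be expressed as

$$C = \frac{1}{\sqrt{2\pi}\,En}\int_{-\infty}^{+\infty} \mu(x)\,dx = 1, \qquad \mu(x) = \exp\!\left(-\frac{(x - Ex)^2}{2En^2}\right).$$

Over the domain $(Ex - 3En,\ Ex + 3En)$, the total contribution of all elements to the concept is approximately 99.7%. The contributions of the different regions of the cloud droplet swarm to the qualitative concept are shown in Figure 2.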
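As a small worked check of how the contribution is distributed over the domain, the sketch below assumes that the contribution of a region is the normalized integral of the Gaussian membership function over that region, with the hyper entropy ignored. Under that assumption the share contributed by the interval from Ex - k·En to Ex + k·En has the closed form erf(k/√2). The function name `contribution_within` is illustrative.

```python
import math


def contribution_within(k):
    """Share of the total contribution from [Ex - k*En, Ex + k*En].

    Assumes a Gaussian membership function and ignores the hyper entropy,
    so the share does not depend on the particular values of Ex and En.
    """
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(contribution_within(k), 4))
```

Elements farther than 3·En from Ex therefore contribute very little to the concept, which is the basis of the "3En" rule of the normal cloud.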

, , and obey normal distribution. Probability density functions of . Consider, It can be inferred that We can calculate the digital characteristic entropy estimate values . At the same time, we can calculate the digital characteristic. The excess entropy estimate values .

The cloud model also uses a second-order Gaussian distribution to produce cloud drops whose distribution displays high kurtosis and a fat tail with power-law decay. In sociology and economics, many phenomena have been found to share high kurtosis and fat tails because of preferential attachment in their evolution processes. This paper investigates the relation between the Gaussian distribution and fat-tail distributions and constructs a fat-tail distribution from the Gaussian distribution by iteration in order to depict more uncertain phenomena. If the ordinary Gaussian distribution is regarded as the first-order Gaussian distribution, then the first-order probability density function of a random variable $x$ is expressed as

$$f(x) = \frac{1}{\sqrt{2\pi}\,En}\exp\!\left(-\frac{(x - Ex)^2}{2En^2}\right).$$

Consider,
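The high-kurtosis property can be checked numerically. The sketch below assumes the second-order construction in which the standard deviation of each draw is itself Gaussian with mean En and standard deviation He. Under that assumption the kurtosis has the closed form 3(En⁴ + 6En²He² + 3He⁴)/(En² + He²)², which equals 3 (the Gaussian value) when He = 0 and exceeds 3 otherwise. All function names are illustrative, and the sketch does not attempt to verify the power-law decay of the tail.

```python
import random


def second_order_gaussian(ex, en, he, n, seed=0):
    """Draw n samples x ~ N(Ex, s^2) with s ~ N(En, He^2) (illustrative)."""
    rng = random.Random(seed)
    # abs() keeps the standard deviation non-negative; by symmetry of the
    # normal distribution this does not change the distribution of x.
    return [rng.gauss(ex, abs(rng.gauss(en, he))) for _ in range(n)]


def kurtosis(xs):
    """Sample kurtosis m4 / m2^2 (a Gaussian sample gives about 3)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((v - mean) ** 2 for v in xs) / n
    m4 = sum((v - mean) ** 4 for v in xs) / n
    return m4 / (m2 * m2)


def theoretical_kurtosis(en, he):
    """Closed-form kurtosis of the second-order Gaussian defined above."""
    num = 3.0 * (en ** 4 + 6.0 * en ** 2 * he ** 2 + 3.0 * he ** 4)
    return num / (en ** 2 + he ** 2) ** 2


if __name__ == "__main__":
    xs = second_order_gaussian(0.0, 1.0, 0.5, 100000)
    print(kurtosis(xs), theoretical_kurtosis(1.0, 0.5))
```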

For three higher-order Gaussian iterations with simple parameters, we study how their mathematical properties change and carry out extensive experimental analysis.

When (), and ; then When () and ; then When (), then ,

In short, the combinatorial clustering algorithms based on the cloud model transform a qualitative question, the evaluation of the effectiveness of the combinatorial clustering algorithms, into a quantitative one, and they take into account the fuzzy and random factors that arise in the evaluation. In the cloud model, $En$ determines the steepness of the normal cloud: the greater $En$, the wider the range covered by the cloud; the smaller $En$, the narrower the range. The range is chosen in the algorithm according to the "$3En$" rule of the normal cloud, and $He$ affects the degree of dispersion of the cloud droplets.

2.3. Combinatorial Optimization Clustering Algorithm

By combining the basic principle of particle swarm optimization with the way the cloud model transforms a qualitative concept into a set of quantitative numerical values, a rapid evolutionary algorithm is proposed, namely, the cloud hyper mutation particle swarm optimization algorithm. Its core idea is to carry out the learning process of evolution and the mutation operation by means of the normal cloud particle operator. The COCQPSO can be modeled naturally and uniformly, which makes it easy to control the scale of the search space. The simulation results show that the proposed algorithm has a fine capability of finding the global optimum, especially for multimodal functions.

The global extreme value Gbest is chosen as $Ex$ in the mutation, because at this point the algorithm may have been trapped in a local optimum. According to a principle from sociology, better individuals usually exist around the currently outstanding individuals; that is, there is a greater chance of finding the optimal solution in the neighborhood of the local optimal point. Two positions are used to update a particle's velocity: one is the global experience position of all particles, which memorizes the global best solution obtained by all particles; the other is each particle's own local optimal point, which memorizes the best position that the particle has ever moved to. If the threshold parameter is set too large, mutation is too frequent, which reduces the operational efficiency of the algorithm; if it is set too small, precision is reduced. Moreover, because particle swarm optimization converges quickly in the early stage of evolution and gradually slows down later, it is difficult to give the parameter a perfectly reasonable fixed value. In this paper, the value is made to decrease dynamically along with the global optimal value Gbest, which realizes adaptive adjustment.
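The mutation around Gbest can be illustrated with a short sketch that draws a new candidate position from a normal cloud whose expected value is the current global best. This is a hedged illustration only: the function name `cloud_mutate`, the choice of En as one third of an assumed search width (so that 3·En spans that width), and the choice of He as one tenth of En are assumptions for demonstration and are not taken from the paper. The adaptive threshold that decides when to mutate is not modeled here.

```python
import random


def cloud_mutate(gbest, search_width, rng, en_ratio=1.0 / 3.0, he_ratio=0.1):
    """Return a candidate drawn from a normal cloud centred on gbest (sketch).

    gbest        -- current global best position (list of floats), used as Ex
    search_width -- assumed half-width of the region to explore around gbest
    rng          -- a random.Random instance
    """
    en = en_ratio * search_width   # assumed: 3 * En covers the search width
    he = he_ratio * en             # assumed hyper entropy
    mutated = []
    for g in gbest:
        en_i = abs(rng.gauss(en, he))      # per-dimension entropy sample
        mutated.append(rng.gauss(g, en_i)) # droplet centred on gbest
    return mutated


if __name__ == "__main__":
    rng = random.Random(0)
    print(cloud_mutate([0.5, -1.0, 2.0], search_width=1.0, rng=rng))
```

A caller would typically evaluate the candidate and keep it only if its fitness improves on Gbest; shrinking `search_width` as Gbest improves is one possible way to mirror the adaptive behavior described above.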

To select the threshold, the Sphere function is used as an example to verify the influence of different threshold values on the precision of the CHPSO algorithm. The experimental parameter settings are as follows: the population size is 100; the maximum number of iterations is 1,000; the Sphere function is tested in 5, 10, 30, 50, and 100 dimensions; and the threshold takes the values 2, 5, 10, and 20. Each setting is run independently 50 times and the results are averaged. The experimental results are shown in Table 1.

Table 1 shows that, for low-dimensional functions with dimensionality below 10, the smaller the threshold, the higher the accuracy of the solution. For the test functions, the proportion of acceptable results for each value is shown in Table 2.

The dimensions of the test functions, the ranges of the initial values, the acceptable values, and the experimental results of the COCQPSO algorithm, the cloud genetic algorithm (CGA), the adaptive particle swarm optimization (APSO), and the Gaussian-dynamic PSO (GDPSO) are shown in Table 3. In all experiments the population size is 20 and the number of iterations is 1,000; each function is run independently 100 times, and the optimal value, the average value, and the success rate are compared. To fully reflect the effect of the mutation operator, COCQPSO does not use a dynamically adjusted inertia weight but takes a fixed value of 0.725. Fixed values are likewise used for the acceleration constants and the threshold.

The convergence of the COCQPSO algorithm can be proved using probability theory. The combinatorial algorithm adaptively controls the scope of the search space, and it performs best in larger search spaces, where it avoids local optimal solutions. Comparative experiments on typical functions show that the algorithm avoids being trapped in local optima and enhances its global optimization ability while converging more quickly to the global optimal solution. At the same time, the quality and efficiency of its optimization are better than those of the other algorithms. The simulation results show that the combinatorial algorithm performs excellently on complex constrained optimization problems and, especially for very high-dimensional constrained optimization problems, obtains solutions of higher accuracy.

3. Simulation Experiment Results

3.1. The Comparison Results of the Algorithms

PSO is inspired by social behavior among individuals, for instance, in bird flocks. Particles (individuals), each representing a potential solution to the problem, move through the search space. The COCQPSO and FCOCQPSO algorithms focus on the collective behaviors that result from the local interactions of the individuals with one another and with their environment. The experimental results show that the hybridization strategies are effective. The sequence algorithms with the enlarged pheromone table are superior to the other algorithms because the enlarged pheromone table diversifies the generation of new solutions in the COCQPSO and FCOCQPSO algorithms, which prevents them from becoming trapped in local optima.

To compare the performance of the COCQPSO and FCOCQPSO algorithms with that of other clustering algorithms such as the genetic algorithm (GA) and ant colony optimization (ACO), we carried out many clustering experiments, including experiments on four different types of artificial data sets, and recorded the average and the standard deviation. The four different types of artificial data sets were randomly generated from a multivariate uniform distribution. Data set 1: the wine data set, which contains chemical analyses of 178 wines derived from three different cultivars. Data set 2: the iris data set, which contains three categories of 50 objects each. Data set 3: the glass identification data set, which contains 214 objects with nine attributes. Data set 4: the CMC data set. A comparison of the clustering error rates of the algorithms is given in Table 4.

The results in Table 4 show that none of the previously proposed algorithms outperforms the COCQPSO and FCOCQPSO algorithms in terms of the average and best objective function values for the four data sets used here. The proposed algorithms are effective, robust, easy to tune, and tolerably efficient compared with the other algorithms.

The COCQPSO and FCOCQPSO algorithms have been tested against the GA and PSO on artificial and real-world data, and the good experimental results show the feasibility of the proposed algorithms. In all our experiments, the data are initialized either randomly or heuristically. For complex problems, it is advisable to use the FCOCQPSO algorithm for the initialization in order to reach the best solutions. Clustering results of the FCOCQPSO algorithm are shown in Figure 3.

To sum up, the simulation results show the superiority of the FCOCQPSO algorithm over its counterparts in terms of accuracy. In the COCQPSO and FCOCQPSO algorithms, the particles' search is guided by a position that may not be the global best position but may lie in a promising search region, so the particles have a good chance of searching this region and finding the global optimal solution. As a result, the COCQPSO and FCOCQPSO algorithms may have stronger global search ability and better overall performance than other algorithms such as GA, PSO, ACO, and DE, particularly on hard optimization problems. The experimental results show that the combinatorial clustering algorithms improve precision and produce stable results.

3.2. Experimental Results

The fuzzy particle swarm optimization clustering algorithm is a novel method for solving real problems that uses both fuzzy rules and the characteristics of particle swarm optimization. In this paper, we solve multiattribute group allocation problems with the fuzzy combinatorial optimization clustering algorithm of cloud model and quantum-behaved particle swarm optimization.

Hyper entropy reflects the dispersion of the cloud drops. The larger the hyper entropy, the greater the dispersion of the drops and the randomness of the degree of membership, and the thicker the cloud. Comparing the two figures shows that the cloud droplets with large hyper entropy are widely dispersed, whereas those with small hyper entropy are relatively concentrated. The clustering result for the cloud droplet experiments is shown in Figure 4.

To some extent, the fuzzy clustering algorithm overcomes the dependence on the shape of the data distribution and can correctly cluster data with differently shaped distributions; it computes quickly and overcomes the sensitivity to noisy data and outliers, which enhances the robustness of the algorithm. Because the fuzzy clustering algorithm is sensitive to initial values and easily falls into local optima, this paper proposes a fuzzy clustering method based on particle swarm optimization. The fitness function is designed according to the clustering criterion, and particle swarm optimization is used to optimize the cluster centers. The simulation experiments that follow demonstrate the feasibility and effectiveness of the algorithm.
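A fitness function of this kind can be sketched as follows. The example assumes a hard-assignment clustering criterion, the total Euclidean distance from each data point to its nearest cluster center, with a particle encoding the list of cluster centers. The paper's actual criterion (for instance, a fuzzy-membership-weighted objective) may differ, and the names `clustering_fitness` and `assign_clusters` are illustrative.

```python
import math


def clustering_fitness(centers, data):
    """Total distance from each point to its nearest center (lower is better)."""
    return sum(min(math.dist(point, c) for c in centers) for point in data)


def assign_clusters(centers, data):
    """Index of the nearest center for every data point."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))
        for point in data
    ]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centers = [(0.0, 0.5), (10.0, 10.5)]  # one particle: two 2-D centers
    print(clustering_fitness(centers, data))
    print(assign_clusters(centers, data))
```

A swarm optimizer would minimize `clustering_fitness` over the flattened coordinates of the centers and call `assign_clusters` once at the end to read off the partition.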

For different thresholds of the multiproject, multiattribute groups, the corresponding dynamic clustering result obtained by the algorithm of cloud model and quantum-behaved particle swarm optimization is shown in Figure 5.

For different thresholds of the multiproject, multiattribute groups, the corresponding dynamic clustering result obtained by the combinatorial optimization clustering algorithm of cloud model and quantum-behaved particle swarm optimization is shown in Figure 6.

For different thresholds of the multiproject, multiattribute groups, the corresponding 200-dynamic combinatorial clustering result is shown in Figure 7.

For different thresholds of the multiproject, multiattribute groups, the corresponding dynamic combinatorial clustering result obtained by the 200-quantum-behaved particle swarm optimization and cloud model clustering algorithm is shown in Figure 8.

For different thresholds of the multiproject, multiattribute groups, the corresponding 1000-dynamic combinatorial clustering result is shown in Figure 9.

For different thresholds of the multiproject, multiattribute groups, the corresponding dynamic clustering result obtained by the 1000-quantum-behaved particle swarm optimization and cloud model combinatorial clustering algorithm is shown in Figure 10.

For different thresholds of the multiproject, multiattribute groups, the corresponding 10000-dynamic combinatorial clustering result obtained by the quantum-behaved particle swarm optimization clustering algorithm is shown in Figure 11.

For different thresholds of the multiproject, multiattribute groups, the corresponding dynamic clustering result obtained by the 10000-quantum-behaved particle swarm optimization and cloud model combinatorial clustering algorithm is shown in Figure 12.

4. Conclusion

As these studies show, many different kinds of clustering algorithms can be applied, and choosing a suitable algorithm is critical to the effectiveness of a clustering method. In this paper, we have developed the COCQPSO and FCOCQPSO algorithms. The results of various simulations using artificial data sets show that the proposed combinatorial clustering algorithms perform better than other clustering algorithms such as GA, PSO, and ACO. We demonstrated through several experiments that the interval-valued combinatorial clustering algorithm yields much better clustering results than the corresponding dynamic clustering algorithm.

These experiments show that the combinatorial clustering algorithm based on similarity is better than the conventional clustering method in terms of computational complexity and clustering effect. In this work, we have given a full description of the implementation and details of the combinatorial algorithms and of how their performance is improved. We have tested the proposed algorithms on different synthetic and real clustering problems and obtained very good results that improve on classical approaches. Research on combinatorial clustering algorithms is therefore important. Combinatorial clustering algorithms are applied in several fields, such as statistics, pattern recognition, machine learning, and data mining, to partition a given set of data or objects into clusters. They are also used in a large variety of applications, for example, image segmentation, object and character recognition, and document retrieval. Combinatorial clustering problems are an important kind of real-life optimization problem for which it is difficult to find a satisfactory solution within a limited time. In addition to extending the application domain, our future studies will investigate how to improve the solution quality of the combinatorial clustering algorithms.

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China (Grant no. 70372039, Grant no. 70671037, and Grant no. 70971036) and the Doctoral Program Foundation of Institutions of Higher Education of China (Grant no. 20050532005).