Explainable and Reliable Machine Learning by Exploiting Large-Scale and Heterogeneous Data
Research Article | Open Access
Lin Ding, Weihong Xu, Yuantao Chen, "Density Peaks Clustering by Zero-Pointed Samples of Regional Group Borders", Computational Intelligence and Neuroscience, vol. 2020, Article ID 8891778, 15 pages, 2020. https://doi.org/10.1155/2020/8891778
Density Peaks Clustering by Zero-Pointed Samples of Regional Group Borders
Abstract
The density peaks clustering algorithm (DPC) has attracted the attention of many scholars because of its multiple advantages, including efficient determination of cluster centers, few parameters, no iteration, and no border noise. However, DPC provides neither a reliable, specific method for selecting the threshold (cutoff distance) nor an automatic strategy for selecting cluster centers. In this paper, we propose density peaks clustering by zero-pointed samples (DPC-ZPS) of regional group borders. DPC-ZPS finds subclusters and cluster borders by zero-pointed samples (ZPSs). Subclusters are then merged into individual clusters by comparing the densities of edge samples. Through iteration of the merger, a suitable dc and the cluster centers are determined. Finally, we compared state-of-the-art methods with our proposal on public datasets. Experiments show that our algorithm automatically determines the cutoff distance and the centers accurately.
1. Introduction
The clustering algorithm [1], as an unsupervised learning method, divides objectives (also called elements, samples, or items) into several groups according to their similarity. Compared with supervised learning [2–16], it can carry out the grouping task even when category labels are unavailable. Hence, it is widely used in image segmentation [17], bioinformatics [18], pattern recognition [19], data mining [20], and other fields [21, 22]. Representative clustering algorithms include K-means [23, 24] and fuzzy c-means [25, 26] based on partitioning; AGNES [27], BIRCH [28, 29], and CURE [30, 31] based on hierarchy; DBSCAN [32] and OPTICS [33] based on density; STING [34] based on grids; and statistical clustering CMM [35] and spectral clustering [36] based on graph theory [37]. K-means is extremely sensitive to noise and to the selection of the initial cluster centers, and the number of clusters needs to be set a priori. Similarly, fuzzy c-means suffers from initial-partition dependence, noise, and outliers. Hierarchical clustering requires the number of clusters to be determined a priori, and its effect depends on the choice of the distance measure between groups. Density-based DBSCAN and OPTICS and grid-based clustering algorithms determine the number of clusters without artificial intervention, but all require the preset parameters epsilon and minpts, and extensive parameter adjustment is needed to obtain optimal clustering results. These two types of algorithms also generate noise around the cluster boundaries. Statistics-based CMM needs to select one or more suitable probability models to fit a dataset.
Clustering by fast search and find of density peaks (DPC) [38] was published in Science. Given a preset threshold (cutoff distance, dc), cluster centers are selected manually from the decision graph proposed by DPC. Compared with traditional clustering algorithms, it has many advantages, such as higher efficiency in finding cluster centers, fewer parameters, no iteration, and no noise around the cluster border. However, the algorithm still has the following defects:
(1) The original DPC does not provide a reliable and specific selection method for dc. Hence, the cutoff distance is computed in different ways depending on the size of the dataset, and an inappropriate dc leads to performance degradation [39]. Moreover, dc is generally challenging to determine since the range of each attribute is unknown in most cases [40].
(2) It is hard to manually select the cluster centers from a dataset with a large number of clusters, and the manual selection of cluster centers cannot meet the needs of systems with high timeliness requirements.
To overcome the above defects, many scholars have proposed improvements to the original DPC algorithm. Xie et al. proposed a local density metric based on fuzzy weighted k-nearest neighbors to solve the difficulty of determining dc in the DPC algorithm [39]. Liu et al. proposed shared-nearest-neighbor-based clustering by fast search and find of density peaks (SNN-DPC), which converts the cutoff distance into a number of nearest neighbors [40]. Mehmood presented a nonparametric method for DPC via heat diffusion for estimating the probability distribution of a given dataset [41]. Guo et al. used linear regression to fit the decision values with a given dc and selected the elements above the fitting function as the central elements [42]. Ding et al. proposed an algorithm based on the generalized extreme value distribution (GEV) to fit the decision values in descending order [43]; to reduce the time complexity, an alternative method based on density peaks detection using Chebyshev inequality (DPC-CI) was also given. Ni et al. presented the concepts of density path and density gap, as well as a new threshold called the dc percentage, in [44]. The density gaps calculated for several dc percentages are used to draw a summary graph of density gaps. Instead of the decision graph, the appropriate threshold value is determined by manually observing the summary graph; this algorithm is able to reduce the negative impact of an inappropriate dc on the clustering result.
However, the methods in [39–41, 44–47] require human operation to select the centers or to observe the summary graph of density gaps. Guo et al. [42] and Ding et al. [43] proposed strategies for automatic center selection for the original DPC, but these depend on a given appropriate dc, and Xie et al. [39] and Liu et al. [40] showed that selecting a proper dc is challenging.
In this paper, we propose density peaks clustering by zero-pointed samples (DPC-ZPS) of regional group borders. Our method not only determines the suitable range of dc and the center of each cluster but also reduces the negative impact caused by manual participation in the clustering process. The main innovations and contributions of our algorithm are as follows:
(1) To merge local clusters into individual clusters, we present a cluster merging strategy based on comparing densities among the elements of two cluster borders.
(2) To find the border of each cluster, we propose two concepts: the neighboring cluster border (NCB) and the pure cluster border (PCB).
(3) To determine the correct number of clusters, we provide an iterative procedure that converges dc to a suitable value.
The remainder of this paper comprises three sections: Section 2 describes the details of the original DPC and our proposal; Section 3 presents the clustering results of our method and related works and discusses the impact and value range of the parameter of DPC-ZPS; the final section summarizes the contributions and features of this paper and puts forward future work.
2. Materials and Methods
2.1. The Original DPC Algorithm
For a given dataset X = {x1, x2, …, xn}, where each xi is a d-dimensional feature vector.
DPC is based on the assumption that each cluster center has a higher local density than the elements around it and is relatively far from other centers. Centers are manually selected using a decision graph with the local density ρ as the abscissa and δ as the ordinate. The DPC algorithm provides two methods for calculating the local density ρi of each element of the given dataset, expressed in equations (1) and (2):

ρi = Σ_{j≠i} χ(d(i, j) − dc), where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, (1)

ρi = Σ_{j≠i} exp(−(d(i, j)/dc)²). (2)

δi is calculated by equation (3):

δi = min_{j: ρj > ρi} d(i, j), (3)

where d(i, j) is the Euclidean distance between elements i and j and dc is the cutoff distance. As shown in equation (3), δi is the minimum distance between element i and any element j whose density is higher than ρi. For the element with the highest density, δi is instead the maximum distance between it and any other element.
Meanwhile, to simplify the selection of centers, DPC provides the decision value as follows:

γi = ρi · δi. (4)
After the cluster centers are determined, each of the remaining samples is assigned to the same cluster as its nearest neighbor of higher density. This assignment is recorded in the process of calculating δ.
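As a concrete illustration of equations (1)–(3) and the assignment rule, the following minimal NumPy sketch computes ρ (cutoff kernel), δ, and the nearest-denser-neighbor record for a toy dataset. The function and variable names are ours, not from the paper:

```python
import numpy as np

def dpc_rho_delta(X, dc):
    """Compute DPC local density (cutoff kernel), delta, and the
    nearest denser neighbour of each sample (recorded for assignment)."""
    n = len(X)
    # pairwise Euclidean distance matrix
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # eq. (1): number of neighbours closer than dc, excluding the sample itself
    rho = (D < dc).sum(axis=1) - 1
    delta = np.zeros(n)
    parent = np.full(n, -1)  # -1 marks samples with no denser neighbour (centers)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]
        if denser.size == 0:
            delta[i] = D[i].max()  # highest-density sample: maximum distance
        else:
            j = denser[np.argmin(D[i, denser])]  # eq. (3): nearest denser element
            delta[i], parent[i] = D[i, j], j
    return rho, delta, parent

# Two tight one-dimensional groups; each group's middle point is densest.
X = np.array([[0, 0], [0.1, 0], [0.2, 0], [5, 5], [5.1, 5], [5.2, 5]])
rho, delta, parent = dpc_rho_delta(X, dc=0.15)
```

Non-center samples then inherit the cluster label of `parent[i]`, following the assignment rule described above.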
2.2. Our Method
The main process of DPC-ZPS is to select multiple distances as dc at equal intervals and to calculate the corresponding decision values. Then, within each group of decision values, the elements greater than the sum of the mean and the standard deviation of the decision values are selected as potential centers. Over the multiple groups of dc, the iterative merging process gradually brings the number of clusters close to the real value.
2.2.1. Related Concepts
Definition 1. (zero-pointed sample). In the assignment, each sample is assigned to its nearest denser neighbor, and a zero-pointed sample (ZPS) is one without any subordinates.
When dc is fixed, we use an array A consisting of n zero units to store the assignment process, and the indexes of the array represent the sequence numbers of the objectives. Let A[j] = i, where sample i is the nearest sample with higher density than sample j; cluster centers and potential cluster centers are not assigned. Subsequently, the array is broken at the zero units; several trees are then obtained, and each tree is a cluster.
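Definition 1 can be sketched directly over the nearest-denser-neighbor array described above (a hypothetical helper; a ZPS is a sample that never appears as anyone's nearest denser neighbor):

```python
import numpy as np

def zero_pointed_samples(parent):
    """Return the zero-pointed samples (ZPSs): samples without subordinates,
    i.e. samples to which no other sample is assigned."""
    n = len(parent)
    has_child = np.zeros(n, dtype=bool)
    for p in parent:
        if p >= 0:            # centers carry -1 and point at nobody
            has_child[p] = True
    return np.where(~has_child)[0]

# Samples 0 and 1 are assigned to 2, samples 2 and 3 to 4; 4 is a center.
parent = np.array([2, 2, 4, 4, -1])
zps = zero_pointed_samples(parent)
```

Here samples 0, 1, and 3 have no subordinates, so they are the ZPSs.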
Definition 2. (initial border). In a cluster tree, the initial border (IB) consists of all leaf nodes and their father nodes.
As shown in Figure 1, elements 1, 7, and 8 are zero-pointed and are leaf nodes because they are less dense than their neighboring elements. Elements 3 and 32 are inner elements, but they are still zero-pointed since they have no adjacent samples. There are also assignment paths 10 ⟶ 11 ⟶ 13 and 12 ⟶ 11 ⟶ 13.
Definition 3. (neighboring cluster border). Clusters in a dataset X are denoted as C = {C1, C2, …, Cm}, where m is the number of clusters in X. A pair of elements (p, q), where p ∈ Cu, q ∈ Cv, and u ≠ v, that satisfies the following equation belongs to the border pair set B(u, v):

d(p, q) ≤ S(u, v)[⌊DF · |Cu| · |Cv|⌋], (5)

where d(p, q) is the distance between p and q, S(u, v) is an array storing the distances of all element pairs between Cu and Cv sorted in ascending order, DF is the depth factor of the neighboring cluster border with range (0, 1], and ⌊·⌋ denotes the integer part.
The neighboring cluster border (NCB) consists of all border pair sets and is expressed as follows, where symmetrical cluster pairs are deleted:

NCB = ∪_{1 ≤ u < v ≤ m} B(u, v). (6)

Two clusters that are far apart require an enormous DF to attain a non-blank NCB, and the bigger the DF required for a non-blank NCB, the farther apart the two clusters are. For neighboring subclusters, by contrast, the required DF is relatively minute. In Section 3, DF is compared with the parameters of DPC, and its impact on the clustering result is discussed.
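Under one plausible reading of equation (5), the NCB of a cluster pair collects the samples involved in the ⌊DF · |Cu| · |Cv|⌋ closest cross-cluster pairs. The sketch below (names and exact tie-breaking are our assumptions, not the paper's code) illustrates this:

```python
import numpy as np

def neighboring_cluster_border(D, idx_u, idx_v, DF=0.02):
    """Collect the samples in the floor(DF * |Cu| * |Cv|) closest
    cross-cluster pairs; DF controls the depth of the border."""
    pairs = [(D[i, j], i, j) for i in idx_u for j in idx_v]
    pairs.sort()                           # closest cross-cluster pairs first
    k = max(1, int(DF * len(pairs)))       # depth controlled by DF
    border = set()
    for _, i, j in pairs[:k]:
        border.update((i, j))
    return sorted(border)

# Cluster u = {0, 1}, cluster v = {2, 3}; points 1 and 2 face each other.
pts = np.array([[0, 0], [1, 0], [1.5, 0], [3, 0]], dtype=float)
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
ncb = neighboring_cluster_border(D, [0, 1], [2, 3], DF=0.3)
```

With DF = 0.3 only the single closest pair (points 1 and 2) is kept, so the border stays shallow, as Definition 3 intends.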
As shown in Figure 1, there are two clusters A and B in a dataset, and cluster B is misclassified into B1, B2, and B3. The elements of region I (7 and 8) and region II (16, 17, 18, 19, 20, and 21) are marked with red wireframes; they belong to the NCB.
Definition 4. (pure cluster border). In a cluster, the pure cluster border (PCB) is defined by the following equation:

PCB = IB − NCB. (7)

Correspondingly, elements 1, 2, 4, 5, 6, 9, 10, 11, 12, 22, 23, 24, 29, 30, and 31 belong to the pure cluster borders (PCBs) of their respective clusters. However, as shown in Figure 2, elements 3 and 32 are zero-pointed because they are relatively isolated, yet their density is much larger than that of the other ZPSs.
To filter out interior and isolated ZPSs, we use the three-point method from fuzzy mathematics to measure three memberships of the border elements: "low density," "medium density," and "high density." To prevent extreme density values from distorting the membership values, we select the normal distribution function as the membership function; the three functions share σ, the standard deviation of the density values of all border elements.
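The paper's exact membership equations did not survive extraction, so the following is only an illustrative sketch of the three-point idea: a Z-shaped "low density," a bell-shaped "medium density," and an S-shaped "high density" membership, all built from a normal-distribution kernel over the border densities (all names and the piecewise forms are our assumptions):

```python
import numpy as np

def memberships(rho_border):
    """Illustrative three-point memberships over border-element densities:
    'low' is Z-shaped, 'medium' bell-shaped, 'high' S-shaped, all derived
    from a normal-distribution kernel centred on the mean density."""
    mu, sigma = rho_border.mean(), rho_border.std()
    gauss = np.exp(-((rho_border - mu) ** 2) / (2 * sigma ** 2))
    low = np.where(rho_border < mu, 1.0, gauss)    # full membership below the mean
    med = gauss                                    # peaks at the mean density
    high = np.where(rho_border > mu, 1.0, gauss)   # full membership above the mean
    return low, med, high

rho = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
low, med, high = memberships(rho)
```

Sparse elements score high on "low density" (true border), while unusually dense ZPSs, such as the isolated elements 3 and 32 above, score high on "high density" and can be filtered out.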
In Figure 3, in the low-density region, the smaller an element's density, the larger its "low density" membership; such elements are acute-angle border elements. For example, element 1 is an acute-angular border element, and elements 2, 12, and 23 belong to the obtuse-angular border elements. At the first crossover point, the degrees of the two memberships are equal. In the high-density region, the higher an element's density, the smaller its membership degree as an obtuse border element and the higher its membership degree as an independent objective within the cluster. At the second crossover point, the two memberships are again equal.
2.2.2. Merger Strategy
If a real cluster is mistakenly divided into several subclusters, there are some zero-pointed elements in the NCB, since the NCB is simultaneously an inner part of the actual group and the border of the subclusters. Owing to the aggregation of zero-pointed objectives in the NCB, the density of NCB elements is smaller than that of other inner parts, corresponding to the medium-density region in Figure 3, while the density of the PCB lies in the low-density region. We propose a merging strategy based on comparing the element density values of the NCB and the PCB.
If there exists an element pair (p, q) in the NCB whose densities lie between the thresholds derived from the PCBs of the two subclusters, then the subclusters are merged; namely, if the density of the NCB elements is not greater than that of the inner elements but greater than that of the PCB elements, they must be inner elements of the real cluster.
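Since the exact thresholds were lost in extraction, the following is one plausible reading of the merger rule, kept deliberately simple (function name and threshold choice are our assumptions): merge two subclusters when their shared border (NCB) is at least as dense as the sparsest part of the pure cluster border, i.e. the border looks like an interior region rather than a true gap.

```python
import numpy as np

def should_merge(rho, ncb_idx, pcb_idx):
    """Merge two subclusters when the densest NCB element exceeds the
    sparsest PCB element -- a border that is denser than the true outer
    border must be interior to one real cluster."""
    if len(ncb_idx) == 0:
        return False          # no shared border, nothing to merge
    return np.max(rho[ncb_idx]) > np.min(rho[pcb_idx])

rho = np.array([5.0, 4.0, 3.0, 2.0, 6.0, 1.0])
merge_ab = should_merge(rho, ncb_idx=[0, 1], pcb_idx=[5])   # dense border
keep_cd = should_merge(rho, ncb_idx=[3], pcb_idx=[0, 4])    # sparse border
```

In the first call the border is denser than the sparsest PCB element, so the subclusters merge; in the second the border is a genuine low-density gap, so they stay separate.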
2.2.3. The Iteration Strategy
The δ value of each center depends on the minimum distance between the central objective and denser objectives. When dc is small and far from its suitable range, the algorithm cannot measure the density of each sample accurately. This inexact measurement means that, within some clusters, local center elements with higher local density that are far from the suitable center of each group are selected, and their δ values are much larger than those of non-center items. As dc increases, the density measurement capability gradually strengthens, and the DPC-ZPS algorithm sequentially filters out the fake centers with the weakest central attributes until the number of centers reaches its minimum. When dc exceeds the largest value of the suitable range, clusters with smaller distribution areas are filtered out; namely, their centers are no longer selected by the threshold. As dc continues to increase, fake centers reappear in the groups with larger distribution areas. Essentially, increasing dc gradually shifts the density metric from measuring the local density of elements to measuring their universal density. This change process is shown schematically in Figure 4.
Based on the above analysis, we propose an automatic iteration strategy as follows:
Step 1: as shown in Figure 4, after counting the cluster center combination and the number of centers for each dc, the algorithm determines the min-range and divides the rest into the L-range and the R-range. If there is more than one min-range, DPC-ZPS chooses the biggest one to separate the dc range.
Step 2: the algorithm finds the max L-num and records its center combination as well as the sequence number of its dc.
Step 3: according to the center combination and dc, the non-center elements are assigned to the closest element among the denser elements.
Step 4: merge() is executed on the clusters resulting from Step 3.
Step 5: if the number of clusters after merge() does not change, the clustering result and the number of clusters are stored; if the number of groups is reduced from merged-num(r) to merged-num(r+1), Steps 3 to 5 are repeated with the center combination corresponding to merged-num(r+1).
Step 6: Steps 2 to 5 are performed in the R-range after finding the max R-num.
Step 7: the final result is the stored clustering result whose final number of clusters is the maximum over the two sub-ranges.
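The per-sub-range part of the steps above (Steps 2–5) can be sketched as a fixed-point loop. The helpers `cluster_fn` and `merge_fn` are hypothetical stand-ins for the paper's assignment and merge() procedures, so this is only a skeleton:

```python
def iterate_range(dc_values, center_combos, cluster_fn, merge_fn):
    """Skeleton of Steps 2-5 in one sub-range: pick the dc with the most
    candidate centers, assign, then merge subclusters repeatedly until
    the cluster count stops changing (Step 5's stopping rule)."""
    # Step 2: dc with the largest number of candidate centers
    best = max(range(len(dc_values)), key=lambda k: len(center_combos[k]))
    centers = center_combos[best]
    while True:
        labels = cluster_fn(dc_values[best], centers)   # Step 3: assignment
        merged = merge_fn(labels, centers)              # Step 4: merge()
        if len(merged) == len(centers):                 # Step 5: converged
            return merged, labels
        centers = merged                                # repeat with fewer centers

# Toy stubs: assignment is a placeholder; merge() collapses down to one center.
cluster_fn = lambda dc, centers: [0] * 4
merge_fn = lambda labels, centers: centers[:1] if len(centers) > 1 else centers
centers, labels = iterate_range([0.1, 0.2], [[0], [0, 3]], cluster_fn, merge_fn)
```

With these stubs the loop starts from the two-center combination and converges to a single center, mirroring how merged-num(r) shrinks until it stabilizes.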
2.2.4. Time Complexity Analysis
Suppose that the number of samples in a dataset is n, the max center-num is k, the number of pairwise points in the NCB is s, and the number of zero-pointed samples is z. Just like DPC, our method needs O(n²) time to calculate the distance matrix D. We search the nearest denser neighbor of each sample via a KD tree. Building the KD tree costs O(n log n), and a nearest-neighbor query has an average running time of O(log n); hence, per group of dc, searching the nearest neighbor of every sample costs O(n log n). For the determination of the NCB, we need a matrix M whose rows and columns represent the samples of two clusters. Each cell of M stores a distance taken from D, and all distances in M are sorted in ascending order to find the NCB by equation (5). The time complexity of the NCB therefore depends on the assignment to M: with s assignments at constant average cost, the total is O(s). The number of operations for the PCB depends on the number of zero-pointed samples, so its time complexity is less than O(z). In the merger process, the densities of each pair of border points are compared; hence, the complexity of the merger depends on the number of pairwise points in the NCB and is O(s), where s ≤ |Cu| · |Cv|, with equality only when DF = 1. However, the reasonable range of DF is (0, 0.05], as discussed in Section 3.3, so the time complexity of the merger is far less than O(n²). The iteration is based on the max center-num k, with k ≪ n. We conclude that the time complexity of the entire algorithm is O(n²).
3. Results and Discussion
We tested our algorithm and several related works, including PPC [44], DPC [38], DBSCAN [32], OPTICS [33], and AP [54], on several datasets. These datasets have different numbers of samples and simulate different element distributions; detailed information is shown in Table 1. Like DPC, AP (affinity propagation) is another advanced clustering algorithm published in Science. The basic idea of AP is to treat all data points as potential cluster centers (called exemplars), connect the data points in pairs to form a network (similarity matrix), and transmit messages (responsibility and availability) along each edge of the network to determine the cluster center of each sample.
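For reference, the message-passing procedure described above is available off the shelf in scikit-learn (the library our AP experiments use); a minimal usage sketch on a toy dataset:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Each point starts as a candidate exemplar; responsibility and availability
# messages are exchanged over the similarity matrix until a stable set of
# exemplars emerges -- no number of clusters is specified in advance.
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
ap = AffinityPropagation(random_state=5).fit(X)

exemplars = ap.cluster_centers_indices_   # indices of the chosen exemplars
labels = ap.labels_                       # cluster assignment per sample
```

The `preference` parameter (the diagonal of the similarity matrix) controls how many exemplars emerge, which is the quantity swept in our AP experiments below.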
3.1. Evaluation Criteria, Parameters of Each Algorithm, and Code Sources and Preprocessing
3.1.1. Evaluation Criteria
For intuitive comparison, we chose the adjusted Rand index (ARI) [55] and adjusted mutual information (AMI) [55] to evaluate the clustering results.
The ARI formula is shown as follows:

ARI = (RI − E[RI]) / (max(RI) − E[RI]), (8)

where E[RI] represents the expectation of RI. RI is calculated as follows:

RI = (TP + TN) / C(n, 2), (9)

where TP indicates the number of true-positive pairs, TN indicates the number of true-negative pairs, and C(n, 2) is the total number of sample pairs in a dataset containing n samples.
The AMI formula is shown as follows:

AMI = (MI(U, V) − E[MI]) / (max(H(U), H(V)) − E[MI]), (10)

where H(U) and H(V) are the entropies of the partitions U and V and E[MI] represents the expectation of MI(U, V); MI(U, V) is expressed as follows:

MI(U, V) = Σi Σj P(i, j) log(P(i, j) / (P(i) P′(j))), (11)

where P(i, j) = |Ui ∩ Vj| / n, P(i) = |Ui| / n, and P′(j) = |Vj| / n. U and V represent two allocation methods for a dataset containing n elements, and Ui and Vj are clusters. In the experimental verification, U and V are the original labels and the clustering results of an algorithm, respectively. The value range of both evaluation criteria is [−1, 1], and "1" denotes the best experimental result.
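Both criteria are implemented in scikit-learn, and both are invariant to label permutation, so a clustering that recovers the true partition under different label names still scores 1. A small sketch:

```python
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 1, 0, 0, 0]   # same partition, labels permuted

# Both scores are chance-adjusted: a random labeling scores near 0,
# and a perfect (permutation-equivalent) labeling scores 1.
ari = adjusted_rand_score(truth, pred)
ami = adjusted_mutual_info_score(truth, pred)
```

This is exactly how the per-dataset scores in Table 3 are computed: `truth` holds the original labels U and `pred` holds an algorithm's clustering result V.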
3.1.2. Parameters of Each Algorithm
DF, the parameter of our proposal, was set from 0.01 to 0.05 with an interval of 0.005. By an equal interval, we chose dc values from all pairwise distances in ascending order, where n is the number of samples of a given dataset. When performing the DBSCAN and OPTICS experiments, we generated 100 epsilon values from a fixed initial value with a fixed step, let minpts range from 1 to 50, and chose the best result among the five thousand clustering results. During the AP experiment, we set the initial value of the "preference" parameter of the AP algorithm to 1.5 times the maximum value of the similarity matrix and reduced it by 0.03% in each cycle; the optimal result was selected. The specific settings are shown in Table 2, where the DPC algorithm parameter is a suitable dc and the PPC algorithm parameter is dc_percent. The results and arguments of DPC and PPC are taken from [44].

3.1.3. Code Sources and Preprocessing
To ensure that the experimental comparison is valid, we processed each dataset according to the method described in [25] and normalized the low-dimensional datasets and the DIM512 dataset. To prepare the Olivetti faces dataset, we first scaled each image (originally 92 × 112) down to 15 × 15 and then performed principal component analysis (PCA), retaining the principal components whose cumulative contribution rate exceeds 90%. The normalization formula is as follows:

x′ = (x − min_j) / (max_j − min_j), (12)

where x represents the value of the j-th feature of a sample in the dataset and max_j and min_j represent the maximum and minimum values of the j-th feature in the dataset, respectively.
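The two preprocessing steps above, min-max normalization per feature (equation (12)) and PCA retaining at least 90% cumulative explained variance, can be sketched as follows (the helper name and the random demo data are ours; in the actual experiments the input would be the flattened 15 × 15 face images):

```python
import numpy as np
from sklearn.decomposition import PCA

def min_max_normalize(X):
    """Column-wise min-max normalization: x' = (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

normed = min_max_normalize([[0, 10], [5, 20], [10, 30]])

# Passing a float in (0, 1) to n_components makes sklearn keep the smallest
# number of components whose cumulative explained-variance ratio reaches it,
# mirroring the ">= 90% cumulative contribution" rule described above.
rng = np.random.RandomState(0)
data = rng.rand(50, 8)                    # stand-in for flattened face images
pca = PCA(n_components=0.9).fit(data)
reduced = pca.transform(data)
```

Each feature of `normed` now spans [0, 1], and `reduced` keeps only as many principal components as the 90% variance rule requires.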
The DBSCAN codes are all built-in functions of Matlab 2019a. The OPTICS code is from the pyclustering library, the AP code is from the sklearn library, and we provide the DPC-ZPS codes. We executed all methods on a personal computer with Windows 10, an Intel(R) Core(TM) i7-8750H, 16 GB memory, and Matlab 2019a or Python 3.0.
3.2. Experimental Results and Analyses
As shown in Table 3, the performance of DPC-ZPS is better than that of the other control groups. Next, we analyze the specific iterative process of our proposal in Figures 5–9. Each of Figures 5–8 consists of three subgraphs: the left subgraph shows the cutoff distance and the number of cluster centers determined by the DPC-ZPS algorithm, with the red line marking the suitable range of dc; the middle subgraph shows the clustering result of DPC-ZPS; and the right subgraph shows the category labels. Figure 9 shows the clustering results of our method and the original DPC on the Olivetti faces dataset.

As shown in Figure 5, our algorithm selects seven appropriate centers and successfully converges dc to the appropriate value interval through iteration. In the iterative process, the center-num in the L-range changes as 14-8-7-7. The number of centers finally remains unchanged, which means the seven clusters are relatively independent. The final center-num of the R-range is 4, so the clustering result of the L-range is selected as the final result.
In Figure 10(a), there is one min-range, and its center-num is one. In the L-range, the iteration process is 6-2-2, and that of the R-range is 2-1-1. Therefore, the final clustering result lies in the L-range.
In the spiral dataset, the three spiral clusters are far from each other. So, in Figure 6(a), three suitable cluster centers exist over most of the dc range, and there is no R-range. Our method successfully merges all subclusters into the three correct groups, which is consistent with Figure 6(c).
In the L-range of R15, the biggest center-num is 15, and no merge happens, while the last center-num of the R-range is 14. Hence, the correct clustering result is determined and is shown in Figure 7(b). The change process of the D31 L-range is from 33 to 31, and the ultimate center number of the R-range approaches the minimum in Figure 8(a). Hence, the final cluster number is thirty-one.
The Olivetti faces dataset contains 40 (persons) × 10 (photos) images and is widely used in machine learning to test various algorithms. As shown in Table 3, the evaluation results of DPC-ZPS on ARI are better than those of the other algorithms. Figure 9 shows the clustering results of DPC-ZPS and DPC. An image marked with a white dot in the upper right corner is a cluster center, and gray photos indicate that the cluster contains fewer than three elements.
In Figure 9(b), six of the group photos have no centers, which suggests that the traditional DPC algorithm may incorrectly merge multiple clusters into one. By contrast, as shown in Figure 9, only two group photos lack centers, demonstrating that DPC-ZPS is less likely to merge clusters incorrectly.
3.3. Discussion
The studies in [39, 40, 44] show that the selection rule for dc provided in [38] cannot fit various datasets. Table 2 shows that the values of dc and dc_percentage differ across datasets, which increases the tuning cost and difficulty, while in six of the seven tested datasets, our parameter is equal to 0.02.
The depth factor, the only parameter of the DPC-ZPS algorithm, is used in equation (5) to control the depth of the border between two adjacent clusters. When DF = 1, the neighboring cluster border contains all the elements of the two clusters. However, the border should be composed of elements with a shallow depth, so only minimal parameter values are appropriate across datasets; [0.005, 0.05] is a reasonable range for all of the tested datasets. As shown in Figure 11, for most datasets the results fluctuate severely only before DF = 0.015, which is just a small part of the whole range; after that, our algorithm is not sensitive to parameter changes. In addition, compared with the DPC and PPC algorithms, DPC-ZPS requires no human intervention in the entire clustering process, which overcomes many defects caused by manual operation.
4. Conclusions
In this paper, to overcome the defects of human operation and the difficulty of determining a suitable dc, we proposed density peaks clustering by zero-pointed samples (DPC-ZPS) of regional group borders. DPC-ZPS is based on in-depth analyses of both the changing rule between dc and the centers and the relationship between the densities of the NCB and the PCB. Our proposal covers two main parts: the merger strategy for subclusters based on cluster borders and the iteration strategy. The merger strategy adaptively determines the merge threshold for each pair of local clusters, and the iterative process automatically finds a suitable range of dc. Experimental results indicate that our method is more accurate without manual operation and has a more reasonable and less sensitive threshold value range. In future work, we will use the natural nearest neighbors to optimize the local density measurement and the assignment process.
Data Availability
All datasets in this paper are from the UCI Machine Learning Repository and are publicly accessible to all readers.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61972056, 61772454, 61402053, and 61981340416), the Natural Science Foundation of Hunan Province of China (2020JJ4623), the Scientific Research Fund of Hunan Provincial Education Department (17A007, 19C0028, and 19B005), the Changsha Science and Technology Planning (KQ1703018, KQ1706064, KQ170301801, and KQ170301804), the Junior Faculty Development Program Project of Changsha University of Science and Technology (2019QJCZ011), the "Double First-class" International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology (2019IC34), the Practical Innovation and Entrepreneurship Ability Improvement Plan for Professional Degree Postgraduates of Changsha University of Science and Technology (SJCX202072), the Postgraduate Training Innovation Base Construction Project of Hunan Province (201924851), and the Beidou Micro Project of Hunan Provincial Education Department (XJT[2020] No. 149).
References
 A. Saxena, M. Prasad, A. Gupta et al., “A review of clustering techniques and developments,” Neurocomputing, vol. 267, pp. 664–681, 2017. View at: Publisher Site  Google Scholar
 Y. Chen, J. Wang, S. Liu et al., “Multiscale fast correlation filtering tracking algorithm based on a feature fusion model,” Concurrency and Computation: Practice and Experience, p. e5533, 2019. View at: Publisher Site  Google Scholar
 Z. Liao, R. Zhang, S. He, D. Zeng, J. Wang, and H.J. Kim, “Deep learningbased data storage for low latency in data center networks,” IEEE Access, vol. 7, pp. 26411–26417, 2019. View at: Publisher Site  Google Scholar
 Y. Chen, J. Tao, Q. Zhang et al., “Saliency detection via the improved hierarchical principal component analysis method,” Wireless Communications and Mobile Computing, vol. 2020, Article ID 8822777, 12 pages, 2020. View at: Publisher Site  Google Scholar
 F. Yu, L. Liu, H. Shen et al., “Dynamic analysis, circuit design and Synchronization of a novel 6D memristive fourwing hyperchaotic system with multiple coexisting attractors,” Complexity, vol. 2020, Article ID 5904607, 17 pages, 2020. View at: Publisher Site  Google Scholar
 Y. Chen, J. Wang, X. Chen et al., “Singleimage superresolution algorithm based on structural selfsimilarity and deformation block features,” IEEE Access, vol. 7, pp. 58791–58801, 2019. View at: Publisher Site  Google Scholar
 F. Yu, L. Liu, S. Qian et al., “Chaosbased application of a novel multistable 5D memristive hyperchaotic system with coexisting multiple attractors,” Complexity, vol. 2020, Article ID 8034196, 19 pages, 2020. View at: Publisher Site  Google Scholar
 Y. Chen, W. Xu, J. Zuo, and K. Yang, “The fire recognition algorithm using dynamic feature fusion and IVSVM classifier,” Cluster Computing, vol. 22, no. S3, pp. 7665–7675, 2019. View at: Publisher Site  Google Scholar
 F. Yu, H. Shen, L. Liu et al., “CCII and FPGA realization: a multistable modified fourorder autonomous Chua’s chaotic system with coexisting multiple attractors,” Complexity, vol. 2020, Article ID 5212601, 17 pages, 2020. View at: Publisher Site  Google Scholar
 Y. Chen, J. Xiong, W. Xu, and J. Zuo, “A novel online incremental and decremental learning algorithm based on variable support vector machine,” Cluster Computing, vol. 22, no. S3, pp. 7435–7445, 2019. View at: Publisher Site  Google Scholar
 J. Zhang, Y. Wu, W. Feng, and J. Wang, “Spatially attentive visual tracking using multimodel adaptive response fusion,” IEEE Access, vol. 7, pp. 83873–83887, 2019. View at: Publisher Site  Google Scholar
 W. Li, H. Xu, H. Li et al., “Complexity and algorithms for superposed data uploading problem in networks with smart devices,” IEEE Internet of Things Journal, 2019. View at: Publisher Site  Google Scholar
 K. Gu, N. Wu, B. Yin, and W. Jia, “Secure data query framework for cloud and fog computing,” IEEE Transactions on Network and Service Management, vol. 17, no. 1, pp. 332–345, 2020. View at: Publisher Site  Google Scholar
 J. Wang, Y. Yang, T. Wang, R. S. Sherratt, and J. Zhang, “Big data service architecture: a survey,” Journal of Internet Technology, vol. 21, no. 2, pp. 393–405, 2020. View at: Google Scholar
 Y. Chen, J. Tao, L. Liu et al., “Research of improving semantic image segmentation based on a feature fusion model,” Journal of Ambient Intelligence and Humanized Computing, p. 1, 2020. View at: Publisher Site  Google Scholar
 Y. Chen, J. Wang, R. Xia, Q. Zhang, Z. Cao, and K. Yang, “The visual object tracking algorithm research based on adaptive combination kernel,” Journal of Ambient Intelligence and Humanized Computing, vol. 10, no. 12, pp. 4855–4867, 2019. View at: Publisher Site  Google Scholar
 O. O. Olugbara, E. Adetiba, S. A. Oyewole, and S. A. Oyewole, “Pixel intensity clustering algorithm for multilevel image segmentation,” Mathematical Problems in Engineering, vol. 2015, Article ID 649802, 19 pages, 2015. View at: Publisher Site  Google Scholar
 Z. Hong, H. He, J. Xu, Q. Fang, and W. Wang, “Medical image segmentation using fruit fly optimization and density peaks clustering,” Computational and Mathematical Methods in Medicine, vol. 2018, Article ID 3052852, 11 pages, 2018. View at: Publisher Site  Google Scholar
 T. VoVan, A. NguyenHai, M. V. TatHong, and T. NguyenTrang, “A new clustering algorithm and its application in assessing the quality of underground water,” Scientific Programming, vol. 2020, Article ID 6458576, 12 pages, 2020. View at: Publisher Site  Google Scholar
 C. Ju and C. Xu, “A new collaborative recommendation approach based on users clustering using artificial bee colony algorithm,” The Scientific World Journal, vol. 2013, Article ID 869658, 9 pages, 2013. View at: Publisher Site  Google Scholar
 H. Qu, L. Lei, X. Tang, and P. Wang, “A lightweight intrusion detection method based on fuzzy clustering algorithm for wireless sensor networks,” Advances in Fuzzy Systems, vol. 2018, Article ID 4071851, 12 pages, 2018. View at: Publisher Site  Google Scholar
 A. Amini, H. Saboohi, T. Y. Wah, and T. Herawan, “A fast density-based clustering algorithm for real-time internet of things stream,” The Scientific World Journal, vol. 2014, Article ID 926020, 11 pages, 2014.
 D. Lam and D. C. Wunsch, “Clustering,” Academic Press Library in Signal Processing, vol. 1, pp. 1115–1149, 2014.
 J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, Oakland, CA, USA, 1967.
 J. C. Dunn, “A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters,” Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, 1973.
 R. Xu and D. Wunsch II, “Survey of clustering algorithms,” IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, 2005.
 A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering,” ACM Computing Surveys (CSUR), vol. 31, no. 3, pp. 264–323, 1999.
 T. Zhang, R. Ramakrishnan, and M. Livny, “Birch,” ACM Sigmod Record, vol. 25, no. 2, pp. 103–114, 1996.
 J. Zhong, P. W. Tse, and Y. Wei, “An intelligent and improved density and distance-based clustering approach for industrial survey data classification,” Expert Systems with Applications, vol. 68, pp. 21–28, 2017.
 S. Guha, R. Rastogi, and K. Shim, “Cure,” in Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data ACM, pp. 73–84, Seattle, WA, USA, 1998.
 S. Guha, R. Rastogi, and K. Shim, “Rock: a robust clustering algorithm for categorical attributes,” in Proceedings of the IEEE Conference on Data Engineering, pp. 512–521, Sydney, Australia, March 1999.
 M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, Portland, OR, USA, 1996.
 M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, “Optics: ordering points to identify the clustering structure,” in Proceedings of the ACM Sigmod Record, pp. 49–60, Philadelphia, PA, USA, 1999.
 W. Wang, J. Yang, and R. Muntz, “Sting: a statistical information grid approach to spatial data mining,” in Proceedings of the 23rd International Conference on Very Large Data Bases, pp. 186–195, Athens, Greece, August 1997.
 G. McLachlan and D. Peel, “Finite mixture models,” in Encyclopedia of Autism Spectrum Disorders, F. R. Volkmar, Ed., p. 1296, Springer, New York, NY, USA, 1st edition, 2013.
 U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
 I. Anderson and R. Diestel, “Graph theory,” The Mathematical Gazette, vol. 85, no. 502, p. 176, 2001.
 A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
 J. Xie, H. Gao, W. Xie, X. Liu, and P. W. Grant, “Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors,” Information Sciences, vol. 354, pp. 19–40, 2016.
 R. Liu, H. Wang, and X. Yu, “Shared-nearest-neighbor-based clustering by fast search and find of density peaks,” Information Sciences, vol. 450, pp. 200–226, 2018.
 R. Mehmood, G. Zhang, R. Bie, H. Dawood, and H. Ahmad, “Clustering by fast search and find of density peaks via heat diffusion,” Neurocomputing, vol. 208, pp. 210–217, 2016.
 P. Guo, X. Wang, Y. Wang, Y. Chen, and Y. Zhang, “Research on automatic determining clustering centers algorithm based on linear regression analysis,” in Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC), pp. 1016–1023, Chengdu, China, June 2017.
 J. Ding, X. He, J. Yuan, and B. Jiang, “Automatic clustering based on density peak detection using generalized extreme value distribution,” Soft Computing, vol. 22, no. 9, pp. 2777–2796, 2018.
 L. Ni, W. Luo, W. Zhu, and W. Liu, “Clustering by finding prominent peaks in density space,” Engineering Applications of Artificial Intelligence, vol. 85, pp. 727–739, 2019.
 Y. Luo, J. Qin, X. Xiang, Y. Tan, and Q. Liu, “Coverless real-time image information hiding based on image block matching and dense convolutional network,” Journal of Real-Time Image Processing, vol. 17, no. 1, pp. 125–135, 2020.
 Y. Tan, J. Qin, X. Xiang, W. Ma, W. Pan, and N. N. Xiong, “A robust watermarking scheme in YCbCr color space based on channel coding,” IEEE Access, vol. 7, no. 1, pp. 25026–25036, 2019.
 B. Yin, X. Wei, J. Wang, N. Xiong, and K. Ge, “An industrial dynamic skyline based similarity joins for multidimensional big data applications,” IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2520–2532, 2020.
 A. Gionis, H. Mannila, and P. Tsaparas, “Clustering aggregation,” ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, p. 4, 2007.
 L. Fu and E. Medico, “Flame, a novel fuzzy clustering method for the analysis of DNA microarray data,” BMC Bioinformatics, vol. 8, no. 1, p. 3, 2007.
 H. Chang and D.-Y. Yeung, “Robust path-based spectral clustering,” Pattern Recognition, vol. 41, no. 1, pp. 191–203, 2008.
 C. J. Veenman, M. J. T. Reinders, and E. Backer, “A maximum variance cluster algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1273–1280, 2002.
 P. Franti, O. Virmajoki, and V. Hautamaki, “Fast agglomerative clustering using a k-nearest neighbor graph,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 1875–1881, 2006.
 F. S. Samaria and A. C. Harter, “Parameterisation of a stochastic model for human face identification,” in Proceedings of the 1994 IEEE Workshop on Applications of Computer Vision, pp. 138–142, Sarasota, FL, USA, December 1994.
 B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007.
 N. X. Vinh, J. Epps, and J. Bailey, “Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance,” Journal of Machine Learning Research, vol. 11, pp. 2837–2854, 2010.
Copyright
Copyright © 2020 Lin Ding et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.