Mathematical Problems in Engineering

Volume 2018 (2018), Article ID 3742048, 16 pages

https://doi.org/10.1155/2018/3742048

## A k-Deviation Density Based Clustering Algorithm

^{1}College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

^{2}College of Electronic Information, Zhejiang Wanli University, Ningbo 315100, China

Correspondence should be addressed to Yang Dongyong

Received 2 October 2017; Revised 29 December 2017; Accepted 17 January 2018; Published 26 February 2018

Academic Editor: Erik Cuevas

Copyright © 2018 Chen Jungan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Because it relies on global parameters, DBSCAN fails to identify clusters with different and varied densities. To solve this problem, this paper extends DBSCAN with a new density definition and proposes a novel algorithm called k-deviation density based DBSCAN (kDDBSCAN). Various datasets containing clusters with arbitrary shapes and different or varied densities are used to demonstrate the performance of kDDBSCAN and to investigate its feasibility and practicality. The results show that kDDBSCAN performs better than DBSCAN.

#### 1. Introduction

DBSCAN is a classical density-based clustering method [1] with many desirable features, including good robustness to noise and outliers. However, because it relies on global parameters, in particular the neighborhood radius Eps, DBSCAN fails to identify clusters with different and varied densities. Two main kinds of methods have been proposed to solve this problem.

*(1) Adaptive Local Density or Eps.* GRIDBSCAN [2] and GMDBSCAN [3] use a grid technique to calculate the local density (Eps, MinPts), where MinPts is the minimum number of neighbors a point must have to be considered a core point. APSCAN [4] uses the Affinity Propagation (AP) algorithm to partition a dataset into patches and calculates the local density of each patch. VDBSCAN [5] uses a k-dist plot to select several Eps values for different densities. Multi-DBSCAN [6] uses the must-link constraint and the k-nearest distance to calculate Eps values for different densities. DBSCAN-DLP [7] partitions a dataset into subsets with different density levels by analyzing the statistical characteristics of its density variation and then estimates the Eps value for each subset. DSets-DBSCAN regards the data in the dominant set as core points and those from extrapolation as border points, so Eps can be determined automatically from the dominant set [8]. After the local density or Eps is estimated, all these algorithms apply DBSCAN to merge data with similar density.

EDBSCAN [9] assigns varied values to Eps according to the local density based on the k-nearest neighbors, and the clustering process proceeds from the point with the highest local density towards the point with the lowest. DDSC [10] uses a homogeneity test to detect the density difference between regions; if the density difference is less than a given threshold, the regions are merged into the same cluster.

*(2) Redefinition of the Density with No Parameter Eps.* -density [11] estimates the local density of the nonnormalized probability distribution from a neighborhood of a given radius, and a hierarchical agglomerative strategy is used to merge clusters according to dissimilarity measures. In the multidensity DBSCAN, two adjacent spatial regions are separated into two clusters when the difference between DST and AVGDST violates a threshold, where DST is the average distance between one point and its k-nearest neighbors and AVGDST is the average distance between any point in one cluster and its neighbors [12]. In -DBSCAN [13], the k-means clustering algorithm is employed to divide all points into -level groups based on their -density values (here, the -density value is the average distance between the point and its k-nearest neighbors), and then DBSCAN is used to merge similar data according to the density levels.
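Several of the definitions above reduce to the same quantity: the average distance between a point and its k-nearest neighbors. The following is a minimal sketch of that quantity only, not of any of the cited algorithms. The function name `mean_knn_distance`, the toy coordinates, and the use of Euclidean distance are assumptions made for illustration.

```python
import math


def mean_knn_distance(points, index, k):
    """Average Euclidean distance from points[index] to its k nearest neighbors.

    Hypothetical helper for illustration; the point itself is excluded.
    """
    p = points[index]
    dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != index)
    if k < 1 or k > len(dists):
        raise ValueError("k must be between 1 and len(points) - 1")
    return sum(dists[:k]) / k


# Assumed toy data: a tight group near the origin and a loose group far away.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # tight group, spacing 1
    (10, 10), (10, 14), (14, 10), (14, 14),  # loose group, spacing 4
]

dense_value = mean_knn_distance(points, 0, k=2)   # point (0, 0)
sparse_value = mean_knn_distance(points, 4, k=2)  # point (10, 10)
print(dense_value, sparse_value)
```

A smaller value indicates a denser neighborhood, so in this toy data the point in the tight group scores lower than the point in the loose group, without any radius parameter being chosen.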

Among these methods, the first kind calculates Eps automatically according to different densities, while the second kind proposes new definitions of density. Among the latter, one family of definitions is based on the k-nearest neighborhood, such as the local density of the nonnormalized probability distribution [11], the average distance between one point and its k-nearest neighbors [12], -density [13], and neighborhood density [7]. Based on these definitions, varied densities are represented by dissimilarity measures [11], the difference between DST and AVGDST [12], and the density variation or density level [7], respectively.

The main objective of defining a density concept is to group objects with similar density into the same cluster. For example, in Figure 1, the blue circles A, B, C, and D can be viewed as one normal cluster, while the red circle E is an abnormal one. However, it is difficult to describe density with Eps and MinPts when densities are different and varied. For example, in Figure 1(a), with a fixed Eps the density differs across all of these circle regions, and the difference between circle E and the other circles is not apparent.
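The difficulty with a single global Eps can be sketched on toy data. The example below is an illustration under stated assumptions, not the data of Figure 1: the two grids, the helper names `grid` and `eps_neighbor_counts`, and the parameter values are hypothetical, and MinPts is taken here as the minimum number of neighbors excluding the point itself.

```python
import math


def grid(x0, y0, step, n):
    """Hypothetical helper: an n-by-n grid of points with the given spacing."""
    return [(x0 + i * step, y0 + j * step) for i in range(n) for j in range(n)]


def eps_neighbor_counts(points, eps):
    """For each point, count the other points within Euclidean distance eps."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]


dense = grid(0, 0, 1, 3)      # 9 points, spacing 1
sparse = grid(20, 20, 3, 3)   # 9 points, spacing 3
points = dense + sparse
min_pts = 3                   # assumed threshold on the neighbor count

# A small Eps fits the dense grid but leaves every sparse point without neighbors.
small = eps_neighbor_counts(points, eps=1.5)
sparse_core_small = [c >= min_pts for c in small[9:]]

# A large Eps covers the sparse grid but saturates the dense one, so the
# neighbor counts no longer separate the two densities.
large = eps_neighbor_counts(points, eps=4.5)

print(small[:9], small[9:])
print(large[:9], large[9:])
```

With the small radius, all points of the sparse grid fall below the threshold and would be treated as noise; with the large radius, every point of the dense grid sees the whole grid, so its count says little about how dense the region is.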