Review Article

A Survey of Parallel Clustering Algorithms Based on Spark

Table 1

Comparison of key features of various algorithms.

| Algorithm | Cluster shape | Handling outliers | Input parameters | Data points’ distribution |
|---|---|---|---|---|
| K-means | Spherical | No | m, n | Random |
| Hierarchy | Spherical | No | m | Random or by data space |
| Density | Arbitrary | Yes | Eps, MinPts | Data space |
| Model | Mathematical model | Yes | m | Random |

m: number of clusters; n: maximum number of iterations; Eps: radius of data point neighborhood; MinPts: minimum density of core data point; : parameters required for a specific model.
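
To make the density-based parameters in the legend concrete, the following is a minimal single-machine sketch in plain Python, not a Spark implementation and not taken from the surveyed papers. It marks a point as a core point when at least MinPts points lie within radius Eps of it. The function name `core_points`, the sample data, and the convention that a point counts as its own neighbor are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def core_points(points, eps, min_pts):
    """Return the points whose eps-neighborhood holds at least min_pts points.

    Assumption: a point counts as its own neighbor, a common convention
    for density-based clustering that the table itself does not specify.
    """
    cores = []
    for p in points:
        # Count every point (including p itself) within distance eps of p.
        neighbors = sum(1 for q in points if dist(p, q) <= eps)
        if neighbors >= min_pts:
            cores.append(p)
    return cores


# Hypothetical sample: three tightly grouped points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
dense = core_points(points, eps=0.5, min_pts=3)
```

In this sample the three grouped points qualify as core points, while the isolated point at (5.0, 5.0) does not and lies outside the Eps-neighborhood of every core point, so a density-based method would treat it as an outlier. This matches the "Yes" entry under "Handling outliers" for the density row.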