Research Article

Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient

Table 1

Symbol descriptions.

Symbol    Description

A nonempty finite dataset containing data objects, each described by features
Number of clusters
data object. For any, if and only if , then
Categorical and numerical feature values of a data object; the feature dimensions of the dataset are split so that the leading dimensions are categorical features and the trailing dimensions are numerical features
Categorical data
Numerical data
Value of a given feature of a data object
Value of a given categorical feature of a mixed-type data object
Value of a given numerical feature of a mixed-type data object
Feature values of a cluster center
Value domain of a feature, representing all possible values of that feature, together with the set of value domains of all features. For a numerical feature, the value domain is a subset of the real numbers; for a categorical feature, the value domain is finite and unordered. If the s-th feature of the dataset has a given number of categories, its value domain consists of exactly those categories
Dissimilarity between two data objects
A cluster of the dataset; the set of all clusters contained in the dataset
Cluster center of a cluster; the set of all cluster centers
Number of data objects in a cluster
Number of data objects in a cluster whose s-th feature takes a given value
Total number of data objects, over all clusters, that take a given feature value
Relative frequency of a feature value within a cluster
Frequency distribution of a feature value across clusters
Entropy of a categorical feature
Quantized entropy of a categorical feature
Entropy weight of a categorical feature
Maximum value of a numerical feature
Minimum value of a numerical feature
Quantized numerical feature
Weighted hybrid dissimilarity coefficient
Cost function that incorporates the feature weights
The average distance between two data objects
Local neighborhood density
Cutoff distance
Relative distance
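Table 1 names an entropy weight for each categorical feature, a min-max quantization of numerical features, and a weighted hybrid dissimilarity coefficient, but the formulas themselves are not reproduced here. The Python sketch below shows one plausible way these pieces fit together. It assumes (1) the classical entropy weight method, with Shannon entropy quantized by dividing by the logarithm of the number of categories and weights proportional to one minus that quantized entropy, (2) min-max scaling of numerical features to [0, 1], and (3) a dissimilarity that adds weighted categorical mismatches to weighted squared differences of the scaled numerical values. The function names entropy_weights, min_max_normalize, and hybrid_dissimilarity are hypothetical, and the exact formulas may differ from the article's definitions.

```python
import math
from collections import Counter


def entropy_weights(cat_rows):
    """Weight per categorical feature (assumed: classical entropy weight method).

    cat_rows: list of tuples, one tuple of categorical values per data object.
    """
    n = len(cat_rows)
    m = len(cat_rows[0])
    quantized = []
    for s in range(m):
        counts = Counter(row[s] for row in cat_rows)
        # Shannon entropy of the value distribution of feature s
        h = -sum((c / n) * math.log(c / n) for c in counts.values())
        # Quantize to [0, 1] by dividing by the maximum entropy log(#categories)
        k = len(counts)
        quantized.append(h / math.log(k) if k > 1 else 0.0)
    # Lower quantized entropy -> more concentrated values -> larger weight;
    # clamp at 0 to absorb floating-point error when entropy is maximal
    diversity = [max(0.0, 1.0 - e) for e in quantized]
    total = sum(diversity)
    if total <= 1e-12:
        return [1.0 / m] * m  # every feature maximally spread: use equal weights
    return [d / total for d in diversity]


def min_max_normalize(num_rows):
    """Scale each numerical feature to [0, 1] using its minimum and maximum."""
    cols = list(zip(*num_rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in num_rows
    ]


def hybrid_dissimilarity(x_cat, x_num, z_cat, z_num, w_cat, w_num):
    """Assumed form: weighted mismatches plus weighted squared numerical gaps."""
    # Categorical part: add the feature weight whenever the values differ
    d_cat = sum(w for a, b, w in zip(x_cat, z_cat, w_cat) if a != b)
    # Numerical part: weighted squared difference of already-scaled values
    d_num = sum(w * (a - b) ** 2 for a, b, w in zip(x_num, z_num, w_num))
    return d_cat + d_num
```

For example, two categorical features with identical value distributions receive equal weights of 0.5, and comparing ("red", "S", 0.0, 0.0) with ("blue", "S", 0.5, 1.0) under those weights and unit numerical weights gives 0.5 + 0.25 + 1.0 = 1.75. Applying min-max scaling before the numerical term keeps features measured on different scales from dominating the categorical mismatches.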
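The last rows of Table 1 (average distance, local neighborhood density, cutoff distance, and relative distance) match the quantities used in density-peaks clustering (Rodriguez and Laio) to pick well-separated, dense points as initial cluster centers. The sketch below assumes that technique: the local density of an object counts the other objects closer than the cutoff distance, and the relative distance is the distance to the nearest object of higher density (or to the farthest object, for the densest ones). How the article derives the cutoff distance is not stated here, so it is assumed to be a hypothetical fraction, ratio, of the average pairwise distance. The names density_peak_scores and pick_initial_centers are illustrative only, and dist can be any dissimilarity function.

```python
def density_peak_scores(points, dist, ratio=0.5):
    """Return (rho, delta, d_c) for each point, density-peaks style.

    ratio is a hypothetical parameter: d_c = ratio * mean pairwise distance.
    """
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    pairs = [d[i][j] for i in range(n) for j in range(i + 1, n)]
    avg = sum(pairs) / len(pairs)  # average distance between two data objects
    d_c = ratio * avg  # cutoff distance (assumed rule)
    # Local neighborhood density: number of other points closer than d_c
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        # Relative distance: nearest denser point, or farthest point if none is denser
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta, d_c


def pick_initial_centers(points, dist, k, ratio=0.5):
    """Indices of the k points with the largest rho * delta (candidate centers)."""
    rho, delta, _ = density_peak_scores(points, dist, ratio)
    order = sorted(range(len(points)), key=lambda i: rho[i] * delta[i], reverse=True)
    return order[:k]
```

On the one-dimensional points 0, 1, 2, 10, 11, 12 with ratio=0.25, the cutoff distance is about 1.63, the middle point of each group has the highest density, and those two points (indices 1 and 4) are returned as initial centers. Building the full distance matrix costs O(n^2) time and memory, and points tied at the highest density all receive the maximum relative distance, so a fuller implementation would add a tie-breaking rule.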