Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient
Table 1
Description of symbols.
Symbol
Description
A nonempty finite dataset containing data objects, each of which is described by features
Number of clusters
A data object. Any two data objects are equal if and only if they agree on every feature
Categorical eigenvalues of a data object; over the feature dimension of the dataset, the leading dimensions are the categorical features and the trailing dimensions are the numerical features
Categorical data
Numerical data
Describes the feature of data object
Describes the feature of the Categorical Data part of the mixed-type data object
Describes the feature of the Numerical Data part of the mixed-type data object
Eigenvalues of the cluster centers
The eigenvalue domain of a feature, that is, the set of all possible values of that feature; each feature has its own eigenvalue domain. For a numerical feature, the domain is a set of real numbers; for a categorical feature, the domain is finite and unordered. If the s-th feature of a dataset has a given number of categories, its domain is the set of exactly those categories
Calculates the dissimilarity between data objects
A cluster of the dataset, together with the set of all clusters contained in the dataset
The cluster center corresponding to a cluster, together with the set of all cluster centers
Number of data objects in a cluster, obtained by counting the objects assigned to it
Number of data objects in a cluster whose s-th feature takes a given eigenvalue
Total number of data objects, over all clusters, that contain a given eigenvalue
Relative frequency of an eigenvalue within a cluster
Frequency distribution of an eigenvalue between clusters
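The counting quantities in the last rows of the table can be illustrated with a small Python sketch. The table does not reproduce the defining formulas, so the two ratios below are assumptions: the within-cluster relative frequency is taken as the count of a value in a cluster divided by the cluster size, and the between-cluster distribution as that same count divided by the total count of the value over all clusters. The function name `categorical_cluster_stats` and its arguments are hypothetical.

```python
from collections import Counter, defaultdict


def categorical_cluster_stats(values, labels):
    """Count statistics for ONE categorical feature.

    values[i] is the categorical value of object i on that feature,
    labels[i] is the cluster that object i is assigned to.
    """
    # Number of data objects in each cluster.
    cluster_size = Counter(labels)

    # Number of objects in each cluster that take each value.
    count_in_cluster = defaultdict(Counter)
    for value, label in zip(values, labels):
        count_in_cluster[label][value] += 1

    # Total number of objects, over all clusters, that take each value.
    total_count = Counter(values)

    within = {}   # assumed: count in cluster / cluster size
    between = {}  # assumed: count in cluster / total count of the value
    for label, counts in count_in_cluster.items():
        for value, count in counts.items():
            within[(label, value)] = count / cluster_size[label]
            between[(label, value)] = count / total_count[value]

    return cluster_size, total_count, within, between


# Toy data: five objects, one categorical feature, two clusters.
values = ["red", "red", "blue", "red", "red"]
labels = [0, 0, 0, 1, 1]
cluster_size, total_count, within, between = categorical_cluster_stats(values, labels)
```

In this toy data, "red" fills two of the three slots in cluster 0 but accounts for only half of all "red" objects, so the two ratios differ for the same value and cluster. Under these assumed definitions, that difference is what separates the within-cluster and between-cluster views.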