Mathematical Problems in Engineering

Research Article

Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient

Table 1

Symbol descriptions.


Symbol	Description

	A nonempty finite dataset containing data objects, each of which is described by features
	Number of clusters
	Data object. For any, if and only if , then
	Categorical eigenvalues of the data object; , where is the feature dimension of the dataset, the first dimensions are Categorical Features, and the remaining dimensions are Numerical Features
	is Categorical Data,
	is Numerical Data,
	Describes the feature of data object
	Describes the feature of the Categorical Data part of the mixed-type data object
	Describes the feature of the Numerical Data part of the mixed-type data object
	Eigenvalues of the Cluster Centers
	The eigenvalue domain of a feature, representing all possible values of that feature; is the eigenvalue domain corresponding to each feature. For a Numerical Feature, ; for a Categorical Feature, the value domain is finite and unordered. Suppose that the s-th feature of a dataset has categories, then
	Calculates the dissimilarity between data objects
	A cluster of ; is the set of all clusters contained in the dataset
	The Cluster Center corresponding to the cluster; is the set of Cluster Centers
	Number of data objects in cluster , given by the formula
	Number of data objects in cluster whose s-th feature takes the eigenvalue , given by the formula
	Total number of data objects, across all clusters, that contain the eigenvalue
	Relative frequency of the eigenvalue within the cluster
	Frequency distribution of the eigenvalue between clusters
	Entropy of Categorical Feature
	The quantized entropy of Categorical Feature
	The entropy weight of Categorical Feature
	The maximum value of the Numerical Feature
	The minimum value of the Numerical Feature
	Quantized Numerical Feature
	The weighted Hybrid Dissimilarity Coefficient
	The Cost Function that takes the weights into account
	The average distance between two data objects
	Local neighborhood density
	Cutoff distance
	Relative distance
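
Two of the quantities listed in Table 1, the entropy of a Categorical Feature and the quantized Numerical Feature built from its maximum and minimum values, can be illustrated with a minimal sketch. The table does not reproduce the paper's formulas, so the code below assumes the textbook definitions (Shannon entropy with the natural logarithm, and min-max scaling to the interval [0, 1]); the function names categorical_entropy and min_max_scale are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def categorical_entropy(values):
    """Shannon entropy (natural log) of one categorical feature column.

    Assumption: textbook definition; the paper's own formula may differ.
    """
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def min_max_scale(values):
    """Rescale one numerical feature column to [0, 1] using its min and max.

    Assumption: plain min-max scaling; the paper's quantization may differ.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: map every value to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Under these assumptions, a Categorical Feature whose values are spread evenly over its categories has the largest entropy, and every scaled Numerical Feature lies in the same range, so no single numerical dimension dominates a distance computation merely because of its units.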