Scientific Programming

Research Article

K-Modes Clustering Algorithm Based on Weighted Overlap Distance and Its Application in Intrusion Detection

The initial center selection algorithm Ini_Weight.

	Input: Information table IS = (U, A, V, f), where U = {x1, ..., xn} and A = {a1, ..., am}; the expected number of clusters k.
	Output: k initial center points.
	Initialization: Let C = ∅, where C is the set of initial center points selected so far.
(1)	For each attribute a ∈ A, calculate the partitions U/IND(A − {a}) and U/IND({a}) by counting sort, then:
(1.1)	Calculate the information entropy E(A − {a}) of IND(A − {a});
(1.2)	Calculate the significance Sig(a) of attribute a, and from it obtain the weight weight(a) of a;
(1.3)	For any x ∈ U, calculate |[x]_{a}| according to the partition U/IND({a});
(2)	Calculate the weighted average density WDens(x) for any x ∈ U;
(3)	Select the object y with the largest weighted average density from U as the first initial center, and let C = C ∪ {y};
(4)	If |C| < k, go to step (5); otherwise go to step (8);
(5)	Let C = {c1, c2, ..., cq}; for each x ∈ U − C, repeat:
(5.1)	Calculate the weighted overlap distance wd(x, ci) between x and ci, where ci ∈ C, 1 ≤ i ≤ q;
(5.2)	Calculate Pos_Center(x);
(6)	Select from U − C the object y that is most likely to be an initial center as the new initial center, and let C = C ∪ {y};
(7)	If |C| < k, go to step (5); otherwise go to step (8);
(8)	Return the k initial centers in C.
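
The control flow of steps (1) to (8) can be illustrated with a minimal Python sketch. The listing above does not state the formulas for E, Sig(a), weight(a), WDens(x), wd(x, ci), or Pos_Center(x), so the definitions used here are assumptions made only for illustration: Shannon entropy of a partition, Sig(a) = E(A) − E(A − {a}), weights obtained by normalizing Sig, WDens(x) as the weighted mean of |[x]_{a}|/|U|, wd as a weighted count of mismatching attributes, and Pos_Center(x) as WDens(x) multiplied by the smallest distance from x to an already selected center. All function names are hypothetical, and partitions are built with a hash-based counter rather than the counting sort named in step (1).

```python
import math
from collections import Counter


def entropy(rows, attrs):
    """Shannon entropy of the partition U/IND(attrs) (assumed definition)."""
    n = len(rows)
    # Objects with equal values on attrs fall into the same equivalence class.
    sizes = Counter(tuple(row[a] for a in attrs) for row in rows).values()
    return -sum((s / n) * math.log2(s / n) for s in sizes)


def attribute_weights(rows):
    """Step (1): Sig(a) = E(A) - E(A - {a}), normalized to weights (assumption)."""
    m = len(rows[0])
    all_attrs = list(range(m))
    e_all = entropy(rows, all_attrs)
    sig = [max(0.0, e_all - entropy(rows, [b for b in all_attrs if b != a]))
           for a in all_attrs]
    total = sum(sig)
    if total <= 0:  # no attribute is distinguishable: fall back to equal weights
        return [1.0 / m] * m
    return [s / total for s in sig]


def weighted_density(rows, weights):
    """Step (2): WDens(x) = sum_a weight(a) * |[x]_{a}| / |U| (assumption)."""
    n = len(rows)
    counts = [Counter(row[a] for row in rows) for a in range(len(weights))]
    return [sum(w * counts[a][row[a]] / n for a, w in enumerate(weights))
            for row in rows]


def weighted_overlap(x, y, weights):
    """wd(x, y): total weight of the attributes on which x and y differ (assumption)."""
    return sum(w for w, xa, ya in zip(weights, x, y) if xa != ya)


def ini_weight(rows, k):
    """Steps (3) to (8): greedily pick k initial centers from the objects in rows."""
    weights = attribute_weights(rows)
    dens = weighted_density(rows, weights)
    # Step (3): the densest object becomes the first center.
    centers = [max(range(len(rows)), key=dens.__getitem__)]
    while len(centers) < k:  # steps (4) and (7)
        def pos_center(i):
            # Steps (5.1) and (5.2): dense objects far from every chosen center score high.
            nearest = min(weighted_overlap(rows[i], rows[c], weights) for c in centers)
            return dens[i] * nearest
        candidates = [i for i in range(len(rows)) if i not in centers]
        centers.append(max(candidates, key=pos_center))  # step (6)
    return [rows[i] for i in centers]  # step (8)
```

For example, `ini_weight([("a", "x")] * 3 + [("b", "y")] * 2, 2)` first selects the object ("a", "x"), which has the highest density, and then ("b", "y"), the densest remaining object that differs from the first center. Under these assumed definitions, multiplying density by the distance to the nearest chosen center keeps later centers from being drawn from the same dense region as earlier ones.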