Research Article

K-Modes Clustering Algorithm Based on Weighted Overlap Distance and Its Application in Intrusion Detection

Algorithm 1

Ini_Weight: the initial center selection algorithm.
Input: Information table IS = (U, A, V, f), where U = {x1, ..., xn}, A = {a1, ..., am}; the expected number of clusters k.
Output: k initial center points.
Initialization: Let C = ∅, where C is the set of initial center points selected so far.
(1) For each attribute a ∈ A, calculate the partitions U/IND(A − {a}) and U/IND({a}), respectively, by counting sort;
(1.1) Calculate the information entropy E(A − {a}) of IND(A − {a});
(1.2) Calculate the significance Sig(a) of attribute a, and from it obtain the weight weight(a) of a;
(1.3) For any x ∈ U, calculate |[x]{a}| according to the partition U/IND({a}), and
(2) Calculate the weighted average density WDens(x) for each x ∈ U;
(3) Select the object y with the largest weighted average density from U as the first initial center, and let C = C ∪ {y};
(4) If |C| < k, go to step (5); otherwise, go to step (8);
(5) Let C = {c1, c2, ..., cq}; for each x ∈ U − C, repeat:
(5.1) Calculate the weighted overlap distance wd(x, ci) between x and ci, where ci ∈ C, 1 ≤ i ≤ q;
(5.2) Calculate Pos_Center(x);
(6) Select the object y in U − C that is most likely to be an initial center as the new initial center, and let C = C ∪ {y};
(7) If |C| < k, go to step (5); otherwise, go to step (8);
(8) Return the k initial centers in C.
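
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch only, not the authors' implementation: the algorithm box does not state the formulas, so the code assumes Shannon entropy for E, Sig(a) = E(A) − E(A − {a}), weight(a) = Sig(a) normalized to sum to 1 (with a uniform fallback when every significance is zero), WDens(x) = Σ weight(a)·|[x]{a}|/|U|, wd(x, c) as the sum of the weights of the attributes on which x and c differ, and Pos_Center(x) = WDens(x)·min wd(x, ci). All function names are hypothetical, and partitions are counted with a hash-based Counter rather than the counting sort named in step (1).

```python
from collections import Counter
from math import log2


def partition_entropy(rows, attrs):
    """Shannon entropy of the partition U/IND(attrs) (assumed definition of E)."""
    n = len(rows)
    blocks = Counter(tuple(r[a] for a in attrs) for r in rows)
    return -sum(c / n * log2(c / n) for c in blocks.values())


def attribute_weights(rows):
    """Step (1): assumed Sig(a) = E(A) - E(A - {a}), normalized into weights."""
    m = len(rows[0])
    attrs = list(range(m))
    e_all = partition_entropy(rows, attrs)
    sig = [
        max(0.0, e_all - partition_entropy(rows, [b for b in attrs if b != a]))
        for a in attrs
    ]
    total = sum(sig)
    if total <= 0:  # assumption: fall back to uniform weights
        return [1.0 / m] * m
    return [s / total for s in sig]


def weighted_density(rows, weights):
    """Step (2): assumed WDens(x) = sum_a weight(a) * |[x]_{a}| / |U|."""
    n = len(rows)
    counts = [Counter(r[a] for r in rows) for a in range(len(weights))]
    return [
        sum(w * counts[a][r[a]] / n for a, w in enumerate(weights))
        for r in rows
    ]


def weighted_overlap(x, y, weights):
    """Step (5.1): assumed wd = total weight of the attributes where x and y differ."""
    return sum(w for w, xa, ya in zip(weights, x, y) if xa != ya)


def ini_weight(rows, k):
    """Steps (3)-(8): greedily pick k initial centers from rows."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of objects")
    weights = attribute_weights(rows)
    dens = weighted_density(rows, weights)
    # Step (3): the densest object becomes the first center.
    centers = [max(range(len(rows)), key=dens.__getitem__)]
    while len(centers) < k:  # steps (4) and (7)
        def pos_center(i):
            # Step (5.2): assumed Pos_Center(x) = WDens(x) * min_i wd(x, c_i).
            nearest = min(
                weighted_overlap(rows[i], rows[c], weights) for c in centers
            )
            return dens[i] * nearest

        candidates = [i for i in range(len(rows)) if i not in centers]
        centers.append(max(candidates, key=pos_center))  # step (6)
    return [rows[i] for i in centers]  # step (8)


rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
print(ini_weight(rows, 2))  # [('a', 'x'), ('b', 'z')]
```

Under these assumptions, the product in Pos_Center favors objects that are both dense and far from every center already chosen, so a duplicate of an existing center scores zero and is never selected while a distinct object remains. Ties are broken by taking the first object in U.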