Research Article
[Retracted] The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets
Algorithm 1: Hellinger Distance Undersampling (HDUS) pseudocode
Input: Imbalanced training dataset (ITrD)
Output: Balanced training dataset (BTrD)
Group the ITrD according to the classes
C1 = ITrD(class1) // C1 is the minority class, which contains fewer instances
C2 = ITrD(class2) // C2 is the majority class, which contains more instances
For i in rows of (C2)
    For j in rows of (C1)
        Sim(i,j) = calculate the similarity between C2(i) and C1(j) using Hellinger Distance
        append Sim(i,j) to HD(i)
    Next j
    select m top values from HD(i) // where m is a given number of neighbouring minority class instances
    HDsum(i) = sum of the selected m top values
Next i
C2HD = select w majority class instances according to the highest similarity value in HDsum(i) // where w is a given number
return (BTrD = C2HD + C1)
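The steps of Algorithm 1 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The pseudocode does not say how a feature row is turned into a probability distribution, so the sketch assumes non-negative features and rescales each row to sum to 1. It also reads "top values" and "highest" literally as the largest Hellinger values, and it assumes a two-class problem. The names `hellinger`, `to_distribution`, and `hdus` are hypothetical.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2.0)


def to_distribution(X):
    """Assumption (not in the source): rescale each non-negative row to sum to 1."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    # Leave all-zero rows unchanged instead of dividing by zero.
    return X / np.where(totals == 0, 1.0, totals)


def hdus(X, y, minority_label, m, w):
    """Sketch of Algorithm 1 for a two-class dataset.

    m: number of top Hellinger values summed per majority instance.
    w: number of majority instances to keep.
    """
    if m < 1 or w < 1:
        raise ValueError("m and w must be positive integers")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Group the training data by class.
    minority_mask = y == minority_label
    c1, y1 = X[minority_mask], y[minority_mask]    # minority class (C1)
    c2, y2 = X[~minority_mask], y[~minority_mask]  # majority class (C2)
    p1, p2 = to_distribution(c1), to_distribution(c2)

    # For every majority instance, score it against all minority instances
    # and sum its m largest values (HDsum in the pseudocode).
    hd_sum = np.empty(len(c2))
    for i, row_i in enumerate(p2):
        hd = np.array([hellinger(row_i, row_j) for row_j in p1])
        hd_sum[i] = np.sort(hd)[-m:].sum()

    # Keep the w majority instances with the highest summed values.
    keep = np.argsort(hd_sum)[-w:]

    # Balanced training set: selected majority instances plus all of C1.
    X_bal = np.vstack([c2[keep], c1])
    y_bal = np.concatenate([y2[keep], y1])
    return X_bal, y_bal
```

Calling `hdus(X, y, minority_label=1, m=3, w=50)`, for example, would return 50 selected majority rows followed by every minority row. If the intended reading of "top" is instead the smallest distances (the closest minority neighbours), the two `[-m:]` and `[-w:]` slices would become `[:m]` and `[:w]`.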