Research Article

[Retracted] The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets

Algorithm 1: Hellinger Distance Undersampling (HDUS) pseudocode

Input: Imbalanced training dataset (ITrD)
Output: Balanced training dataset (BTrD)
Group ITrD by class
C1 = ITrD(class1)  // C1 is the minority class, which contains fewer instances
C2 = ITrD(class2)  // C2 is the majority class, which contains more instances
For i in rows of C2
  For j in rows of C1
    Sim(i,j) = similarity between C2(i) and C1(j), computed using the Hellinger distance
    Append Sim(i,j) to HD(i)
  Next j
  Select the top m values from HD(i)  // m is a given number of neighbouring minority-class instances
  HDsum(i) = sum of the selected top m values
Next i
C2HD = select the w majority-class instances with the highest values of HDsum(i)  // w is a given number
Return BTrD = C2HD + C1
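
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the pseudocode leaves open: each instance is a non-negative feature vector that is scaled to sum to 1 so that it can be treated as a discrete distribution, and the similarity is taken to be 1 minus the Hellinger distance. The names `to_distribution`, `hellinger_distance`, and `hdus` are hypothetical.

```python
import math


def to_distribution(row):
    """Scale a non-negative feature vector to sum to 1 (assumption of this sketch)."""
    if any(v < 0 for v in row):
        raise ValueError("features must be non-negative")
    total = sum(row)
    if total == 0:
        # An all-zero row carries no information; fall back to a uniform distribution.
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def hdus(majority, minority, m, w):
    """Keep the w majority rows whose m most similar minority rows score highest.

    Returns the kept majority rows and their indices in the original order.
    """
    if m < 1 or w < 1:
        raise ValueError("m and w must be at least 1")
    maj = [to_distribution(r) for r in majority]
    mino = [to_distribution(r) for r in minority]

    hd_sum = []
    for p in maj:
        # Assumption: similarity = 1 - Hellinger distance.
        sims = [1.0 - hellinger_distance(p, q) for q in mino]
        top_m = sorted(sims, reverse=True)[:m]  # the m largest similarities
        hd_sum.append(sum(top_m))

    # Rank majority rows by their summed similarity and keep the best w.
    ranked = sorted(range(len(maj)), key=lambda i: hd_sum[i], reverse=True)
    kept = sorted(ranked[:w])
    return [majority[i] for i in kept], kept
```

The balanced training set is then the kept majority rows together with all minority rows. A different conversion from distance to similarity, or a different way of turning a row into a distribution, would change which majority instances are retained.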