Applied Bionics and Biomechanics

Research Article

[Retracted] The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets

Input: Imbalanced Training dataset (ITrD)
Output: Balanced Training dataset (BTrD)
1 Group ITrD by class
2 C1 = ITrD(class1) // C1 is the minority class, which contains fewer instances
3 C2 = ITrD(class2) // C2 is the majority class, which contains more instances
4 For i in rows of C2
5     For j in rows of C1
6         Sim_i,j = similarity between C2(i) and C1(j), calculated using the Hellinger distance
7         Append Sim_i,j to HD(i)
8     Next j
9     Select the top m values from HD(i) // m is a given number of neighbouring minority-class instances
10     HD_sum(i) = sum of the selected top m values
11 Next i
12 C2HD = select the w majority-class instances with the highest values in HD_sum // w is a given number
13 Return BTrD = C2HD + C1

Algorithm 1: Hellinger Distance Undersampling (HDUS) pseudocode
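
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `hellinger` and `hdus` and the parameter `minority_label` are hypothetical. The pseudocode does not say how an instance is turned into a distribution, so the sketch assumes each row is a non-negative feature vector with a positive sum and normalizes it to sum to 1. It also follows the pseudocode literally: the Hellinger value is used directly as Sim_i,j, "top m" means the m largest values, and the w majority instances with the largest HD_sum are kept. If "similarity" is instead meant as closeness, both sort orders would need to be reversed.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two non-negative vectors.

    Assumption: each vector has a positive sum and is normalized here
    so that it can be treated as a discrete distribution.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def hdus(X, y, minority_label, m, w):
    """Literal sketch of Algorithm 1 for a two-class dataset.

    X: (n_samples, n_features) non-negative array; y: class labels.
    m: number of top values summed per majority instance.
    w: number of majority instances to keep.
    """
    if m < 1 or w < 1:
        raise ValueError("m and w must be at least 1")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Steps 1-3: split the training set into minority (C1) and majority (C2).
    is_minority = y == minority_label
    C1, y1 = X[is_minority], y[is_minority]
    C2, y2 = X[~is_minority], y[~is_minority]

    # Steps 4-11: for each majority row, sum its top m values in HD(i).
    hd_sum = np.empty(len(C2))
    for i, row in enumerate(C2):
        hd = np.array([hellinger(row, other) for other in C1])
        hd_sum[i] = np.sort(hd)[-m:].sum()  # m largest (literal reading)

    # Step 12: keep the w majority rows with the highest HD_sum.
    keep = np.argsort(hd_sum)[-w:]

    # Step 13: balanced set = selected majority rows + all minority rows.
    X_bal = np.vstack([C2[keep], C1])
    y_bal = np.concatenate([y2[keep], y1])
    return X_bal, y_bal
```

For example, calling `hdus(X, y, minority_label=1, m=3, w=50)` would return all minority rows together with 50 majority rows. A natural choice for `w` is the minority class size, although Algorithm 1 leaves both `m` and `w` as given inputs.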