Research Article

A Variable-Clustering-Based Feature Selection to Improve Positive and Negative Discrimination of P53 Protein in Colorectal Cancer Patients

Algorithm 1

Pseudocode of IV_Cluster Methodology.
Input: Dataset D, feature set F
Output: K selected features
1: Bin each variable in the feature set F using the optimal decision tree algorithm.
2: Apply WOE (weight of evidence) encoding to the binning results.
3: Map the dataset D to a new dataset D1 by replacing each variable's values with their WOE codes.
4: Calculate the IV (information value) of each variable using formula (2).
5: Given the required number of features K, apply the clustering algorithm to the variables of D1 to group them into K clusters.
6: From each of the K clusters, select the variable with the largest IV value; the K selected variables form the output feature subset.
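
The six steps above can be sketched in plain Python as follows. This is a minimal illustration under several stated assumptions, not the authors' implementation: equal-frequency binning stands in for the optimal decision tree binning of step 1; the IV is computed with the commonly used definition, the sum over bins of (p_i - q_i) ln(p_i / q_i), which is assumed to correspond to formula (2); and, because the pseudocode does not name a clustering algorithm, step 5 uses average-linkage agglomerative clustering on the distance 1 - |Pearson correlation| between WOE-encoded variables. All function names (quantile_bins, woe_iv, cluster_variables, iv_cluster) and the smoothing constant eps are hypothetical.

```python
import math
import random


def quantile_bins(values, n_bins=4):
    """Step 1 (stand-in): equal-frequency bin index for each sample.
    Assumption: replaces the optimal decision tree binning of the paper.
    Ties are split by sort order, which is acceptable for continuous data."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def woe_iv(bins, y, eps=0.5):
    """Steps 2 and 4: WOE per bin and IV of one variable for a binary target y.
    eps is additive smoothing (an assumption) that avoids log(0) in pure bins."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    woe, iv = {}, 0.0
    for b in set(bins):
        pos = sum(1 for bi, yi in zip(bins, y) if bi == b and yi == 1)
        neg = sum(1 for bi, yi in zip(bins, y) if bi == b and yi == 0)
        p = (pos + eps) / (n_pos + eps)   # share of positives falling in bin b
        q = (neg + eps) / (n_neg + eps)   # share of negatives falling in bin b
        woe[b] = math.log(p / q)
        iv += (p - q) * woe[b]
    return woe, iv


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def cluster_variables(columns, k):
    """Step 5 (assumed method): average-linkage agglomerative clustering of
    variables, using 1 - |correlation| as the distance between two variables."""
    names = list(columns)
    dist = {(a, b): 1 - abs(pearson(columns[a], columns[b]))
            for a in names for b in names}
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)          # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def iv_cluster(data, y, k, n_bins=4):
    """Run steps 1 to 6 on data = {variable name: list of values}."""
    encoded, ivs = {}, {}
    for name, values in data.items():
        bins = quantile_bins(values, n_bins)        # step 1
        woe, ivs[name] = woe_iv(bins, y)            # steps 2 and 4
        encoded[name] = [woe[b] for b in bins]      # step 3: D -> D1
    clusters = cluster_variables(encoded, k)        # step 5
    return [max(c, key=ivs.get) for c in clusters]  # step 6


if __name__ == "__main__":
    # Synthetic data for illustration only: two near-duplicate strong
    # variables and two near-duplicate weak variables.
    rng = random.Random(0)
    y = [i % 2 for i in range(200)]
    strong = [v + rng.gauss(0, 0.3) for v in y]
    weak = [0.5 * v + rng.gauss(0, 1.0) for v in y]
    data = {
        "strong_1": strong,
        "strong_2": [v + rng.gauss(0, 0.001) for v in strong],
        "weak_1": weak,
        "weak_2": [v + rng.gauss(0, 0.001) for v in weak],
    }
    print(iv_cluster(data, y, k=2))
```

Clustering on the WOE-encoded dataset D1 groups variables that carry redundant information, so keeping only the highest-IV member of each cluster is intended to yield K features that are both predictive and mutually dissimilar. In practice the stand-in binning and clustering routines would be swapped for the methods the paper actually uses.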