Research Article
A Variable-Clustering-Based Feature Selection to Improve Positive and Negative Discrimination of P53 Protein in Colorectal Cancer Patients
Algorithm 1
Pseudocode of IV_Cluster Methodology.
Input: dataset D, feature set F
Output: K features
1: Bin each variable in the feature set using the optimal decision tree algorithm.
2: Apply WOE coding to the binning results.
3: Map dataset D to a new dataset D1 using the WOE encoding of each variable.
4: Calculate the IV value of each variable using formula (2).
5: Given the required number of features K, apply the clustering algorithm to D1 to group the variables into K clusters.
6: From each of the K clusters, select the variable with the largest IV value; these K variables form the selected feature subset.
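Steps 2, 4, and 6 of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the commonly used definitions WOE_i = ln(p_i / n_i) and IV = sum_i (p_i - n_i) * WOE_i, where p_i and n_i are the shares of positive and negative samples falling in bin i; the exact form of formula (2) and its sign convention may differ. The function names `woe_iv` and `select_by_cluster`, the smoothing constant `eps`, and the toy inputs are hypothetical. The bin labels (step 1) and the cluster assignments (step 5) are taken as given inputs rather than computed here.

```python
import math
from collections import defaultdict


def woe_iv(bins, labels, eps=0.5):
    """Return ({bin: WOE}, IV) for one binned variable with binary labels (1 = positive).

    Assumes the common definitions WOE_i = ln(p_i / n_i) and
    IV = sum_i (p_i - n_i) * WOE_i. The additive smoothing `eps` is an
    assumption added here so that empty cells do not produce log(0).
    """
    pos = defaultdict(float)
    neg = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1

    all_bins = sorted(set(pos) | set(neg))
    pos_total = sum(pos[b] + eps for b in all_bins)
    neg_total = sum(neg[b] + eps for b in all_bins)

    woe, iv = {}, 0.0
    for b in all_bins:
        p = (pos[b] + eps) / pos_total  # smoothed share of positives in bin b
        n = (neg[b] + eps) / neg_total  # smoothed share of negatives in bin b
        woe[b] = math.log(p / n)
        iv += (p - n) * woe[b]
    return woe, iv


def select_by_cluster(iv_by_var, cluster_by_var):
    """Step 6: keep the variable with the largest IV in each cluster."""
    best = {}
    for var, cluster in cluster_by_var.items():
        if cluster not in best or iv_by_var[var] > iv_by_var[best[cluster]]:
            best[cluster] = var
    return [best[c] for c in sorted(best)]


# Toy usage: hypothetical IV values and cluster assignments for three variables.
iv_values = {"x1": 0.1, "x2": 0.4, "x3": 0.2}
clusters = {"x1": 0, "x2": 0, "x3": 1}
selected = select_by_cluster(iv_values, clusters)
```

Because each cluster contributes exactly one variable, the number of clusters K directly fixes the size of the selected feature subset, and correlated variables in the same cluster compete on IV rather than all being retained.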