| Input: database descriptors, variable importance threshold, accuracy threshold, and number of rounds |
| Output: selected variables |
(1) | begin |
(2) | Create empty optimized model set; |
(3) | for i = 1 to Number of rounds do |
(4) | Define all the descriptor database variables as the current variables; |
(5) | while True do |
(6) | Split the dataset into training and test partitions; |
(7) | Create and train the model using the training data partition; |
(8) | Select the most important variables from the trained model; |
(9) | Calculate the cumulative importance of variables from the trained model; |
(10) | if max (cumulative importance of variables) < Variable importance threshold then |
(11) | Exit loop; |
(12) | end |
(13) | Train the model using only the most important variables; |
(14) | Test the trained model and calculate the accuracy; |
(15) | if Calculated accuracy < Accuracy threshold then |
(16) | Exit loop; |
(17) | end |
(18) | Add current model to optimized model set; |
(19) | Define the most important variables from the trained model as the current variables; |
(20) | end |
(21) | end |
(22) | Group the models by number of variables; |
(23) | Remove outliers from the grouped model set; |
(24) | Select the group of models with the highest frequency and their number of variables “N”; |
(25) | Rank the variables by the mean of the importance calculated in step 7; |
(26) | Return the “N” most important variables; |
(27) | end |
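
The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The callback `train_and_score` is hypothetical: it stands in for steps 6, 7, 13 and 14 (splitting the data, training, and testing) and returns a dictionary of variable importances together with a test accuracy. The rule used for "most important variables" (the shortest top-ranked prefix whose cumulative importance reaches the threshold) is one possible reading of steps 8 to 10. The guard that stops a round once no variable is dropped is an addition to guarantee termination. Outlier removal (step 23) is omitted because the listing does not specify how outliers are identified.

```python
from collections import Counter, defaultdict
from statistics import mean


def select_variables(variables, train_and_score, n_rounds,
                     importance_threshold, accuracy_threshold):
    """Sketch of the selection loop.

    train_and_score(vars) is a hypothetical callback that splits the data,
    trains a model on `vars`, and returns (importances dict, test accuracy).
    """
    models = []  # accepted (selected variables, importances) pairs
    for _ in range(n_rounds):
        current = list(variables)
        while True:
            importances, _ = train_and_score(current)
            ranked = sorted(current, key=lambda v: importances[v], reverse=True)
            # Assumed rule: keep the shortest top-ranked prefix whose
            # cumulative importance reaches the threshold.
            selected, cumulative = [], 0.0
            for v in ranked:
                selected.append(v)
                cumulative += importances[v]
                if cumulative >= importance_threshold:
                    break
            if cumulative < importance_threshold:
                break  # even all variables together fall short of the threshold
            _, accuracy = train_and_score(selected)
            if accuracy < accuracy_threshold:
                break
            # Added guard (not in the listing): stop once nothing is dropped.
            if len(selected) == len(current):
                break
            models.append((selected, importances))
            current = selected
    if not models:
        return []
    # Outlier removal is skipped here: the listing gives no rule for it.
    # N is the most frequent number of variables among the accepted models.
    n = Counter(len(sel) for sel, _ in models).most_common(1)[0][0]
    per_variable = defaultdict(list)
    for _, importances in models:
        for v, score in importances.items():
            per_variable[v].append(score)
    ranking = sorted(per_variable, key=lambda v: mean(per_variable[v]),
                     reverse=True)
    return ranking[:n]


# Toy stand-in for a real model: fixed weights play the role of importances,
# and "accuracy" is the share of total weight covered by the chosen variables.
TRUE_WEIGHTS = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}


def toy_train_and_score(chosen):
    total = sum(TRUE_WEIGHTS[v] for v in chosen)
    importances = {v: TRUE_WEIGHTS[v] / total for v in chosen}
    return importances, total


result = select_variables(list(TRUE_WEIGHTS), toy_train_and_score, n_rounds=3,
                          importance_threshold=0.9, accuracy_threshold=0.75)
print(result)  # ['a', 'b', 'c']
```

In practice `train_and_score` would wrap a real learner that exposes per-variable importances, and each call would draw a fresh random split so that the rounds differ from one another. The toy callback is deterministic only to keep the example reproducible.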