Research Article

Integration of Residue Attributes for Sequence Diversity Characterization of Terpenoid Enzymes

Figure 5

Nested random forest (RF) error rates. For variable selection, variables are ordered by their importance scores, a new RF model is built at each step by adding a single variable in the nested RF setup, and the RF error rate is measured. Each box-and-whisker plot represents the distribution of error rates from 100 trials at one nested RF step. In total, there were 93 steps, corresponding to the 93 top indices from the previous step. The y-axis shows the error rate as a percentage. The threshold for an acceptable mean error rate was set at 2 percent, shown by the red horizontal line.
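The nested procedure summarized in this caption can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the random forest fit is hidden behind a hypothetical callable `estimate_error`, and the names `nested_error_rates`, `first_step_below`, and `toy_error` are introduced here purely for illustration. The sketch shows the loop structure (one variable added per step, 100 trials per step) and how a 2 percent threshold on the mean error could be applied.

```python
import random
import statistics
from typing import Callable, Dict, List, Optional, Sequence


def nested_error_rates(
    ranked_vars: Sequence[str],
    estimate_error: Callable[[List[str], random.Random], float],
    n_trials: int = 100,
    seed: int = 0,
) -> Dict[int, List[float]]:
    """Return {step: [error rates in percent]} for nested variable subsets.

    ranked_vars must already be sorted by decreasing importance score.
    estimate_error is a hypothetical stand-in for fitting a random forest
    on the given subset and returning its error rate in percent.
    """
    rng = random.Random(seed)
    results: Dict[int, List[float]] = {}
    for step in range(1, len(ranked_vars) + 1):
        subset = list(ranked_vars[:step])  # top `step` variables
        results[step] = [estimate_error(subset, rng) for _ in range(n_trials)]
    return results


def first_step_below(
    results: Dict[int, List[float]], threshold_pct: float = 2.0
) -> Optional[int]:
    """Smallest step whose mean error is at or below the threshold, else None."""
    for step in sorted(results):
        if statistics.mean(results[step]) <= threshold_pct:
            return step
    return None


def toy_error(subset: List[str], rng: random.Random) -> float:
    # Assumption for demonstration only: error shrinks as variables are
    # added, with a little noise. A real run would fit a random forest here.
    return max(0.0, 20.0 / len(subset) + rng.uniform(-0.5, 0.5))


if __name__ == "__main__":
    ranked = [f"index_{i}" for i in range(1, 94)]  # 93 placeholder names
    rates = nested_error_rates(ranked, toy_error)
    print("first step at or below 2%:", first_step_below(rates))
```

Keeping the error estimator as a parameter means the same loop can be reused with any classifier; the per-step lists in the returned dictionary are what each box-and-whisker plot would summarize.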