Research Article

Random Forests in Count Data Modelling: An Analysis of the Influence of Data Features and Overdispersion on Regression Performance

Table 2

Effect of predictor types and dispersion amplitude on the number of variables randomly selected at each split.

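| Data types | Variance-to-mean relationship | Best mtry | N = 50 (%) | N = 250 (%) | N = 1250 (%) |
| --- | --- | --- | --- | --- | --- |
| Categorical | Linear | 2 | 81 89 90 56 58 65 | 98 97 99 80 84 86 | 100 100 100 91 92 94 |
| Categorical | Linear | | 9 3 6 27 27 30 | 1 3 1 17 15 14 | 0 0 0 9 8 6 |
| Categorical | Linear | | 8 7 4 9 11 5 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |
| Categorical | Linear | | 2 1 0 8 4 0 | 1 0 0 3 1 0 | 0 0 0 0 0 0 |
| Categorical | Quadratic | 2 | 89 88 88 62 74 74 | 98 99 99 81 85 92 | 100 100 100 97 99 98 |
| Categorical | Quadratic | | 7 8 4 28 16 18 | 2 0 1 18 15 8 | 0 0 0 3 1 2 |
| Categorical | Quadratic | | 2 3 1 4 7 6 | 0 1 0 1 0 0 | 0 0 0 0 0 0 |
| Categorical | Quadratic | | 2 1 7 6 3 2 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |
| 25% of predictors are quantitative | Linear | 2 | 0 0 100 0 100 100 | 100 100 100 100 0 100 | 100 100 100 100 100 100 |
| 25% of predictors are quantitative | Linear | | 0 0 0 0 0 0 | 0 0 0 0 100 0 | 0 0 0 0 0 0 |
| 25% of predictors are quantitative | Linear | | 100 0 0 100 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |
| 25% of predictors are quantitative | Linear | | 0 100 0 0 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |
| 25% of predictors are quantitative | Quadratic | 2 | 100 100 100 100 100 0 | 100 100 100 100 0 100 | 100 100 100 100 100 100 |
| 25% of predictors are quantitative | Quadratic | | 0 0 0 0 0 0 | 0 0 0 0 100 0 | 0 0 0 0 0 0 |
| 25% of predictors are quantitative | Quadratic | | 0 0 0 0 0 100 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |
| 50% of predictors are quantitative | Linear | 2 | 100 100 100 0 0 0 | 100 100 0 0 100 100 | 100 100 100 100 100 100 |
| 50% of predictors are quantitative | Linear | | 0 0 0 0 0 100 | 0 0 100 100 0 0 | 0 0 0 0 0 0 |
| 50% of predictors are quantitative | Linear | | 0 0 0 100 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |
| 50% of predictors are quantitative | Linear | | 0 0 0 0 100 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |
| 50% of predictors are quantitative | Quadratic | 2 | 100 100 100 100 100 100 | 100 100 100 100 100 100 | 100 100 100 100 100 100 |
| 75% of predictors are quantitative | Linear | 2 | 100 100 100 0 100 100 | 100 100 100 100 0 100 | 100 100 100 100 100 100 |
| 75% of predictors are quantitative | Linear | | 0 0 0 100 0 0 | 0 0 0 0 100 0 | 0 0 0 0 0 0 |
| 75% of predictors are quantitative | Quadratic | 2 | 100 0 100 0 100 100 | 100 100 100 100 100 100 | 100 100 100 100 100 100 |
| 75% of predictors are quantitative | Quadratic | | 0 100 0 100 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |
| Quantitative | Linear | 2 | 100 0 100 100 100 100 | 0 100 100 0 100 100 | 100 100 100 100 100 100 |
| Quantitative | Linear | | 0 100 0 0 0 0 | 100 0 0 100 0 0 | 0 0 0 0 0 0 |
| Quantitative | Quadratic | 2 | 100 100 0 100 100 100 | 100 100 100 100 100 100 | 100 100 100 100 100 100 |
| Quantitative | Quadratic | | 0 0 100 0 0 0 | 0 0 0 0 0 0 | 0 0 0 0 0 0 |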