Research Article

Random Forests in Count Data Modelling: An Analysis of the Influence of Data Features and Overdispersion on Regression Performance

Table 4

The impact of overdispersion and data features on the tuning of the minimal terminal node size.

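| Data types | Variance-to-mean relationship | Node size | N = 50 (%) | N = 250 (%) | N = 1250 (%) |
|---|---|---|---|---|---|
| Categorical | Linear | | 25 26 18 32 31 34 | 5 11 2 27 26 25 | 0 0 0 18 15 17 |
| | | | 12 10 16 29 31 26 | 31 21 28 47 48 39 | 22 16 28 43 56 52 |
| | | | 63 64 66 39 38 40 | 64 68 70 26 26 36 | 78 84 72 39 29 31 |
| | Quadratic | | 22 16 19 31 27 31 | 4 6 5 30 18 18 | 1 0 0 15 10 10 |
| | | | 15 17 11 26 22 14 | 22 16 25 33 52 45 | 14 16 10 49 57 51 |
| | | | 63 67 70 43 51 55 | 74 78 70 37 30 37 | 85 84 90 36 33 39 |
| 25% of predictors are quantitative | Linear | | 100 0 0 100 100 0 | 0 0 0 0 100 100 | 0 0 0 100 0 0 |
| | | | 0 0 100 0 0 100 | 100 0 0 100 0 0 | 0 100 100 0 100 100 |
| | | | 0 100 0 0 0 0 | 0 100 100 0 0 0 | 100 0 0 0 0 0 |
| | Quadratic | | 0 0 0 0 0 100 | 0 0 0 0 0 0 | 0 0 0 100 0 0 |
| | | | 0 100 0 0 100 0 | 100 100 0 0 0 100 | 0 100 0 0 0 0 |
| | | | 100 0 100 100 0 0 | 0 0 100 100 100 0 | 100 0 100 0 100 100 |
| 50% of predictors are quantitative | Linear | | 0 0 0 0 0 100 | 0 0 0 0 0 100 | 0 0 0 100 0 0 |
| | | | 0 0 100 100 0 0 | 100 0 0 100 0 0 | 100 100 0 0 100 100 |
| | | | 100 100 0 0 100 0 | 0 100 100 0 100 0 | 0 0 100 0 0 0 |
| | Quadratic | | 0 100 0 0 0 100 | 0 0 0 0 100 100 | 0 0 0 0 100 0 |
| | | | 100 0 0 100 0 0 | 100 0 0 0 0 0 | 0 0 100 100 0 0 |
| | | | 0 0 100 0 100 0 | 0 100 100 100 0 0 | 100 100 0 0 0 100 |
| 75% of predictors are quantitative | Linear | | 100 0 100 0 0 100 | 100 100 0 100 0 0 | 0 0 0 100 0 0 |
| | | | 0 0 0 0 0 0 | 0 0 0 0 100 100 | 100 100 0 0 100 100 |
| | | | 0 100 0 100 100 0 | 0 0 100 0 0 0 | 0 0 100 0 0 0 |
| | Quadratic | | 0 100 0 100 0 0 | 100 0 100 0 100 0 | 0 0 100 0 100 0 |
| | | | 0 0 0 0 0 0 | 0 0 0 100 0 100 | 100 100 0 100 0 0 |
| | | | 100 0 100 0 100 100 | 0 100 0 0 0 0 | 0 0 0 0 0 100 |
| Quantitative | Linear | | 0 100 100 0 0 100 | 0 100 100 100 0 100 | 100 0 0 100 100 100 |
| | | | 0 0 0 100 100 0 | 0 0 0 0 100 0 | 0 0 0 0 0 0 |
| | | | 100 0 0 0 0 0 | 100 0 0 0 0 0 | 0 100 100 0 0 0 |
| | Quadratic | | 100 0 100 0 0 100 | 0 0 100 0 100 100 | 0 0 0 100 0 100 |
| | | | 0 0 0 100 100 0 | 0 0 0 100 0 0 | 0 100 100 0 100 0 |
| | | | 0 100 0 0 0 0 | 100 100 0 0 0 0 | 100 0 0 0 0 0 |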