Journal of Probability and Statistics

Research Article

Random Forests in Count Data Modelling: An Analysis of the Influence of Data Features and Overdispersion on Regression Performance

Impact of overdispersion and data features on the tuning of the minimal terminal node size.


Data types	Variance-to-mean relationship	Node size	N = 50 (%)						N = 250 (%)						N = 1250 (%)
Categorical	Linear		25	26	18	32	31	34	5	11	2	27	26	25	0	0	0	18	15	17
			12	10	16	29	31	26	31	21	28	47	48	39	22	16	28	43	56	52
			63	64	66	39	38	40	64	68	70	26	26	36	78	84	72	39	29	31
	Quadratic		22	16	19	31	27	31	4	6	5	30	18	18	1	0	0	15	10	10
			15	17	11	26	22	14	22	16	25	33	52	45	14	16	10	49	57	51
			63	67	70	43	51	55	74	78	70	37	30	37	85	84	90	36	33	39

25% of predictors are quantitative	Linear		100	0	0	100	100	0	0	0	0	0	100	100	0	0	0	100	0	0
			0	0	100	0	0	100	100	0	0	100	0	0	0	100	100	0	100	100
			0	100	0	0	0	0	0	100	100	0	0	0	100	0	0	0	0	0
	Quadratic		0	0	0	0	0	100	0	0	0	0	0	0	0	0	0	100	0	0
			0	100	0	0	100	0	100	100	0	0	0	100	0	100	0	0	0	0
			100	0	100	100	0	0	0	0	100	100	100	0	100	0	100	0	100	100

50% of predictors are quantitative	Linear		0	0	0	0	0	100	0	0	0	0	0	100	0	0	0	100	0	0
			0	0	100	100	0	0	100	0	0	100	0	0	100	100	0	0	100	100
			100	100	0	0	100	0	0	100	100	0	100	0	0	0	100	0	0	0
	Quadratic		0	100	0	0	0	100	0	0	0	0	100	100	0	0	0	0	100	0
			100	0	0	100	0	0	100	0	0	0	0	0	0	0	100	100	0	0
			0	0	100	0	100	0	0	100	100	100	0	0	100	100	0	0	0	100

75% of predictors are quantitative	Linear		100	0	100	0	0	100	100	100	0	100	0	0	0	0	0	100	0	0
			0	0	0	0	0	0	0	0	0	0	100	100	100	100	0	0	100	100
			0	100	0	100	100	0	0	0	100	0	0	0	0	0	100	0	0	0
	Quadratic		0	100	0	100	0	0	100	0	100	0	100	0	0	0	100	0	100	0
			0	0	0	0	0	0	0	0	0	100	0	100	100	100	0	100	0	0
			100	0	100	0	100	100	0	100	0	0	0	0	0	0	0	0	0	100

Quantitative	Linear		0	100	100	0	0	100	0	100	100	100	0	100	100	0	0	100	100	100
			0	0	0	100	100	0	0	0	0	0	100	0	0	0	0	0	0	0
			100	0	0	0	0	0	100	0	0	0	0	0	0	100	100	0	0	0
	Quadratic		100	0	100	0	0	100	0	0	100	0	100	100	0	0	0	100	0	100
			0	0	0	100	100	0	0	0	0	100	0	0	0	100	100	0	100	0
			0	100	0	0	0	0	100	100	0	0	0	0	100	0	0	0	0	0
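
Within each block of three rows, every column of the table sums to 100. This is consistent with each entry being the percentage of simulation replicates in which a given candidate node size was selected. The following is a minimal sketch of such a tally under that reading, not the authors' procedure. The function name, the candidate node sizes (1, 5, 10), the error values, and the tie-breaking rule are all hypothetical; the excerpt does not state the node sizes or the selection criterion.

```python
from collections import Counter


def selection_percentages(errors_per_replicate, node_sizes):
    """Percentage of replicates in which each candidate node size had the lowest error.

    errors_per_replicate: list of dicts {node_size: prediction_error}, one per replicate.
    node_sizes: candidate values; on ties, the earliest candidate in this sequence wins
                (an arbitrary choice made for this sketch).
    """
    if not errors_per_replicate:
        raise ValueError("need at least one replicate")
    wins = Counter()
    for errors in errors_per_replicate:
        # min() returns the first candidate that attains the lowest error.
        best = min(node_sizes, key=lambda size: errors[size])
        wins[best] += 1
    total = len(errors_per_replicate)
    return {size: 100.0 * wins[size] / total for size in node_sizes}


# Hypothetical candidates and made-up errors for four replicates (illustration only).
NODE_SIZES = (1, 5, 10)
toy_errors = [
    {1: 2.4, 5: 2.1, 10: 1.9},
    {1: 2.0, 5: 2.2, 10: 2.5},
    {1: 3.1, 5: 2.8, 10: 2.6},
    {1: 2.7, 5: 2.3, 10: 2.4},
]
percentages = selection_percentages(toy_errors, NODE_SIZES)
print(percentages)
```

Under this reading, an entry of 100 for one node size with 0 for the other two would mean that the same node size was selected in every replicate of that scenario.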