Applied Computational Intelligence and Soft Computing

Research Article

Distributed Nonparametric and Semiparametric Regression on SPARK for Big Data Forecasting

Table 2: Fragments of datasets.

(a) Synthetic dataset




2.31	0.87	1.26
1.45	1.27	−0.36
−0.5	0.47	−0.06
−1.9	−0.94	1.51
−1.51	0.33	1.72
−0.09	−0.1	−1.71
0.17	0.24	1.64
1.8	0.07	0.77
−0.5	0.45	−0.22
0.76	0.94	1.32

(b) Hanover dataset


travel time	length	speed	stops	congestion	traffic lights	left turns

256	2107.51	30.30	2.10	42.43	225	0
284	2349.74	22.36	4.89	85.56	289	4
162	1248.51	19.33	9.27	85.91	81	1
448	2346.80	20.58	8.39	86.60	289	1
248	352.67	19.33	9.27	85.91	25	1
327	907.30	23.54	3.96	86.95	100	0
443.5	1093.29	22.01	5.44	88.66	169	0
294	348.35	23.68	3.81	89.33	25	0
125.5	1236.62	18.97	10.65	85.21	81	1
511.5	357.23	19.96	7.66	84.85	25	1

(c) Airlines dataset


DepDelay	DayOfWeek	Distance	MeanVis	MeanWind	Thunderstorm	Precipitationmm	WindDirDegrees	Num	Dest	DepTime

0	3	588	24	14	0	12.7	333	18	SNA	2150
63	7	546	22	13	0	1.78	153	2	GEG	2256
143	5	919	24	18	0	9.14	308	27	MCI	2203
−4	6	599	22	16	0	11.68	161	23	SFO	2147
4	6	368	24	14	0	7.62	151	22	LAS	2159
19	5	188	20	35	0	1.02	170	23	IDA	2204
25	7	291	23	19	0	0	128	22	BOI	2200
1	6	585	17	10	0	6.6	144	25	SJC	2151
0	3	507	28	13	0	9.91	353	24	PHX	2150
38	2	590	28	23	0	0	176	7	LAX	2243