Journal of Analytical Methods in Chemistry

Research Article

Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches

Table 2

Comparison of tea sample classification performance across seven classifiers.


Classifier	Training set (70% of 14 dataset)			Testing set (30% of 14 dataset)			15 dataset		
	Recall 1	Recall 2	MAR^a	Recall 1	Recall 2	MAR^a	Recall 1	Recall 2	MAR^a

LDA	0.9994	0.9983	0.9989 A	0.9826	0.8925	0.9376 A	0.6199	0.2347	0.4273 C
SVM	1.0000	1.0000	1.0000 A	0.9639	0.8291	0.8965 B	0.9074	0.2298	0.5686 A
SGD^b	0.9987	0.9804	0.9895 C	0.9740	0.7961	0.8851 B	0.6621	0.1638	0.4130 C
Decision tree	1.0000	1.0000	1.0000 A	0.9132	0.7046	0.8089 D	0.8461	0.2461	0.5461 B
Random forest	0.9996	0.9904	0.9950 B	0.9703	0.7046	0.8374 C	0.8723	0.2195	0.5459 B
AdaBoost^c	1.0000	1.0000	1.0000 A	0.9691	0.7963	0.8827 B	0.8370	0.2622	0.5496 B
MLP^d	1.0000	1.0000	1.0000 A	0.9670	0.8957	0.9314 A	0.5842	0.2576	0.4209 C

^aMAR: macro average recall; ^bSGD: stochastic gradient descent; ^cAdaBoost: adaptive boosting; ^dMLP: multilayer perceptron. Means followed by the same letter(s) are not significantly different at the 0.01 level.
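
The MAR columns can be reproduced from the two recall columns. The sketch below assumes that MAR is the unweighted mean of the per-class recalls, which agrees with the tabulated values, and that Recall 1 and Recall 2 are the recalls of the two classes. The function names and the toy labels are hypothetical and are not taken from the study.

```python
def recall_per_class(y_true, y_pred, labels):
    """Return the recall of each class in `labels`, in the given order."""
    recalls = []
    for label in labels:
        # Predictions made for the samples whose true class is `label`
        predictions = [p for t, p in zip(y_true, y_pred) if t == label]
        if not predictions:
            raise ValueError(f"no true samples for class {label!r}")
        correct = sum(p == label for p in predictions)
        recalls.append(correct / len(predictions))
    return recalls


def macro_average_recall(recalls):
    """Unweighted mean of per-class recalls (assumed definition of MAR)."""
    return sum(recalls) / len(recalls)


# Toy, made-up labels (not data from the study): 8 samples of class 1
# and 2 samples of class 2, with one class-2 sample misclassified.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
toy_recalls = recall_per_class(y_true, y_pred, labels=[1, 2])
toy_mar = macro_average_recall(toy_recalls)

# Recompute the SVM entry for the 15 dataset from its two recall values.
svm_mar_15 = macro_average_recall([0.9074, 0.2298])

print(toy_recalls, toy_mar)
print(round(svm_mar_15, 4))
```

In the toy example, 9 of 10 samples are classified correctly, yet the MAR is lower because the minority class counts as much as the majority class. The same effect appears in the 15 dataset columns, where a low Recall 2 pulls MAR down even when Recall 1 is high.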