Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches
Table 2
Comparative classification of tea samples based on seven different classifiers.
Column groups: Training = training set (70% of 14 dataset); Testing = testing set (30% of 14 dataset); 15 = 15 dataset.

| Classifier | Training Recall 1 | Training Recall 2 | Training MAR^a | Testing Recall 1 | Testing Recall 2 | Testing MAR | 15 Recall 1 | 15 Recall 2 | 15 MAR |
|---|---|---|---|---|---|---|---|---|---|
| LDA | 0.9994 | 0.9983 | 0.9989 A | 0.9826 | 0.8925 | 0.9376 A | 0.6199 | 0.2347 | 0.4273 C |
| SVM | 1.0000 | 1.0000 | 1.0000 A | 0.9639 | 0.8291 | 0.8965 B | 0.9074 | 0.2298 | 0.5686 A |
| SGD^b | 0.9987 | 0.9804 | 0.9895 C | 0.9740 | 0.7961 | 0.8851 B | 0.6621 | 0.1638 | 0.4130 C |
| Decision tree | 1.0000 | 1.0000 | 1.0000 A | 0.9132 | 0.7046 | 0.8089 D | 0.8461 | 0.2461 | 0.5461 B |
| Random forest | 0.9996 | 0.9904 | 0.9950 B | 0.9703 | 0.7046 | 0.8374 C | 0.8723 | 0.2195 | 0.5459 B |
| AdaBoost^c | 1.0000 | 1.0000 | 1.0000 A | 0.9691 | 0.7963 | 0.8827 B | 0.8370 | 0.2622 | 0.5496 B |
| MLP^d | 1.0000 | 1.0000 | 1.0000 A | 0.9670 | 0.8957 | 0.9314 A | 0.5842 | 0.2576 | 0.4209 C |
^a MAR: macro average recall; ^b SGD: stochastic gradient descent; ^c AdaBoost: adaptive boosting; ^d MLP: multilayer perceptron. Within each MAR column, means followed by the same letter(s) are not significantly different at the 0.01 level.
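The macro average recall reported in Table 2 is the unweighted mean of the per-class recalls, so both classes count equally regardless of how many samples each contains. The following is a minimal sketch of that calculation, not the authors' code; the function names `per_class_recall` and `macro_average_recall` and the toy labels are illustrative assumptions. The only values taken from the table are the two LDA testing-set recalls.

```python
def per_class_recall(y_true, y_pred, labels):
    """Return the recall of each class in `labels`, in order."""
    recalls = []
    for label in labels:
        # True positives: samples of this class predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        # False negatives: samples of this class predicted as another class.
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return recalls


def macro_average_recall(recalls):
    """Unweighted mean of per-class recalls (MAR)."""
    return sum(recalls) / len(recalls)


# Toy labels (hypothetical): class 1 is the majority, class 2 the minority.
y_true = [1, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 1]
toy_recalls = per_class_recall(y_true, y_pred, labels=[1, 2])
toy_mar = macro_average_recall(toy_recalls)

# LDA testing-set recalls as listed in Table 2.
lda_test_mar = macro_average_recall([0.9826, 0.8925])
print(f"toy MAR = {toy_mar:.4f}, LDA testing MAR = {lda_test_mar:.5f}")
```

Because each class contributes equally, a classifier that recovers the majority class well but misses most of the minority class still receives a low MAR, which is the pattern visible in the 15 dataset columns.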