Research Article

Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches

Table 2

Comparative classification of tea samples based on seven different classifiers.

Classifier     Training set (70% of 14 dataset)    Testing set (30% of 14 dataset)     15 dataset
               Recall 1   Recall 2   MAR^a         Recall 1   Recall 2   MAR           Recall 1   Recall 2   MAR
LDA            0.9994     0.9983     0.9989 A      0.9826     0.8925     0.9376 A      0.6199     0.2347     0.4273 C
SVM            1.0000     1.0000     1.0000 A      0.9639     0.8291     0.8965 B      0.9074     0.2298     0.5686 A
SGD^b          0.9987     0.9804     0.9895 C      0.9740     0.7961     0.8851 B      0.6621     0.1638     0.4130 C
Decision tree  1.0000     1.0000     1.0000 A      0.9132     0.7046     0.8089 D      0.8461     0.2461     0.5461 B
Random forest  0.9996     0.9904     0.9950 B      0.9703     0.7046     0.8374 C      0.8723     0.2195     0.5459 B
AdaBoost^c     1.0000     1.0000     1.0000 A      0.9691     0.7963     0.8827 B      0.8370     0.2622     0.5496 B
MLP^d          1.0000     1.0000     1.0000 A      0.9670     0.8957     0.9314 A      0.5842     0.2576     0.4209 C

^a MAR: macro average recall; ^b SGD: stochastic gradient descent; ^c AdaBoost: adaptive boosting; ^d MLP: multilayer perceptron. Means followed by the same letter(s) are not significantly different at the 0.01 level.
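
As an illustration of how the MAR column relates to the two recall columns, the following minimal Python sketch computes macro average recall as the unweighted mean of per-class recalls. It assumes that Recall 1 and Recall 2 are the recalls of the two classes; the function name `macro_average_recall` is hypothetical and is not taken from the article.

```python
def macro_average_recall(recalls):
    """Return the unweighted mean of per-class recall values."""
    if not recalls:
        raise ValueError("need at least one per-class recall")
    return sum(recalls) / len(recalls)


# Recall 1 and Recall 2 for SVM on the 15 dataset, as listed in Table 2.
# Assumption: these are the per-class recalls that MAR averages over.
svm_15_mar = macro_average_recall([0.9074, 0.2298])
print(f"SVM, 15 dataset MAR: {svm_15_mar:.4f}")
```

Because the mean is unweighted, a low recall on one class pulls the MAR down regardless of how many samples that class contains, which is why this metric is commonly used when classes are imbalanced.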