| Commodity | Attributes | Methods | Wavelength scanned | Spectral pretreatment | Calibration models | Results | No. of samples | References |
|---|---|---|---|---|---|---|---|---|
| Tea leaves | Caffeine, catechin (gallic acid, GC, EGC, C, EGCG, EC, GCG, ECG) | NIR | 400∼2500 nm | Win ISI Score | MPLS | r² for caffeine: 0.97; GA: 0.85; GC: 0.78; EGC: 0.95; C: 0.91; EGCG: 0.97; EC: 0.95; GCG: 0.85; ECG: 0.94 | 665 | [38] |
| Black tea | Amino acids, caffeine, theaflavins, water extract | FT-NIR | 4000∼10,000 cm⁻¹ | SNV, MSC | PLS, Si-PLS, GA-PLS, Bi-PLS | Using GA-PLS, Rp for amino acids: 0.9498; water extract: 0.8785; using Bi-PLS, Rp for caffeine: 0.9232; theaflavins: 0.924 | 95 | [39] |
| Black, dark, oolong, and green tea | Total polyphenols, caffeine, free amino acids | FT-NIR | 4000∼10,000 cm⁻¹ | MSC combined with first-order derivative and SG smoothing | PLS1, PLS2, RF-PLS, CARS-PLS | CARS-PLS achieved the best predictive performance for total polyphenols: 0.994; caffeine: 0.986; free amino acids: 0.993 | 145 | [35] |
| Green tea | Total polyphenols | VIS-NIR | 300∼1000 nm | SNV | PLS, Si-PLS, CARS-Si-PLS, GA-Si-PLS | Prediction set (Rp) for PLS: 0.8043; Si-PLS: 0.8804; GA-Si-PLS: 0.8859; CARS-Si-PLS: 0.8753 | 50 | [40] |
| Tea extract | Total polyphenols | VIS-NIR | 300∼1000 nm | SNV | PLS, Si-PLS, GA-PLS, CARS-PLS, ACO-PLS | Prediction set (RMSEP) for PLS: 0.7659; Si-PLS: 0.8766; GA-PLS: 0.8993; CARS-PLS: 0.8897; ACO-PLS: 0.8853 | 85 | [41] |
| Longjing tea leaves | Moisture content | NIR hyperspectral imaging | 874.41∼1733.91 nm | Smoothing filter (3 × 3 window), MNF rotation, 2D filter LoG (Laplacian of Gaussian) | PCPLS1-9, SPA-PLS | r² for PCPLS1-9: 0.9491, 0.8826, 0.9531, 0.8905, 0.9548, 0.9105, 0.9713, 0.9071, and 0.9610; SPA-PLS: 0.9216 | 30 | [42] |
| “Biluochun” green tea | Sensory attributes | FT-NIR | 4000∼10,000 cm⁻¹ | SNV | Si-PLS, PCA, BPNN, BP-AdaBoost | BP-AdaBoost model showed superior performance, Rp = 0.7717 | 70 | [37] |
| Black tea | Color sensory quality | VIS-NIR | 200∼1100 nm | SNV | GA-BPANN | Rp = 0.8935 | 127 | [18] |
| Black tea | Theaflavin, thearubigin | NIR | 1000∼1799 nm | MSC, SG 1st derivative, Min/Max, SNVT | PLS, SI-PLS, SI-CARS-PLS, SI-CARS-ELM, SI-CARS-SVM, SI-CARS-ELM-AdaBoost | ELM-AdaBoost was used for the validation: 0.893 | 78 | [43] |
| Green tea | Lutein, Chl-b, Chl-a, Phe-b, Phe-a, β-carotene | VIS-NIR | 400∼2498 nm | ANOVA | PLS, SPA, MLR | MLR gave superior prediction for lutein: 0.975; Chl-b: 0.973; Chl-a: 0.993; Phe-b: 0.919; Phe-a: 0.962; β-carotene: 0.965 | 135 | [44] |
| White tea and albino tea | Tea polyphenols, free amino acids, moisture, ash contents | FT-NIR | 4000∼12,400 cm⁻¹ | MSC, SNV, SG smoothing, KND, 1st and 2nd derivatives | DPLS, DA | DPLS: 98.48; DA: 100 | 70 | [45] |
| Green, yellow, white, black, and oolong tea | Region of interest | VIS-NIR | 589, 635, 670, 783 nm | SNV | LDA, Lib-SVM, ELM | Lib-SVM was the best model, r² = 98.39% | 206 | [46] |
| Green, yellow, white, black, and pu-erh tea | — | NIR | 950∼1760 nm | SG smoothing, standard deviation, SNV | PCA, MDS, t-SNE, ISOMAP, SVM-ECOC | SVM-ECOC model provided a classification accuracy of 97.41 ± 0.16% | 6 | [19] |
| Iron Buddha tea | Total polyphenols | VIS-NIR | 800∼2500 nm | SNV | PLS (LS-SVM and BPNN) | Classification accuracies: LS-SVM: 95.0%; BPNN: 97.5% | 180 | [47] |
| Pu-erh tea | Metabolomics analysis | NIR | 3600∼12,500 cm⁻¹ | OPUS 7.2 software from Bruker Optics | PCA, PLS, HCA, PLS-DA | PLS model showed nearly complete fit and excellent predictive capability (r² = 0.967; Q² = 0.93) | 17 | [48] |
| Green tea (Anji-white) | — | NIR | 4000∼12,000 cm⁻¹ | Smoothing, 2nd derivative, SNV | OCPLS, SIMCA | With SNV preprocessing, OCPLS provided sensitivity of 0.886 and specificity of 0.951; SIMCA provided sensitivity of 0.886 and specificity of 0.938 and achieved best classification performance | 248 | [36] |
| Green tea | Catechin, EC, EGC, ECG, EGCG, GCG | NIR | 1050∼2500 nm | 1st derivative | PLS, BP-ANN, SVM | Accuracy (%): PLS: 100.000; BP-ANN: 95.455; SVM: 98.485 | 220 | [49] |
| Green tea | — | NIR | 4000∼9000 cm⁻¹ | 2nd derivative, SNV | OVR-PLSDA, OVO-PLSDA, PLSDA-softmax, ES-PLSDA | Total accuracy (%): OVR-PLSDA: 64.68; OVO-PLSDA: 84.94; PLSDA-softmax: 92.99; ES-PLSDA: 93.77 | 1540 | [50] |
| Black tea | Caffeine, water extract, total polyphenols, free amino acids | NIR | 4000∼12,500 cm⁻¹ | SNV, MSC, Min/Max | PLS | (1) R in the prediction set for caffeine: 0.955; water extracts: 0.962; total polyphenols: 0.954; free amino acids: 0.927; (2) identification accuracy (%): 94.30 | 140 | [51] |
| Green and black tea | — | NIR | 3800∼14,000 cm⁻¹ | 1st derivative, SG smoothing | SIMCA, PLSDA, SPA-LDA | Classification accuracy (%): SIMCA: 88.00; PLSDA: 92.00; SPA-LDA: 100 | 82 | [52] |
| Oolong tea | — | NIR | 4000∼12,000 cm⁻¹ | SNV, 2nd derivative, smoothing | PLSDA | Sensitivity of the PLSDA model for raw data: 0.971; SNV: 1.000; 2nd derivative: 0.886; smoothing: 0.971 | 570 | [53] |
| Oolong tea | Polyphenols, alkaloids, protein, volatile and nonvolatile acids, aroma compounds | NIR and NMR | 3300∼12,500 cm⁻¹ | SNV, 2nd derivative, SG smoothing | PCA, PLSDA | Discrimination accuracy (%) for NMR + NIR data: 86.20∼95.80; NMR data: 68.20∼78.70; NIR data: 80.00∼89.30 | 90 | [17] |
Abbreviations: ACO, ant colony optimization; ANOVA, one-way analysis of variance; Bi-PLS, backward interval PLS; BP-ANN, backpropagation artificial neural network; BPNN, backpropagation neural network; C, (+)-catechin; CARS-PLS, competitive adaptive reweighted sampling-partial least squares; Chl-a, chlorophyll a; Chl-b, chlorophyll b; EC, (−)-epicatechin; ECG, (−)-epicatechin gallate; EGC, (−)-epigallocatechin; EGCG, (−)-epigallocatechin gallate; ELM, extreme learning machine; ES, ensemble strategy; NIR, near-infrared reflectance; FT-NIR, Fourier transform near-infrared reflectance; GA, genetic algorithm; GC, (−)-gallocatechin; GCG, (−)-gallocatechin gallate; ISOMAP, isometric mapping; KND, Karl Norris derivative filter; LDA, linear discriminant analysis; PLS, partial least squares; PLSDA, partial least squares discriminant analysis; Lib-SVM, library support vector machine; MDS, multidimensional scaling; Min/Max, min/max normalization; MLR, multiple linear regression; MNF, minimum noise fraction; MPLS, modified partial least squares; MSC, multiplicative scatter correction; NMR, nuclear magnetic resonance; OCPLS, one-class partial least squares; OVO-PLSDA, one-versus-one-partial least squares discriminant analysis; OVR-PLSDA, one-versus-rest-partial least squares discriminant analysis; PCA, principal component analysis; Phe-a, pheophytin a; Phe-b, pheophytin b; Q², cross-validated correlation coefficient; r², coefficient of determination in the prediction set; Rp, correlation coefficient in the prediction set; [symbol missing], determination coefficient; RF-PLS, random frog-partial least squares; SG smoothing, Savitzky–Golay smoothing; SIMCA, soft independent modeling of class analogy; Si-PLS, synergy interval partial least squares; SNV, standard normal variate; SNVT, standard normal variate transformation; SPA-LDA, successive projections algorithm associated with linear discriminant analysis; SVM, support vector machine; SVM-ECOC, error-correcting output code (ECOC) model containing support vector machine (SVM); t-SNE, t-distributed stochastic neighbor embedding; VIS-NIR, visible and near-infrared reflectance; —, not mentioned.
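As a reading aid for the table, the SNV pretreatment and two of the prediction-set statistics it reports (the correlation coefficient and RMSEP) can be sketched as below. This is a minimal illustration using their textbook definitions, not code from any of the cited studies. The function names `snv`, `correlation`, and `rmsep` and all numeric values are hypothetical, and the use of the sample standard deviation in SNV is an assumption.

```python
import math
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum on its own mean and
    scale it by its own sample standard deviation (assumed n - 1 form)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def correlation(y_ref, y_pred):
    """Pearson correlation between reference and predicted values."""
    m_ref, m_pred = mean(y_ref), mean(y_pred)
    cov = sum((a - m_ref) * (b - m_pred) for a, b in zip(y_ref, y_pred))
    ss_ref = sum((a - m_ref) ** 2 for a in y_ref)
    ss_pred = sum((b - m_pred) ** 2 for b in y_pred)
    return cov / math.sqrt(ss_ref * ss_pred)


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction."""
    squared_errors = [(a - b) ** 2 for a, b in zip(y_ref, y_pred)]
    return math.sqrt(mean(squared_errors))


if __name__ == "__main__":
    # Hypothetical absorbance values for one spectrum (illustration only).
    raw_spectrum = [0.42, 0.47, 0.55, 0.61, 0.58]
    print(snv(raw_spectrum))

    # Hypothetical reference vs. predicted contents for a prediction set.
    y_ref = [2.1, 2.9, 3.6, 4.4]
    y_pred = [2.0, 3.0, 3.5, 4.6]
    print(correlation(y_ref, y_pred), rmsep(y_ref, y_pred))
```

Because the correlation coefficient is bounded by 1 while RMSEP is expressed in the units of the measured attribute, the two are not directly comparable across rows of the table.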