Review Article

The Quality Control of Tea by Near-Infrared Reflectance (NIR) Spectroscopy and Chemometrics

Table 1

Overview of NIR spectroscopy for the quality control of tea.

| Commodity | Attributes | Method | Spectral range scanned | Spectral pretreatment | Calibration models | Results | No. of samples | Reference |
|---|---|---|---|---|---|---|---|---|
| Tea leaves | Caffeine, catechin (gallic acid, GC, EGC, C, EGCG, EC, GCG, ECG) | NIR | 400∼2500 nm | Win ISI Score | MPLS | r2 for caffeine: 0.97; gallic acid: 0.85; GC: 0.78; EGC: 0.95; C: 0.91; EGCG: 0.97; EC: 0.95; GCG: 0.85; ECG: 0.94 | 665 | [38] |
| Black tea | Amino acids, caffeine, theaflavins, water extract | FT-NIR | 4000∼10,000 cm−1 | SNV, MSC | PLS, Si-PLS, GA-PLS, Bi-PLS | Using GA-PLS, Rp for amino acids: 0.9498; water extract: 0.8785; using Bi-PLS, Rp for caffeine: 0.9232; theaflavins: 0.9249 | 5 | [39] |
| Black, dark, oolong, and green tea | Total polyphenols, caffeine, free amino acids | FT-NIR | 4000∼10,000 cm−1 | MSC combined with first-order derivative and SG smoothing | PLS1, PLS2, RF-PLS, CARS-PLS | CARS-PLS () achieved the best predictive performance for total polyphenols: 0.994; caffeine: 0.986; free amino acids: 0.993 | 145 | [35] |
| Green tea | Total polyphenols | VIS-NIR | 300∼1000 nm | SNV | PLS, Si-PLS, CARS-Si-PLS, GA-Si-PLS | Prediction set (Rp) for PLS: 0.8043; Si-PLS: 0.8804; GA-Si-PLS: 0.8859; CARS-Si-PLS: 0.8753 | 50 | [40] |
| Tea extract | Total polyphenols | VIS-NIR | 300∼1000 nm | SNV | PLS, Si-PLS, GA-PLS, CARS-PLS, ACO-PLS | Prediction set (RMSEP) for PLS: 0.7659; Si-PLS: 0.8766; GA-PLS: 0.8993; CARS-PLS: 0.8897; ACO-PLS: 0.8853 | 85 | [41] |
| Longjing tea leaves | Moisture content | NIR hyperspectral imaging | 874.41∼1733.91 nm | Smoothing filter (3 × 3 window), MNF rotation, 2D LoG (Laplacian of Gaussian) filter | PCPLS1-9, SPA-PLS | r2 for PCPLS1-9: 0.9491, 0.8826, 0.9531, 0.8905, 0.9548, 0.9105, 0.9713, 0.9071, and 0.9610; SPA-PLS: 0.9216 | 30 | [42] |
| "Biluochun" green tea | Sensory attributes | FT-NIR | 4000∼10,000 cm−1 | SNV | Si-PLS, PCA, BPNN, BP-AdaBoost | BP-AdaBoost performed best, Rp = 0.7717 | 70 | [37] |
| Black tea | Color sensory quality | VIS-NIR | 200∼1100 nm | SNV | GA-BPANN | Rp = 0.8935 | 127 | [18] |
| Black tea | Theaflavin, thearubigin | NIR | 1000∼1799 nm | MSC, SG 1st derivative, Min/Max, SNVT | PLS, SI-PLS, SI-CARS-PLS, SI-CARS-ELM, SI-CARS-SVM, SI-CARS-ELM-AdaBoost | ELM-AdaBoost was used for validation,  = 0.893 | 78 | [43] |
| Green tea | Lutein, Chl-b, Chl-a, Phe-b, Phe-a, β-carotene | VIS-NIR | 400∼2498 nm | ANOVA | PLS, SPA, MLR | MLR gave the best prediction () for lutein: 0.975; Chl-b: 0.973; Chl-a: 0.993; Phe-b: 0.919; Phe-a: 0.962; β-carotene: 0.965 | 135 | [44] |
| White tea and albino tea | Tea polyphenols, free amino acids, moisture, ash contents | FT-NIR | 4000∼12,400 cm−1 | MSC, SNV, SG smoothing, KND, 1st and 2nd derivatives | DPLS, DA | DPLS: 98.48; DA: 100 | 70 | [45] |
| Green, yellow, white, black, and oolong tea | Region of interest | VIS-NIR | 589, 635, 670, 783 nm | SNV | LDA, Lib-SVM, ELM | Lib-SVM was the best model, r2 = 98.39% | 206 | [46] |
| Green, yellow, white, black, and pu-erh tea | — | NIR | 950∼1760 nm | SG smoothing, standard deviation, SNV | PCA, MDS, t-SNE, ISOMAP, SVM-ECOC | The SVM-ECOC model provided a classification accuracy of 97.41 ± 0.16% | 6 | [19] |
| Iron Buddha tea | Total polyphenols | VIS-NIR | 800∼2500 nm | SNV | PLS (LS-SVM and BPNN) | Classification accuracies: LS-SVM: 95.0%; BPNN: 97.5% | 180 | [47] |
| Pu-erh tea | Metabolomics analysis | NIR | 3600∼12,500 cm−1 | OPUS 7.2 software (Bruker Optics) | PCA, PLS, HCA, PLS-DA | The PLS model showed a nearly complete fit and excellent predictive capability (r2 = 0.967; Q2 = 0.93) | 17 | [48] |
| Green tea (Anji-white) | — | NIR | 4000∼12,000 cm−1 | Smoothing, 2nd derivative, SNV | OCPLS, SIMCA | With SNV preprocessing, OCPLS provided a sensitivity of 0.886 and a specificity of 0.951; SIMCA provided a sensitivity of 0.886 and a specificity of 0.938 and achieved the best classification performance | 248 | [36] |
| Green tea | Catechin, EC, EGC, ECG, EGCG, GCG | NIR | 1050∼2500 nm | 1st derivative | PLS, BP-ANN, SVM | Accuracy (%): PLS: 100.000; BP-ANN: 95.455; SVM: 98.485 | 220 | [49] |
| Green tea | — | NIR | 4000∼9000 cm−1 | 2nd derivative, SNV | OVR-PLSDA, OVO-PLSDA, PLSDA-softmax, ES-PLSDA | Total accuracy (%): OVR-PLSDA: 64.68; OVO-PLSDA: 84.94; PLSDA-softmax: 92.99; ES-PLSDA: 93.77 | 1540 | [50] |
| Black tea | Caffeine, water extract, total polyphenols, free amino acids | NIR | 4000∼12,500 cm−1 | SNV, MSC, Min/Max | PLS | (1) R in the prediction set for caffeine: 0.955; water extracts: 0.962; total polyphenols: 0.954; free amino acids: 0.927. (2) Identification accuracy (%): 94.30 | 140 | [51] |
| Green and black tea | — | NIR | 3800∼14,000 cm−1 | 1st derivative, SG smoothing | SIMCA, PLSDA, SPA-LDA | Classification accuracy (%): SIMCA: 88.00; PLSDA: 92.00; SPA-LDA: 100 | 82 | [52] |
| Oolong tea | — | NIR | 4000∼12,000 cm−1 | SNV, 2nd derivative, smoothing | PLSDA | Sensitivity of the PLSDA model for raw data: 0.971; SNV: 1.000; 2nd derivative: 0.886; smoothing: 0.971 | 570 | [53] |
| Oolong tea | Polyphenols, alkaloids, protein, volatile and nonvolatile acids, aroma compounds | NIR and NMR | 3300∼12,500 cm−1 | SNV, 2nd derivative, SG smoothing | PCA, PLSDA | Discrimination accuracy (%) for NMR + NIR data: 86.20∼95.80; NMR data: 68.20∼78.70; NIR data: 80.00∼89.30 | 90 | [17] |

Abbreviations: ACO, ant colony optimization; AdaBoost, adaptive boosting; ANOVA, one-way analysis of variance; Bi-PLS, backward interval PLS; BP-ANN, backpropagation artificial neural network; BPNN, backpropagation neural network; C, (+)-catechin; CARS-PLS, competitive adaptive reweighted sampling-partial least squares; Chl-a, chlorophyll a; Chl-b, chlorophyll b; DA, discriminant analysis; EC, (−)-epicatechin; ECG, (−)-epicatechin gallate; EGC, (−)-epigallocatechin; EGCG, (−)-epigallocatechin gallate; ELM, extreme learning machine; ES, ensemble strategy; FT-NIR, Fourier transform near-infrared reflectance; GA, genetic algorithm; GC, (−)-gallocatechin; GCG, (−)-gallocatechin gallate; HCA, hierarchical cluster analysis; ISOMAP, isometric mapping; KND, Karl Norris derivative filter; LDA, linear discriminant analysis; Lib-SVM, library support vector machine; LS-SVM, least squares support vector machine; MDS, multidimensional scaling; Min/Max, min/max normalization; MLR, multiple linear regression; MNF, minimum noise fraction; MPLS, modified partial least squares; MSC, multiplicative scattering correction; NIR, near-infrared reflectance; NMR, nuclear magnetic resonance; OCPLS, one-class partial least squares; OVO-PLSDA, one-versus-one partial least squares discriminant analysis; OVR-PLSDA, one-versus-rest partial least squares discriminant analysis; PCA, principal component analysis; Phe-a, pheophytin a; Phe-b, pheophytin b; PLS, partial least squares; PLSDA, partial least squares discriminant analysis; Q2, cross-validated correlation coefficient; r2, coefficient of determination in the prediction set; Rp, correlation coefficient in the prediction set; , coefficient of determination; RF-PLS, random frog-partial least squares; RMSEP, root mean square error of prediction; SG smoothing, Savitzky–Golay smoothing; SIMCA, soft independent modeling of class analogy; Si-PLS, synergy interval partial least squares; SNV, standard normal variate; SNVT, standard normal variate transformation; SPA, successive projections algorithm; SPA-LDA, successive projections algorithm associated with linear discriminant analysis; SVM, support vector machine; SVM-ECOC, error-correcting output code (ECOC) model based on support vector machines (SVM); t-SNE, t-distributed stochastic neighbor embedding; VIS-NIR, visible and near-infrared reflectance; —, not mentioned.
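Table 1 reports spectral ranges in two units: wavelength (nm) for the VIS-NIR and hyperspectral rows, and wavenumber (cm−1) for most FT-NIR rows. To compare rows, the units convert via λ (nm) = 10^7 / ν̃ (cm−1), so the common 4000∼10,000 cm−1 range corresponds to 1000∼2500 nm. The minimal Python sketch below performs this conversion; the function names are hypothetical helpers, not part of any cited study or software.

```python
def wavenumber_to_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavenumber_cm1


def range_to_nm(low_cm1: float, high_cm1: float) -> tuple[float, float]:
    """Convert a wavenumber range to a wavelength range, returned low-to-high.

    The bounds swap because wavelength is inversely proportional to wavenumber.
    """
    return wavenumber_to_nm(high_cm1), wavenumber_to_nm(low_cm1)


# A few wavenumber ranges taken from Table 1
for low, high in [(4000, 10_000), (4000, 12_500), (3800, 14_000)]:
    nm_low, nm_high = range_to_nm(low, high)
    print(f"{low}-{high} cm-1 -> {nm_low:.0f}-{nm_high:.0f} nm")
```

Because of the inverse relationship, equal steps in wavenumber are not equal steps in wavelength, which is worth keeping in mind when comparing the spectral resolution of FT-NIR and grating-based instruments.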
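Several pretreatments listed in Table 1 (SNV, MSC, Min/Max normalization, and Savitzky–Golay smoothing and derivatives) are simple per-spectrum transforms. The pure-Python sketch below shows one common formulation of each. It assumes spectra are equal-length lists of absorbance values, uses the sample standard deviation for SNV, uses the mean spectrum as the default MSC reference, and uses fixed 5-point quadratic Savitzky–Golay coefficients that simply drop the window edges. These are illustrative choices, not the settings used in the cited studies. In practice a library routine such as SciPy's `scipy.signal.savgol_filter` would normally be used for the SG step.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre a spectrum on its own mean and scale
    it by its own standard deviation (sample SD is an assumption here)."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction: fit x = a + b * reference by least
    squares for each spectrum, then return (x - a) / b."""
    if reference is None:
        reference = [mean(col) for col in zip(*spectra)]  # mean spectrum
    r_mean = mean(reference)
    sxx = sum((r - r_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        b = sum((r - r_mean) * (xi - x_mean) for r, xi in zip(reference, x)) / sxx
        a = x_mean - b * r_mean
        corrected.append([(xi - a) / b for xi in x])
    return corrected


def min_max(spectrum):
    """Min/Max normalization: rescale a spectrum to the range [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    return [(x - lo) / (hi - lo) for x in spectrum]


# Savitzky-Golay convolution coefficients: 5-point window, quadratic fit
SG_SMOOTH_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
SG_DERIV1_5 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]  # 1st derivative, unit spacing


def savgol(spectrum, coeffs):
    """Apply SG coefficients as a moving window; the first and last
    len(coeffs) // 2 points are dropped rather than extrapolated."""
    half = len(coeffs) // 2
    return [
        sum(c * spectrum[i + j - half] for j, c in enumerate(coeffs))
        for i in range(half, len(spectrum) - half)
    ]


print(snv([1.0, 2.0, 3.0]))
print(msc([[3.0, 5.0, 9.0]], reference=[1.0, 2.0, 4.0]))
```

In the MSC call, the spectrum is exactly 1 + 2 × reference, so the correction removes both the additive offset and the multiplicative scatter factor and returns the reference itself.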
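The figures of merit in the Results column of Table 1 follow standard definitions. The sketch below computes RMSEP, the correlation coefficient Rp, the coefficient of determination, and the sensitivity and specificity reported for the one-class and PLSDA rows. It assumes that r2 is computed as 1 − SS_res/SS_tot. Some studies instead report the square of Rp, and the two disagree when predictions are biased, so r2 values from different rows may not be directly comparable. The function names are hypothetical, and treating `True` as "authentic sample" in the classification helper is an assumption.

```python
from math import sqrt


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values (Rp)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    return cov / sqrt(vt * vp)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    True is assumed to mean 'authentic' (the target class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    return tp / (tp + fn), tn / (tn + fp)


# Perfectly correlated but offset predictions: Rp**2 is 1, while
# 1 - SS_res / SS_tot is negative because of the bias.
y_ref = [1.0, 2.0, 3.0]
y_hat = [2.0, 3.0, 4.0]
print(pearson_r(y_ref, y_hat) ** 2, r_squared(y_ref, y_hat))
```

The offset example shows why a high Rp alone does not guarantee accurate predictions, and why studies often report RMSEP alongside it.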