Journal of Analytical Methods in Chemistry
Volume 2019, Article ID 1537568, 8 pages
https://doi.org/10.1155/2019/1537568
Research Article

Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches

¹College of Quality & Safety Engineering, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China
²BioCircuits Institute, University of California, La Jolla, San Diego, CA 92093, USA
³Zhejiang Provincial Key Laboratory of Biometrology and Inspection & Quarantine, College of Life Sciences, China Jiliang University, Xueyuan Street, Xiasha Higher Education District, Hangzhou 310018, China
⁴Department of Computer Science, Zhejiang University, Hangzhou 310027, China

Correspondence should be addressed to Zi-Hong Ye; zhye@cjlu.edu.cn

Received 3 August 2018; Accepted 29 November 2018; Published 3 January 2019

Guest Editor: Andrey Bogomolov

Copyright © 2019 Xue-Zhen Hong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This work presents a reliable approach to tracing the geographical origins of teas despite the differences between teas from different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were measured by Fourier-transform near-infrared (FT-NIR) spectroscopy. Seven classifiers trained on the 2014 dataset all traced the origins of the 2014 samples successfully; however, all of them failed to predict the origins of the 2015 samples, because the data distributions differed between the two years and the dataset was imbalanced. Three undersampling approaches based on outlier detection (one-class SVM (OC-SVM), isolation forest, and elliptic envelope) were then proposed; with these, the highest macro average recall (MAR) on the 2015 dataset improved from 56.86% to 73.95% (achieved by SVM). A model updating approach was also applied, and the prediction MAR improved significantly as the updating rate increased. The best MAR (90.31%) was first reached at an updating rate of 50% by the SVM classifier combined with OC-SVM undersampling.
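The three ideas named in the abstract (macro average recall, outlier-detection-based undersampling, and model updating at a given rate) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The function names are hypothetical, and several choices are assumptions made only for illustration. A simple distance-to-class-centroid score stands in for OC-SVM, isolation forest, or elliptic envelope. Every class is undersampled down to the size of the smallest class. Model updating is modeled as moving a random fraction of the new-year samples into the training set and testing on the rest.

```python
import math
import random
from collections import Counter, defaultdict


def macro_average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def centroid_distance_scores(samples):
    """Hypothetical stand-in outlier score (NOT OC-SVM / isolation forest /
    elliptic envelope): Euclidean distance of each spectrum to its class centroid."""
    dim = len(samples[0])
    centroid = [sum(s[j] for s in samples) / len(samples) for j in range(dim)]
    return [math.dist(s, centroid) for s in samples]


def outlier_undersample(X, y, score_fn=centroid_distance_scores):
    """Assumption: shrink every class to the size of the smallest class,
    discarding the samples that score_fn rates as most outlying first."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    target = min(len(v) for v in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        scores = score_fn(samples)
        # keep the `target` most typical samples (lowest outlier score)
        order = sorted(range(len(samples)), key=lambda i: scores[i])
        for i in order[:target]:
            X_out.append(samples[i])
            y_out.append(label)
    return X_out, y_out


def update_training_set(X_old, y_old, X_new, y_new, rate, seed=0):
    """Assumed model-updating scheme: move a random fraction `rate` of the
    new-year samples into the training set; the remainder becomes the test set."""
    idx = list(range(len(X_new)))
    random.Random(seed).shuffle(idx)
    n_move = round(rate * len(idx))
    moved, held = idx[:n_move], idx[n_move:]
    X_train = list(X_old) + [X_new[i] for i in moved]
    y_train = list(y_old) + [y_new[i] for i in moved]
    X_test = [X_new[i] for i in held]
    y_test = [y_new[i] for i in held]
    return X_train, y_train, X_test, y_test
```

MAR weights every origin equally, which is why it suits an imbalanced dataset. For example, with true labels `["A", "A", "A", "B"]` and predictions `["A", "A", "B", "B"]`, plain accuracy is 3/4 = 0.75, whereas MAR is (2/3 + 1)/2 ≈ 0.833. Passing a different `score_fn` is the hook where a real outlier detector would be plugged in.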