Cargando…

Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches

This work presents a reliable approach to trace teas' geographical origins despite changes in teas caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were detected by FT-NIR. Seven classifiers trained on the 2...

Descripción completa

Detalles Bibliográficos
Autores principales: Hong, Xue-Zhen, Fu, Xian-Shu, Wang, Zheng-Liang, Zhang, Li, Yu, Xiao-Ping, Ye, Zi-Hong
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Hindawi 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6335731/
https://www.ncbi.nlm.nih.gov/pubmed/30719371
http://dx.doi.org/10.1155/2019/1537568
Descripción
Sumario:This work presents a reliable approach to trace teas' geographical origins despite changes in teas caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were detected by FT-NIR. Seven classifiers trained on the 2014 dataset all succeeded to trace origins of samples collected in 2014; however, they all failed to predict origins for the 2015 samples due to different data distributions and imbalanced dataset. Three outlier detection based undersampling approaches—one-class SVM (OC-SVM), isolation forest and elliptic envelope—were then proposed; as a result, the highest macro average recall (MAR) for the 2015 dataset was improved from 56.86% to 73.95% (by SVM). A model updating approach was also applied, and the prediction MAR was significantly improved with increase in the updating rate. The best MAR (90.31%) was first achieved by the OC-SVM combined SVM classifier at a 50% rate.