
Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches

This work presents a reliable approach to tracing the geographical origins of teas despite changes in the teas between harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR spectroscopy. Seven classifiers trained on the 2...


Bibliographic Details
Main authors: Hong, Xue-Zhen, Fu, Xian-Shu, Wang, Zheng-Liang, Zhang, Li, Yu, Xiao-Ping, Ye, Zi-Hong
Format: Online Article Text
Language: English
Published: Hindawi 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6335731/
https://www.ncbi.nlm.nih.gov/pubmed/30719371
http://dx.doi.org/10.1155/2019/1537568
author Hong, Xue-Zhen
Fu, Xian-Shu
Wang, Zheng-Liang
Zhang, Li
Yu, Xiao-Ping
Ye, Zi-Hong
author_sort Hong, Xue-Zhen
collection PubMed
description This work presents a reliable approach to tracing the geographical origins of teas despite changes in the teas between harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR spectroscopy. Seven classifiers trained on the 2014 dataset all succeeded in tracing the origins of samples collected in 2014; however, they all failed to predict the origins of the 2015 samples because of differing data distributions and an imbalanced dataset. Three undersampling approaches based on outlier detection, namely one-class SVM (OC-SVM), isolation forest, and elliptic envelope, were then proposed; as a result, the highest macro average recall (MAR) for the 2015 dataset improved from 56.86% to 73.95% (with the SVM classifier). A model updating approach was also applied, and the prediction MAR improved significantly as the updating rate increased. The best MAR (90.31%) was first reached by OC-SVM undersampling combined with the SVM classifier at a 50% updating rate.
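The abstract reports its results as macro average recall (MAR). As a minimal sketch of what that metric measures, the code below computes MAR as the unweighted mean of per-class recalls and compares it with plain accuracy. The function names and the toy labels ("A", "B") are illustrative assumptions and are not taken from the article.

```python
from collections import defaultdict


def macro_average_recall(y_true, y_pred):
    """Unweighted mean of per-class recalls.

    Every class contributes equally, however many samples it has,
    which is why the metric suits imbalanced datasets.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Hypothetical imbalanced example: 8 samples from origin "A", 2 from origin "B",
# and a classifier that always predicts the majority origin.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

acc = accuracy(y_true, y_pred)
mar = macro_average_recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, MAR = {mar:.2f}")
```

In this toy case accuracy looks acceptable because the majority origin dominates, while MAR exposes that the minority origin is never recognized.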
format Online
Article
Text
id pubmed-6335731
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-6335731 2019-02-04 Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches Hong, Xue-Zhen Fu, Xian-Shu Wang, Zheng-Liang Zhang, Li Yu, Xiao-Ping Ye, Zi-Hong J Anal Methods Chem Research Article This work presents a reliable approach to tracing the geographical origins of teas despite changes in the teas between harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR spectroscopy. Seven classifiers trained on the 2014 dataset all succeeded in tracing the origins of samples collected in 2014; however, they all failed to predict the origins of the 2015 samples because of differing data distributions and an imbalanced dataset. Three undersampling approaches based on outlier detection, namely one-class SVM (OC-SVM), isolation forest, and elliptic envelope, were then proposed; as a result, the highest macro average recall (MAR) for the 2015 dataset improved from 56.86% to 73.95% (with the SVM classifier). A model updating approach was also applied, and the prediction MAR improved significantly as the updating rate increased. The best MAR (90.31%) was first reached by OC-SVM undersampling combined with the SVM classifier at a 50% updating rate. Hindawi 2019-01-03 /pmc/articles/PMC6335731/ /pubmed/30719371 http://dx.doi.org/10.1155/2019/1537568 Text en Copyright © 2019 Xue-Zhen Hong et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6335731/
https://www.ncbi.nlm.nih.gov/pubmed/30719371
http://dx.doi.org/10.1155/2019/1537568