Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches
This work presents a reliable approach to tracing the geographical origins of teas despite changes caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR spectroscopy. Seven classifiers trained on the 2...
Main authors: | Hong, Xue-Zhen; Fu, Xian-Shu; Wang, Zheng-Liang; Zhang, Li; Yu, Xiao-Ping; Ye, Zi-Hong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6335731/ https://www.ncbi.nlm.nih.gov/pubmed/30719371 http://dx.doi.org/10.1155/2019/1537568 |
_version_ | 1783387948264194048 |
---|---|
author | Hong, Xue-Zhen; Fu, Xian-Shu; Wang, Zheng-Liang; Zhang, Li; Yu, Xiao-Ping; Ye, Zi-Hong |
author_facet | Hong, Xue-Zhen; Fu, Xian-Shu; Wang, Zheng-Liang; Zhang, Li; Yu, Xiao-Ping; Ye, Zi-Hong |
author_sort | Hong, Xue-Zhen |
collection | PubMed |
description | This work presents a reliable approach to tracing the geographical origins of teas despite changes caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were analyzed by FT-NIR spectroscopy. Seven classifiers trained on the 2014 dataset all succeeded in tracing the origins of samples collected in 2014; however, they all failed to predict the origins of the 2015 samples because of differing data distributions and an imbalanced dataset. Three undersampling approaches based on outlier detection were then proposed: one-class SVM (OC-SVM), isolation forest, and elliptic envelope. As a result, the highest macro average recall (MAR) for the 2015 dataset improved from 56.86% to 73.95% (with SVM). A model updating approach was also applied, and the prediction MAR improved significantly as the updating rate increased. The best MAR (90.31%) was first achieved by OC-SVM combined with the SVM classifier at a 50% updating rate. |
format | Online Article Text |
id | pubmed-6335731 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-63357312019-02-04 Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches Hong, Xue-Zhen Fu, Xian-Shu Wang, Zheng-Liang Zhang, Li Yu, Xiao-Ping Ye, Zi-Hong J Anal Methods Chem Research Article This work presents a reliable approach to trace teas' geographical origins despite changes in teas caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were detected by FT-NIR. Seven classifiers trained on the 2014 dataset all succeeded to trace origins of samples collected in 2014; however, they all failed to predict origins for the 2015 samples due to different data distributions and imbalanced dataset. Three outlier detection based undersampling approaches—one-class SVM (OC-SVM), isolation forest and elliptic envelope—were then proposed; as a result, the highest macro average recall (MAR) for the 2015 dataset was improved from 56.86% to 73.95% (by SVM). A model updating approach was also applied, and the prediction MAR was significantly improved with increase in the updating rate. The best MAR (90.31%) was first achieved by the OC-SVM combined SVM classifier at a 50% rate. Hindawi 2019-01-03 /pmc/articles/PMC6335731/ /pubmed/30719371 http://dx.doi.org/10.1155/2019/1537568 Text en Copyright © 2019 Xue-Zhen Hong et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Hong, Xue-Zhen Fu, Xian-Shu Wang, Zheng-Liang Zhang, Li Yu, Xiao-Ping Ye, Zi-Hong Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches |
title | Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6335731/ https://www.ncbi.nlm.nih.gov/pubmed/30719371 http://dx.doi.org/10.1155/2019/1537568 |
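The description above reports results as macro average recall (MAR). As a minimal sketch of that metric, assuming the standard definition (the unweighted mean of per-class recall), the following Python example shows why MAR is informative on an imbalanced dataset. The function name `macro_average_recall` and the labels "A" and "B" are hypothetical illustrations; they are not taken from the article and do not reproduce the authors' code or data.

```python
from collections import defaultdict


def macro_average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 8 samples from origin "A", 2 from origin "B".
y_true = ["A"] * 8 + ["B"] * 2
# A classifier that always predicts the majority origin.
y_pred = ["A"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mar = macro_average_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, MAR={mar:.2f}")  # prints accuracy=0.80, MAR=0.50
```

In this toy case plain accuracy looks acceptable (0.80) even though the minority origin is never recognized, while MAR drops to 0.50 because each class contributes equally to the average.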