Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined With Machine Learning Methods

Different geographical origins can lead to great variance in coffee quality, taste, and commercial value. Hence, controlling the authenticity of the origin of coffee beans is of great importance for producers and consumers worldwide. In this study, terahertz (THz) spectroscopy, combined with machine...

Bibliographic Details
Main authors: Yang, Si; Li, Chenxi; Mei, Yang; Liu, Wen; Liu, Rong; Chen, Wenliang; Han, Donghai; Xu, Kexin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Nutrition
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8247636/
https://www.ncbi.nlm.nih.gov/pubmed/34222305
http://dx.doi.org/10.3389/fnut.2021.680627
_version_ 1783716557271072768
author Yang, Si
Li, Chenxi
Mei, Yang
Liu, Wen
Liu, Rong
Chen, Wenliang
Han, Donghai
Xu, Kexin
author_facet Yang, Si
Li, Chenxi
Mei, Yang
Liu, Wen
Liu, Rong
Chen, Wenliang
Han, Donghai
Xu, Kexin
author_sort Yang, Si
collection PubMed
description Geographical origin strongly influences coffee quality, taste, and commercial value, so verifying the authenticity of the origin of coffee beans matters to producers and consumers worldwide. In this study, terahertz (THz) spectroscopy combined with machine learning was investigated as a fast, non-destructive method for classifying the geographical origin of coffee beans. Three popular machine learning methods, convolutional neural network (CNN), linear discriminant analysis (LDA), and support vector machine (SVM), were compared to find the best model. Because the curse of dimensionality makes it difficult for some classification methods to train effective models, principal component analysis (PCA) and a genetic algorithm (GA) were applied to create a smaller set of features for LDA and SVM. The inputs to the LDA and SVM models were the first nine principal components (PCs) extracted by PCA, with a cumulative contribution rate of 99.9%, and 21 variables selected by GA. Excellent classification, with 90% accuracy on the prediction set, was achieved using the CNN. The results also indicate that variable selection is an important step in building an accurate and robust discrimination model: the performance of LDA and SVM improved with the spectral features extracted by PCA and GA. GA-SVM achieved 75% accuracy on the prediction set, whereas SVM and PCA-SVM achieved 50% and 65%, respectively. These results demonstrate that THz spectroscopy together with machine learning is an effective approach for classifying the geographical origin of coffee beans, pointing to the potential of deep learning for authenticating agricultural products and broadening the applications of THz spectroscopy.
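The "cumulative contribution rate" mentioned in the description is the share of total variance captured by the leading principal components. The following is a minimal sketch of how such a cutoff could be computed, not the authors' implementation; the function names and the variance values are hypothetical and are not taken from the study.

```python
from itertools import accumulate


def cumulative_contribution(variances):
    """Return the cumulative contribution rate after each component.

    `variances` are the per-component variances (PCA eigenvalues);
    they are sorted in descending order before accumulating.
    """
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


def components_needed(variances, threshold):
    """Smallest number of leading components whose cumulative rate
    reaches `threshold` (a fraction between 0 and 1)."""
    rates = cumulative_contribution(variances)
    for count, rate in enumerate(rates, start=1):
        if rate >= threshold:
            return count
    return len(rates)


# Hypothetical variances for illustration only (not data from the study).
example_variances = [60, 25, 10, 4, 1]
n_components = components_needed(example_variances, 0.90)
```

With these illustrative values the first three components already explain 95% of the variance, so a 90% threshold keeps three of them. The study reports that nine components were needed to reach 99.9% on its THz spectra.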
format Online
Article
Text
id pubmed-8247636
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8247636 2021-07-02 Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined With Machine Learning Methods Yang, Si Li, Chenxi Mei, Yang Liu, Wen Liu, Rong Chen, Wenliang Han, Donghai Xu, Kexin Front Nutr Nutrition Frontiers Media S.A. 2021-06-17 /pmc/articles/PMC8247636/ /pubmed/34222305 http://dx.doi.org/10.3389/fnut.2021.680627 Text en Copyright © 2021 Yang, Li, Mei, Liu, Liu, Chen, Han and Xu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Nutrition
Yang, Si
Li, Chenxi
Mei, Yang
Liu, Wen
Liu, Rong
Chen, Wenliang
Han, Donghai
Xu, Kexin
Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined With Machine Learning Methods
title Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined With Machine Learning Methods
title_sort determination of the geographical origin of coffee beans using terahertz spectroscopy combined with machine learning methods
topic Nutrition
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8247636/
https://www.ncbi.nlm.nih.gov/pubmed/34222305
http://dx.doi.org/10.3389/fnut.2021.680627