
Synchronously Predicting Tea Polyphenol and Epigallocatechin Gallate in Tea Leaves Using Fourier Transform–Near-Infrared Spectroscopy and Machine Learning

Tea polyphenols and epigallocatechin gallate (EGCG) are considered key components of tea. Rapid prediction of these two components can benefit tea quality control and product development for tea producers, breeders and consumers. This study aimed to develop reliable models for tea p...


Bibliographic Details
Main Authors: Ye, Sitan, Weng, Haiyong, Xiang, Lirong, Jia, Liangquan, Xu, Jinchai
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384235/
https://www.ncbi.nlm.nih.gov/pubmed/37513250
http://dx.doi.org/10.3390/molecules28145379
author Ye, Sitan
Weng, Haiyong
Xiang, Lirong
Jia, Liangquan
Xu, Jinchai
author_sort Ye, Sitan
collection PubMed
description Tea polyphenols and epigallocatechin gallate (EGCG) are considered key components of tea. Rapid prediction of these two components can benefit tea quality control and product development for tea producers, breeders and consumers. This study aimed to develop reliable models for predicting tea polyphenol and EGCG content during the breeding process using Fourier Transform–near infrared (FT-NIR) spectroscopy combined with machine learning algorithms. Several spectral preprocessing methods, including Savitzky–Golay smoothing (SG), standard normal variate (SNV), vector normalization (VN), multiplicative scatter correction (MSC) and first derivative (FD), were applied to improve the quality of the collected spectra. Partial least squares regression (PLSR) and least squares support vector regression (LS-SVR) were used to build models for tea polyphenol and EGCG content prediction from the differently preprocessed spectral data. Variable selection algorithms, including competitive adaptive reweighted sampling (CARS) and random forest (RF), were then used to identify key spectral bands and improve the efficiency of the models. The results demonstrate that the optimal full-spectrum model for tea polyphenols was the LS-SVR based on SG-smoothed spectra, with R(p) = 0.975 and RPD = 4.540. For EGCG, the best full-spectrum model was the LS-SVR using the original spectra as model inputs, with R(p) = 0.936 and RPD = 2.841. Applying the variable selection algorithms further improved the predictive performance of the models. For tea polyphenols, the LS-SVR model built on 30 CARS-selected variables reached R(p) = 0.978 and RPD = 4.833; for EGCG, the LS-SVR model built on 27 RF-selected variables achieved the best predictive ability, with R(p) = 0.944 and RPD = 3.049. The results demonstrate the potential of FT-NIR spectroscopy combined with machine learning for the rapid screening of genotypes with high tea polyphenol and EGCG content in tea leaves.
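The abstract names its preprocessing steps and figures of merit without giving formulas. As a rough illustration only, the sketch below shows how SNV, a first derivative, R(p) and RPD are commonly computed. It is not the authors' code: the function names are hypothetical, the first derivative is a plain finite difference rather than whatever filter the study used, and RPD is assumed to follow the common definition (standard deviation of the reference values in the prediction set divided by the root mean square error of prediction), which this record does not confirm.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean
    and scale it by its own sample standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1 denominator)
    return [(x - mean) / sd for x in spectrum]


def first_derivative(spectrum):
    """Finite-difference first derivative between adjacent points.
    Assumption: unit spacing between wavenumbers; a stand-in for the
    derivative filter used in the study, which is not specified here."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def pearson_r(measured, predicted):
    """Pearson correlation between reference and predicted values,
    reported as R(p) when computed on the prediction set."""
    mean_m = statistics.fmean(measured)
    mean_p = statistics.fmean(predicted)
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def rmsep(measured, predicted):
    """Root mean square error of prediction."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(statistics.fmean(squared))


def rpd(measured, predicted):
    """Ratio of performance to deviation, assumed here to be
    SD(reference values) / RMSEP."""
    return statistics.stdev(measured) / rmsep(measured, predicted)


if __name__ == "__main__":
    # Made-up toy numbers, not data from the study.
    toy_spectrum = [0.42, 0.45, 0.51, 0.49, 0.44]
    print(first_derivative(snv(toy_spectrum)))

    reference = [10.0, 12.0, 14.0, 16.0, 18.0]
    predicted = [10.4, 11.7, 14.2, 15.8, 18.3]
    print(pearson_r(reference, predicted), rpd(reference, predicted))
```

Under this definition, a larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why the abstract reports it alongside R(p).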
format Online
Article
Text
id pubmed-10384235
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10384235 2023-07-30 Synchronously Predicting Tea Polyphenol and Epigallocatechin Gallate in Tea Leaves Using Fourier Transform–Near-Infrared Spectroscopy and Machine Learning Ye, Sitan Weng, Haiyong Xiang, Lirong Jia, Liangquan Xu, Jinchai Molecules Article MDPI 2023-07-13 /pmc/articles/PMC10384235/ /pubmed/37513250 http://dx.doi.org/10.3390/molecules28145379 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Synchronously Predicting Tea Polyphenol and Epigallocatechin Gallate in Tea Leaves Using Fourier Transform–Near-Infrared Spectroscopy and Machine Learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10384235/
https://www.ncbi.nlm.nih.gov/pubmed/37513250
http://dx.doi.org/10.3390/molecules28145379