Data-driven interpretable analysis for polysaccharide yield prediction

Cornstalks show promise as a raw material for polysaccharide production through xylanase. Rapid and accurate prediction of polysaccharide yield can facilitate process optimization, eliminating the need for extensive experimentation in actual production to refine reaction conditions, thereby saving t...

Bibliographic Details
Main Authors: Tian, Yushi; Yang, Xu; Chen, Nianhua; Li, Chunyan; Yang, Wulin
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10661693/
https://www.ncbi.nlm.nih.gov/pubmed/38021368
http://dx.doi.org/10.1016/j.ese.2023.100321
author Tian, Yushi
Yang, Xu
Chen, Nianhua
Li, Chunyan
Yang, Wulin
collection PubMed
description Cornstalks show promise as a raw material for polysaccharide production through xylanase. Rapid and accurate prediction of polysaccharide yield can facilitate process optimization, eliminating the need for extensive experimentation in actual production to refine reaction conditions, thereby saving time and costs. However, the intricate interplay of enzymatic factors poses challenges in predicting and optimizing polysaccharide yield accurately. Here, we introduce an innovative data-driven approach leveraging multiple artificial intelligence techniques to enhance polysaccharide production. We propose a machine learning framework to identify highly accurate polysaccharide yield prediction modeling methods and uncover optimal enzymatic parameter combinations. Notably, Random Forest (RF) and eXtreme Gradient Boost (XGB) demonstrate robust performance, achieving prediction accuracies of 93.0% and 95.6%, respectively, while an independently developed deep neural network (DNN) model achieves 91.1% accuracy. A feature importance analysis of XGB reveals the enzyme solution volume's dominant role (43.7%), followed by time (20.7%), substrate concentration (15%), temperature (15%), and pH (5.6%). Further interpretability analysis unveils complex parameter interactions and potential optimization strategies. This data-driven approach, incorporating machine learning, deep learning, and interpretable analysis, offers a viable pathway for polysaccharide yield prediction and the potential recovery of various agricultural residues.
format Online
Article
Text
id pubmed-10661693
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10661693 2023-09-27 Data-driven interpretable analysis for polysaccharide yield prediction. Tian, Yushi; Yang, Xu; Chen, Nianhua; Li, Chunyan; Yang, Wulin. Environ Sci Ecotechnol. Original Research. Elsevier 2023-09-27 /pmc/articles/PMC10661693/ /pubmed/38021368 http://dx.doi.org/10.1016/j.ese.2023.100321 Text en. © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Data-driven interpretable analysis for polysaccharide yield prediction
topic Original Research
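
The feature-importance figures quoted in the description can be tabulated and sanity-checked with a few lines of Python. The sketch below is only an illustration built from the five percentages stated in the abstract; the dictionary name `reported_importance` and the helper `rank_features` are hypothetical and are not taken from the authors' code, and no model is trained here.

```python
# Feature importances (percent) for the XGB model, copied from the
# abstract in this record. The variable and function names are
# illustrative assumptions, not the authors' own identifiers.
reported_importance = {
    "enzyme solution volume": 43.7,
    "time": 20.7,
    "substrate concentration": 15.0,
    "temperature": 15.0,
    "pH": 5.6,
}


def rank_features(importance):
    """Return (name, percent, cumulative percent) rows, largest first."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    rows = []
    running = 0.0
    for name, pct in ranked:
        running += pct
        rows.append((name, pct, round(running, 1)))
    return rows


rows = rank_features(reported_importance)
total = round(sum(reported_importance.values()), 1)

for name, pct, cumulative in rows:
    print(f"{name:<24} {pct:5.1f}%  cumulative {cumulative:5.1f}%")
print(f"total: {total}%")
```

Read this way, the reported shares sum to 100%, and the two leading parameters (enzyme solution volume and time) together account for 64.4% of the importance attributed by the XGB model.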