
Machine learning framework for assessment of microbial factory performance

Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. On the other hand, machine learning, complementary to metabolic mo...

Bibliographic Details
Main Authors:	Oyetunde, Tolutola, Liu, Di, Martin, Hector Garcia, Tang, Yinjie J.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6333410/ https://www.ncbi.nlm.nih.gov/pubmed/30645629 http://dx.doi.org/10.1371/journal.pone.0210558

_version_	1783387560254373888
author	Oyetunde, Tolutola Liu, Di Martin, Hector Garcia Tang, Yinjie J.
author_facet	Oyetunde, Tolutola Liu, Di Martin, Hector Garcia Tang, Yinjie J.
author_sort	Oyetunde, Tolutola
collection	PubMed
description	Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. On the other hand, machine learning, complementary to metabolic modeling, necessitates large amounts of data. Building such a database for metabolic engineering designs requires significant manpower and is prone to human errors and bias. We propose an approach to integrate data-driven methods with a genome-scale metabolic model for assessment of microbial bio-production (yield, titer and rate). Using engineered E. coli as an example, we manually extracted and curated a data set comprising about 1200 experimentally realized cell factories from ~100 papers. We furthermore augmented the key design features (e.g., genetic modifications and bioprocess variables) extracted from literature with additional features derived from running simulations of the genome-scale metabolic model iML1515 with constraints that match the experimental data. Then, data augmentation and ensemble learning (e.g., support vector machines, gradient boosted trees, and neural networks in a stacked regressor model) are employed to alleviate the challenges of sparse, non-standardized, and incomplete data sets, while multiple correspondence analysis/principal component analysis are used to rank influential factors on bio-production. The hybrid framework demonstrates a reasonably high cross-validation accuracy for prediction of E. coli factory performance metrics under presumed bioprocess and pathway conditions (Pearson correlation coefficients between 0.8 and 0.93 on new data not seen by the model).
format	Online Article Text
id	pubmed-6333410
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-63334102019-01-31 Machine learning framework for assessment of microbial factory performance Oyetunde, Tolutola Liu, Di Martin, Hector Garcia Tang, Yinjie J. PLoS One Research Article 
Public Library of Science 2019-01-15 /pmc/articles/PMC6333410/ /pubmed/30645629 http://dx.doi.org/10.1371/journal.pone.0210558 Text en © 2019 Oyetunde et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Oyetunde, Tolutola Liu, Di Martin, Hector Garcia Tang, Yinjie J. Machine learning framework for assessment of microbial factory performance
title	Machine learning framework for assessment of microbial factory performance
title_full	Machine learning framework for assessment of microbial factory performance
title_fullStr	Machine learning framework for assessment of microbial factory performance
title_full_unstemmed	Machine learning framework for assessment of microbial factory performance
title_short	Machine learning framework for assessment of microbial factory performance
title_sort	machine learning framework for assessment of microbial factory performance
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6333410/ https://www.ncbi.nlm.nih.gov/pubmed/30645629 http://dx.doi.org/10.1371/journal.pone.0210558
work_keys_str_mv	AT oyetundetolutola machinelearningframeworkforassessmentofmicrobialfactoryperformance AT liudi machinelearningframeworkforassessmentofmicrobialfactoryperformance AT martinhectorgarcia machinelearningframeworkforassessmentofmicrobialfactoryperformance AT tangyinjiej machinelearningframeworkforassessmentofmicrobialfactoryperformance
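
The description field reports model accuracy as Pearson correlation coefficients between predicted and measured performance on held-out data. As a rough illustration of that metric only, the following minimal sketch computes Pearson's r in plain Python. The function name pearson_r and the sample values are hypothetical and are not taken from the article or its data set.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical held-out product titers (g/L): made-up numbers for
# illustration, NOT data from the article.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.4, 3.8, 5.1]

r = pearson_r(predicted, observed)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means predictions rise and fall with the measurements. Pearson's r does not penalize a constant offset or scale error, so it is usually read alongside an error measure such as RMSE.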
