Insight to Gene Expression From Promoter Libraries With the Machine Learning Workflow Exp2Ipynb

Metabolic engineering relies on modifying gene expression to regulate protein concentrations and reaction activities. The gene expression is controlled by the promoter sequence, and sequence libraries are used to scan expression activities and to identify correlations between sequence and activity....

Bibliographic Details
Main Authors:	Liebal, Ulf W., Köbbing, Sebastian, Netze, Linus, Schweidtmann, Artur M., Mitsos, Alexander, Blank, Lars M.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581000/ https://www.ncbi.nlm.nih.gov/pubmed/36303772 http://dx.doi.org/10.3389/fbinf.2021.747428

_version_	1784812519623229440
author	Liebal, Ulf W. Köbbing, Sebastian Netze, Linus Schweidtmann, Artur M. Mitsos, Alexander Blank, Lars M.
author_sort	Liebal, Ulf W.
collection	PubMed
description	Metabolic engineering relies on modifying gene expression to regulate protein concentrations and reaction activities. The gene expression is controlled by the promoter sequence, and sequence libraries are used to scan expression activities and to identify correlations between sequence and activity. We introduce a computational workflow called Exp2Ipynb to analyze promoter libraries maximizing information retrieval and promoter design with desired activity. We applied Exp2Ipynb to seven prokaryotic expression libraries to identify optimal experimental design principles. The workflow is open source, available as Jupyter Notebooks and covers the steps to 1) generate a statistical overview to sequence and activity, 2) train machine-learning algorithms, such as random forest, gradient boosting trees and support vector machines, for prediction and extraction of feature importance, 3) evaluate the performance of the estimator, and 4) to design new sequences with a desired activity using numerical optimization. The workflow can perform regression or classification on multiple promoter libraries, across species or reporter proteins. The most accurate predictions in the sample libraries were achieved when the promoters in the library were recognized by a single sigma factor and a unique reporter system. The prediction confidence mostly depends on sample size and sequence diversity, and we present a relationship to estimate their respective effects. The workflow can be adapted to process sequence libraries from other expression-related problems and increase insight to the growing application of high-throughput experiments, providing support for efficient strain engineering.
format	Online Article Text
id	pubmed-9581000
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9581000 2022-10-26 Insight to Gene Expression From Promoter Libraries With the Machine Learning Workflow Exp2Ipynb Liebal, Ulf W. Köbbing, Sebastian Netze, Linus Schweidtmann, Artur M. Mitsos, Alexander Blank, Lars M. Front Bioinform Bioinformatics Metabolic engineering relies on modifying gene expression to regulate protein concentrations and reaction activities. The gene expression is controlled by the promoter sequence, and sequence libraries are used to scan expression activities and to identify correlations between sequence and activity. We introduce a computational workflow called Exp2Ipynb to analyze promoter libraries maximizing information retrieval and promoter design with desired activity. We applied Exp2Ipynb to seven prokaryotic expression libraries to identify optimal experimental design principles. The workflow is open source, available as Jupyter Notebooks and covers the steps to 1) generate a statistical overview to sequence and activity, 2) train machine-learning algorithms, such as random forest, gradient boosting trees and support vector machines, for prediction and extraction of feature importance, 3) evaluate the performance of the estimator, and 4) to design new sequences with a desired activity using numerical optimization. The workflow can perform regression or classification on multiple promoter libraries, across species or reporter proteins. The most accurate predictions in the sample libraries were achieved when the promoters in the library were recognized by a single sigma factor and a unique reporter system. The prediction confidence mostly depends on sample size and sequence diversity, and we present a relationship to estimate their respective effects. The workflow can be adapted to process sequence libraries from other expression-related problems and increase insight to the growing application of high-throughput experiments, providing support for efficient strain engineering. Frontiers Media S.A. 2021-10-14 /pmc/articles/PMC9581000/ /pubmed/36303772 http://dx.doi.org/10.3389/fbinf.2021.747428 Text en Copyright © 2021 Liebal, Köbbing, Netze, Schweidtmann, Mitsos and Blank. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Insight to Gene Expression From Promoter Libraries With the Machine Learning Workflow Exp2Ipynb
topic	Bioinformatics
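The description field above names a statistical overview of sequence and activity as the first workflow step, and sequence diversity as one driver of prediction confidence. The sketch below is only a rough illustration of that idea and is not taken from Exp2Ipynb: the one-hot encoding, the use of per-position Shannon entropy as a diversity measure, the function names, and the toy library are all assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def one_hot(seq):
    """Flatten a DNA sequence into a 0/1 vector with four entries per position."""
    return [1 if base == letter else 0 for base in seq.upper() for letter in ALPHABET]


def position_entropy(seqs):
    """Shannon entropy (in bits) of the nucleotide distribution at each position.

    Assumes all sequences have the same length; 0 means the position never varies.
    """
    entropies = []
    for column in zip(*seqs):
        counts = Counter(column)
        total = len(column)
        entropies.append(
            -sum((n / total) * math.log2(n / total) for n in counts.values())
        )
    return entropies


# Hypothetical toy library of (sequence, measured activity) pairs.
# These values are invented for illustration and are not data from the article.
library = [("TTGACA", 1.0), ("TTGACT", 0.8), ("TAGACA", 0.3), ("TTGCCA", 0.5)]

sequences = [seq for seq, _ in library]
features = [one_hot(seq) for seq in sequences]   # candidate inputs for an estimator
diversity = position_entropy(sequences)          # one entropy value per position
```

A feature matrix of this shape could then be passed to a regressor or classifier of the kind the description mentions (random forest, gradient boosting trees, support vector machines); positions with zero entropy carry no information for such a model.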

Similar Items