EVMP: enhancing machine learning models for synthetic promoter strength prediction by Extended Vision Mutant Priority framework

Bibliographic Details
Main authors: Yang, Weiqin; Li, Dexin; Huang, Ranran
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10354429/
https://www.ncbi.nlm.nih.gov/pubmed/37476664
http://dx.doi.org/10.3389/fmicb.2023.1215609
_version_ 1785074926745550848
author Yang, Weiqin
Li, Dexin
Huang, Ranran
author_facet Yang, Weiqin
Li, Dexin
Huang, Ranran
author_sort Yang, Weiqin
collection PubMed
description INTRODUCTION: In metabolic engineering and synthetic biology applications, promoters with appropriate strengths are critical. However, it is time-consuming and laborious to annotate promoter strength by experiments. Nowadays, constructing mutation-based synthetic promoter libraries that span multiple orders of magnitude of promoter strength is receiving increasing attention. A number of machine learning (ML) methods are applied to synthetic promoter strength prediction, but existing models are limited by the excessive proximity between synthetic promoters. METHODS: In order to enhance ML models to better predict synthetic promoter strength, we propose EVMP (Extended Vision Mutant Priority), a universal framework that utilizes mutation information more effectively. In EVMP, synthetic promoters are equivalently transformed into a base promoter and the corresponding k-mer mutations, which are input into BaseEncoder and VarEncoder, respectively. EVMP also provides optional data augmentation, which generates multiple copies of the data by selecting different base promoters for the same synthetic promoter. RESULTS: In the Trc synthetic promoter library, EVMP was applied to multiple ML models and the model effect was enhanced to varying extents, up to 61.30% (MAE), while the SOTA (state-of-the-art) record was improved by 15.25% (MAE) and 4.03% (R²). Data augmentation based on multiple base promoters further improved the model performance by 17.95% (MAE) and 7.25% (R²) compared with the non-EVMP SOTA record. DISCUSSION: In further study, extended vision (or k-mer) is shown to be essential for EVMP. We also found that EVMP can alleviate the over-smoothing phenomenon, which may contribute to its effectiveness. Our work suggests that EVMP can highlight the mutation information of synthetic promoters and significantly improve the accuracy of promoter strength prediction. The source code is publicly available on GitHub: https://github.com/Tiny-Snow/EVMP.
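The transformation described in METHODS can be illustrated with a minimal sketch. This is not the authors' implementation (see the GitHub repository for that). It assumes substitution-only mutations, equal-length sequences, and that a k-mer mutation is the window of the synthetic promoter centred on a mutated base. The function names `kmer_mutations` and `augment` and the toy sequences are hypothetical.

```python
from typing import List, Tuple


def kmer_mutations(base: str, variant: str, k: int = 5) -> List[Tuple[int, str]]:
    """Return (position, k-mer) pairs, one per substituted position.

    Assumption: each k-mer is the window of the variant centred on the
    mutated base, clipped at the sequence ends. Insertions and deletions
    are not handled in this sketch.
    """
    if len(base) != len(variant):
        raise ValueError("this sketch handles equal-length sequences only")
    half = k // 2
    mutations = []
    for i, (b, v) in enumerate(zip(base, variant)):
        if b != v:
            start = max(0, i - half)
            end = min(len(variant), i + half + 1)
            mutations.append((i, variant[start:end]))
    return mutations


def augment(bases: List[str], variant: str, k: int = 5):
    """Pair one synthetic promoter with several candidate base promoters.

    Each (base, mutations) pair is one copy of the sample, mirroring the
    optional data augmentation mentioned in the abstract.
    """
    return [(base, kmer_mutations(base, variant, k)) for base in bases]


if __name__ == "__main__":
    base = "ACGTACGTAC"       # toy base promoter (hypothetical)
    variant = "ACGTTCGTAC"    # toy synthetic promoter, A->T at index 4
    print(kmer_mutations(base, variant, k=3))
    print(augment([base, variant], variant, k=3))
```

In a model following this idea, the base promoter would go to one encoder and the list of k-mer mutations to another; how the two encodings are combined is left open here.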
format Online
Article
Text
id pubmed-10354429
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10354429 2023-07-20 EVMP: enhancing machine learning models for synthetic promoter strength prediction by Extended Vision Mutant Priority framework Yang, Weiqin Li, Dexin Huang, Ranran Front Microbiol Microbiology INTRODUCTION: In metabolic engineering and synthetic biology applications, promoters with appropriate strengths are critical. However, it is time-consuming and laborious to annotate promoter strength by experiments. Nowadays, constructing mutation-based synthetic promoter libraries that span multiple orders of magnitude of promoter strength is receiving increasing attention. A number of machine learning (ML) methods are applied to synthetic promoter strength prediction, but existing models are limited by the excessive proximity between synthetic promoters. METHODS: In order to enhance ML models to better predict synthetic promoter strength, we propose EVMP (Extended Vision Mutant Priority), a universal framework that utilizes mutation information more effectively. In EVMP, synthetic promoters are equivalently transformed into a base promoter and the corresponding k-mer mutations, which are input into BaseEncoder and VarEncoder, respectively. EVMP also provides optional data augmentation, which generates multiple copies of the data by selecting different base promoters for the same synthetic promoter. RESULTS: In the Trc synthetic promoter library, EVMP was applied to multiple ML models and the model effect was enhanced to varying extents, up to 61.30% (MAE), while the SOTA (state-of-the-art) record was improved by 15.25% (MAE) and 4.03% (R²). Data augmentation based on multiple base promoters further improved the model performance by 17.95% (MAE) and 7.25% (R²) compared with the non-EVMP SOTA record. DISCUSSION: In further study, extended vision (or k-mer) is shown to be essential for EVMP. We also found that EVMP can alleviate the over-smoothing phenomenon, which may contribute to its effectiveness. Our work suggests that EVMP can highlight the mutation information of synthetic promoters and significantly improve the accuracy of promoter strength prediction. The source code is publicly available on GitHub: https://github.com/Tiny-Snow/EVMP. Frontiers Media S.A. 2023-07-05 /pmc/articles/PMC10354429/ /pubmed/37476664 http://dx.doi.org/10.3389/fmicb.2023.1215609 Text en Copyright © 2023 Yang, Li and Huang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Microbiology
Yang, Weiqin
Li, Dexin
Huang, Ranran
EVMP: enhancing machine learning models for synthetic promoter strength prediction by Extended Vision Mutant Priority framework
title EVMP: enhancing machine learning models for synthetic promoter strength prediction by Extended Vision Mutant Priority framework
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10354429/
https://www.ncbi.nlm.nih.gov/pubmed/37476664
http://dx.doi.org/10.3389/fmicb.2023.1215609