
iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species

Promoters are consensus DNA sequences located near the transcription start sites and they play an important role in transcription initiation. Due to their importance in biological processes, the identification of promoters is significantly important for characterizing the expression of the genes. Nu...


Bibliographic Details
Main Authors: Zhang, Pengyu; Zhang, Hongming; Wu, Hao
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561371/
https://www.ncbi.nlm.nih.gov/pubmed/36161334
http://dx.doi.org/10.1093/nar/gkac824
author Zhang, Pengyu
Zhang, Hongming
Wu, Hao
author_facet Zhang, Pengyu
Zhang, Hongming
Wu, Hao
author_sort Zhang, Pengyu
collection PubMed
description Promoters are consensus DNA sequences located near the transcription start sites and they play an important role in transcription initiation. Due to their importance in biological processes, the identification of promoters is significantly important for characterizing the expression of the genes. Numerous computational methods have been proposed to predict promoters. However, it is difficult for these methods to achieve satisfactory performance in multiple species. In this study, we propose a novel weighted average ensemble learning model, termed iPro-WAEL, for identifying promoters in multiple species, including Human, Mouse, E.coli, Arabidopsis, B.amyloliquefaciens, B.subtilis and R.capsulatus. Extensive benchmarking experiments illustrate that iPro-WAEL has optimal performance and is superior to the current methods in promoter prediction. The experimental results also demonstrate a satisfactory prediction ability of iPro-WAEL on cross-cell lines, promoters annotated by other methods and distinguishing between promoters and enhancers. Moreover, we identify the most important transcription factor binding site (TFBS) motif in promoter regions to facilitate the study of identifying important motifs in the promoter regions. The source code of iPro-WAEL is freely available at https://github.com/HaoWuLab-Bioinformatics/iPro-WAEL.
format Online
Article
Text
id pubmed-9561371
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9561371 2022-10-18 iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species Zhang, Pengyu Zhang, Hongming Wu, Hao Nucleic Acids Res Computational Biology Oxford University Press 2022-09-26 /pmc/articles/PMC9561371/ /pubmed/36161334 http://dx.doi.org/10.1093/nar/gkac824 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Computational Biology
Zhang, Pengyu
Zhang, Hongming
Wu, Hao
iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species
title iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species
title_sort ipro-wael: a comprehensive and robust framework for identifying promoters in multiple species
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561371/
https://www.ncbi.nlm.nih.gov/pubmed/36161334
http://dx.doi.org/10.1093/nar/gkac824