iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species
Promoters are consensus DNA sequences located near transcription start sites, and they play an important role in transcription initiation. Because of this role, identifying promoters is essential for characterizing gene expression. Nu...
Main Authors: | Zhang, Pengyu; Zhang, Hongming; Wu, Hao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Computational Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561371/ https://www.ncbi.nlm.nih.gov/pubmed/36161334 http://dx.doi.org/10.1093/nar/gkac824 |
_version_ | 1784807938237399040 |
---|---|
author | Zhang, Pengyu Zhang, Hongming Wu, Hao |
author_facet | Zhang, Pengyu Zhang, Hongming Wu, Hao |
author_sort | Zhang, Pengyu |
collection | PubMed |
description | Promoters are consensus DNA sequences located near transcription start sites, and they play an important role in transcription initiation. Because of this role, identifying promoters is essential for characterizing gene expression. Numerous computational methods have been proposed to predict promoters. However, it is difficult for these methods to achieve satisfactory performance across multiple species. In this study, we propose a novel weighted average ensemble learning model, termed iPro-WAEL, for identifying promoters in multiple species, including Human, Mouse, E. coli, Arabidopsis, B. amyloliquefaciens, B. subtilis and R. capsulatus. Extensive benchmarking experiments indicate that iPro-WAEL outperforms current methods in promoter prediction. The experimental results also demonstrate satisfactory prediction ability of iPro-WAEL across cell lines, on promoters annotated by other methods, and in distinguishing promoters from enhancers. Moreover, we identify the most important transcription factor binding site (TFBS) motif in promoter regions to facilitate the study of important motifs in these regions. The source code of iPro-WAEL is freely available at https://github.com/HaoWuLab-Bioinformatics/iPro-WAEL. |
format | Online Article Text |
id | pubmed-9561371 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9561371 2022-10-18 iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species Zhang, Pengyu Zhang, Hongming Wu, Hao Nucleic Acids Res Computational Biology Promoters are consensus DNA sequences located near transcription start sites, and they play an important role in transcription initiation. Because of this role, identifying promoters is essential for characterizing gene expression. Numerous computational methods have been proposed to predict promoters. However, it is difficult for these methods to achieve satisfactory performance across multiple species. In this study, we propose a novel weighted average ensemble learning model, termed iPro-WAEL, for identifying promoters in multiple species, including Human, Mouse, E. coli, Arabidopsis, B. amyloliquefaciens, B. subtilis and R. capsulatus. Extensive benchmarking experiments indicate that iPro-WAEL outperforms current methods in promoter prediction. The experimental results also demonstrate satisfactory prediction ability of iPro-WAEL across cell lines, on promoters annotated by other methods, and in distinguishing promoters from enhancers. Moreover, we identify the most important transcription factor binding site (TFBS) motif in promoter regions to facilitate the study of important motifs in these regions. The source code of iPro-WAEL is freely available at https://github.com/HaoWuLab-Bioinformatics/iPro-WAEL. Oxford University Press 2022-09-26 /pmc/articles/PMC9561371/ /pubmed/36161334 http://dx.doi.org/10.1093/nar/gkac824 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Computational Biology Zhang, Pengyu Zhang, Hongming Wu, Hao iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species |
title | iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561371/ https://www.ncbi.nlm.nih.gov/pubmed/36161334 http://dx.doi.org/10.1093/nar/gkac824 |
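
The description field above refers to a weighted average ensemble learning model. As a rough illustration of that general idea only, the sketch below combines promoter probabilities from two base models into one score per sequence. It is not the authors' implementation: the function names, the example probabilities, the weights and the 0.5 threshold are all hypothetical, and the record does not say which base models iPro-WAEL uses or how its weights are chosen.

```python
from typing import List, Sequence


def weighted_average(
    probabilities: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-model probabilities into one score per sequence.

    probabilities[m][i] is model m's predicted promoter probability
    for sequence i; weights[m] is the (hypothetical) weight of model m.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probabilities[0])
    if any(len(p) != n_samples for p in probabilities):
        raise ValueError("all models must score the same sequences")
    # Normalise by the weight total so the result stays in [0, 1].
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities)) / total
        for i in range(n_samples)
    ]


def classify(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label a sequence as a promoter when its score reaches the threshold."""
    return [s >= threshold for s in scores]


# Made-up probabilities from two hypothetical base models for three sequences.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.3]

scores = weighted_average([model_a, model_b], weights=[3.0, 1.0])
labels = classify(scores)
```

In practice the weights would usually be tuned on validation data so that stronger base models contribute more; the fixed 3:1 ratio here is only for illustration.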