ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation

Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there are man...

Bibliographic Details
Main authors: Umarov, Ramzan; Li, Yu; Arakawa, Takahiro; Takizawa, Satoshi; Gao, Xin; Arner, Erik
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8448322/
https://www.ncbi.nlm.nih.gov/pubmed/34491989
http://dx.doi.org/10.1371/journal.pcbi.1009376
author Umarov, Ramzan
Li, Yu
Arakawa, Takahiro
Takizawa, Satoshi
Gao, Xin
Arner, Erik
collection PubMed
description Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there have been many attempts to develop computational promoter and enhancer identification methods, reliable tools to analyze long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than that in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other one for testing candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other vertebrate species, without re-training. Moreover, the unannotated predicted regulatory regions made on the human genome are enriched for disease-associated variants, suggesting that they are potentially true regulatory elements rather than false positives. We validated high scoring “false positive” predictions using a reporter assay and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.
format Online
Article
Text
id pubmed-8448322
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8448322 2021-09-18 PLoS Comput Biol Research Article
Public Library of Science 2021-09-07 /pmc/articles/PMC8448322/ /pubmed/34491989 http://dx.doi.org/10.1371/journal.pcbi.1009376 Text en © 2021 Umarov et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article