
Characterizing Promoter and Enhancer Sequences by a Deep Learning Method

Promoters and enhancers are well-known regulatory elements modulating gene expression. As confirmed by high-throughput sequencing technologies, these regulatory elements are bidirectionally transcribed. That is, promoters produce stable mRNA in the sense direction and unstable RNA in the antisense d...


Bibliographic Details
Main Authors: Zeng, Xin; Park, Sung-Joon; Nakai, Kenta
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8239401/
https://www.ncbi.nlm.nih.gov/pubmed/34211503
http://dx.doi.org/10.3389/fgene.2021.681259
author Zeng, Xin
Park, Sung-Joon
Nakai, Kenta
collection PubMed
description Promoters and enhancers are well-known regulatory elements modulating gene expression. As confirmed by high-throughput sequencing technologies, these regulatory elements are bidirectionally transcribed. That is, promoters produce stable mRNA in the sense direction and unstable RNA in the antisense direction, while enhancers transcribe unstable RNA in both directions. Although it is thought that enhancers and promoters share a similar architecture of transcription start sites (TSSs), how the transcriptional machinery distinctly uses these genomic regions as promoters or enhancers remains unclear. To address this issue, we developed a deep learning (DL) method by utilizing a convolutional neural network (CNN) and the saliency algorithm. In comparison with other classifiers, our CNN presented higher predictive performance, suggesting the overarching importance of the high-order sequence features, captured by the CNN. Moreover, our method revealed that there are substantial sequence differences between the enhancers and promoters. Remarkably, the 20–120 bp downstream regions from the center of bidirectional TSSs seemed to contribute to the RNA stability. These regions in promoters tend to have a larger number of guanines and cytosines compared to those in enhancers, and this feature contributed to the classification of the regulatory elements. Our CNN-based method can capture the complex TSS architectures. We found that the genomic regions around TSSs for promoters and enhancers contribute to RNA stability and show GC-biased characteristics as a critical determinant for promoter TSSs.
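The GC-content observation in the description can be illustrated with a short calculation. The following is a minimal sketch, not the authors' pipeline: it assumes the "center of bidirectional TSSs" is available as a 0-based index into a sequence string, reads "20–120 bp downstream" as a single window on one strand, and uses made-up toy sequences. The function names `gc_fraction` and `downstream_window` are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T bases; 0.0 if there are none."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        return 0.0
    return sum(b in "GC" for b in counted) / len(counted)


def downstream_window(seq: str, center: int, start: int = 20, end: int = 120) -> str:
    """Return the bases from center+start up to (not including) center+end.

    `center` is an assumed 0-based index of the midpoint between the two
    TSSs. The window is clipped to the sequence bounds.
    """
    lo = max(0, center + start)
    hi = min(len(seq), center + end)
    return seq[lo:hi] if lo < hi else ""


# Toy sequences invented for illustration; they are not real genomic data.
promoter_like = "A" * 120 + "GCGA" * 30   # GC-rich downstream of the center
enhancer_like = "A" * 120 + "GATA" * 30   # AT-rich downstream of the center
center = 100                              # assumed TSS-pair midpoint

promoter_gc = gc_fraction(downstream_window(promoter_like, center))
enhancer_gc = gc_fraction(downstream_window(enhancer_like, center))
print(f"promoter-like GC: {promoter_gc:.2f}, enhancer-like GC: {enhancer_gc:.2f}")
```

On real data, one would presumably compare the distribution of this fraction across many annotated promoters and enhancers rather than two single sequences.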
format Online
Article
Text
id pubmed-8239401
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8239401 2021-06-30 Characterizing Promoter and Enhancer Sequences by a Deep Learning Method Zeng, Xin Park, Sung-Joon Nakai, Kenta Front Genet Genetics Frontiers Media S.A. 2021-06-15 /pmc/articles/PMC8239401/ /pubmed/34211503 http://dx.doi.org/10.3389/fgene.2021.681259 Text en Copyright © 2021 Zeng, Park and Nakai.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Characterizing Promoter and Enhancer Sequences by a Deep Learning Method
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8239401/
https://www.ncbi.nlm.nih.gov/pubmed/34211503
http://dx.doi.org/10.3389/fgene.2021.681259