
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods


Bibliographic Details
Main Authors: Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5984344/
https://www.ncbi.nlm.nih.gov/pubmed/29855387
http://dx.doi.org/10.1186/s12859-018-2187-1
collection PubMed
description BACKGROUND: In the human genome, 98% of the DNA sequence is non-protein-coding and was previously dismissed as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions that precisely control gene expression. Identifying active cis-regulatory regions in the human genome is therefore critical for understanding gene regulation and for assessing the impact of genetic variation on phenotype. Advances in high-throughput sequencing and machine learning now make it possible to predict cis-regulatory regions genome-wide. RESULTS: Drawing on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, an approach based on supervised deep learning, for identifying enhancer and promoter regions in the human genome. Because deep learning methods can discover patterns in large and complex data, they enable a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify the key experimental features that contribute to predictive performance. Applying DECRES, we delineate the locations of 300,000 candidate enhancers genome-wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data) and 26,000 candidate promoters (0.6% of the genome). CONCLUSION: The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation, from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies combined with high-throughput sequencing data, and inspires the development of other advanced neural network models to further improve genome annotations.
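The coverage figures in the abstract can be cross-checked with simple arithmetic. This sketch assumes a human genome length of about 3.1 × 10^9 bp (the record does not say which genome length the percentages were computed against) and takes the rounded counts at face value; under those assumptions, candidate enhancers and candidate promoters both average roughly 700 bp.

```python
# Back-of-the-envelope check of the coverage figures quoted in the abstract.
# ASSUMPTION: a human genome length of ~3.1e9 bp; the abstract does not state
# which genome length the percentages were computed against.
GENOME_BP = 3.1e9


def mean_region_length(n_regions: int, genome_fraction: float,
                       genome_bp: float = GENOME_BP) -> float:
    """Average length (bp) implied by n_regions jointly covering genome_fraction of the genome."""
    return genome_fraction * genome_bp / n_regions


enhancer_bp = mean_region_length(300_000, 0.068)  # 300,000 enhancers, 6.8% of the genome
promoter_bp = mean_region_length(26_000, 0.006)   # 26,000 promoters, 0.6% of the genome
print(f"enhancers: ~{enhancer_bp:.0f} bp each; promoters: ~{promoter_bp:.0f} bp each")
# prints: enhancers: ~703 bp each; promoters: ~715 bp each
```

Because the counts and percentages are rounded, this is only an order-of-magnitude consistency check, not the length distribution reported in the paper.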
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2187-1) contains supplementary material, which is available to authorized users.
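The RESULTS paragraph of the abstract says DECRES applies supervised deep learning to experimental features drawn from ENCODE and FANTOM to label regions as enhancers or promoters. The record does not describe the architecture, so what follows is only a minimal sketch of that general idea, assuming a one-hidden-layer feed-forward network trained by plain gradient descent on per-region feature vectors. It is not DECRES: the feature matrix is synthetic, and the three class labels (background, enhancer, promoter) are hypothetical placeholders.

```python
import numpy as np

# HYPOTHETICAL synthetic data: each row stands in for one genomic region and each
# column for one experimental signal measured over it. Labels are placeholders:
# 0 = background, 1 = enhancer, 2 = promoter. Nothing here is ENCODE or FANTOM data.
N_PER_CLASS, N_FEATURES, N_CLASSES = 200, 8, 3
rng = np.random.default_rng(0)
class_means = rng.normal(0.0, 2.0, size=(N_CLASSES, N_FEATURES))
X = np.vstack([rng.normal(class_means[c], 1.0, size=(N_PER_CLASS, N_FEATURES))
               for c in range(N_CLASSES)])
y = np.repeat(np.arange(N_CLASSES), N_PER_CLASS)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each feature


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_mlp(X, y, hidden=16, lr=0.1, epochs=500, seed=1):
    """Fit a one-hidden-layer ReLU network by full-batch gradient descent on mean cross-entropy."""
    r = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    W1 = r.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))  # He initialisation
    b1 = np.zeros(hidden)
    W2 = r.normal(0.0, np.sqrt(2.0 / hidden), size=(hidden, k))
    b2 = np.zeros(k)
    Y = np.eye(k)[y]  # one-hot targets
    for _ in range(epochs):
        # Forward pass
        H_pre = X @ W1 + b1
        H = np.maximum(H_pre, 0.0)
        P = softmax(H @ W2 + b2)
        # Backward pass: gradient of mean cross-entropy w.r.t. each parameter
        dZ2 = (P - Y) / n
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dH = dZ2 @ W2.T
        dH[H_pre <= 0.0] = 0.0  # ReLU passes gradient only where it was active
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(params, X):
    """Return the most probable class index for each row of X."""
    W1, b1, W2, b2 = params
    return softmax(np.maximum(X @ W1 + b1, 0.0) @ W2 + b2).argmax(axis=1)


params = train_mlp(X, y)
accuracy = float((predict(params, X) == y).mean())
print(f"training accuracy: {accuracy:.3f}")
```

With real data, rows would come from measured signals over genomic regions and performance would be judged on held-out data; the training-set accuracy printed here only confirms that the sketch learns to separate the synthetic classes.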
id pubmed-5984344
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics, Methodology Article, published by BioMed Central on 2018-05-31
rights © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article