De novo identification of replication-timing domains in the human genome by deep learning
Motivation: The de novo identification of the initiation and termination zones—regions that replicate earlier or later than their upstream and downstream neighbours, respectively—remains a key challenge in DNA replication. Results: Building on advances in deep learning, we developed a novel hybrid a...
Main authors: | Liu, Feng; Ren, Chao; Li, Hao; Zhou, Pingkun; Bo, Xiaochen; Shu, Wenjie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2016 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4795613/ https://www.ncbi.nlm.nih.gov/pubmed/26545821 http://dx.doi.org/10.1093/bioinformatics/btv643 |
_version_ | 1782421630602969088 |
---|---|
author | Liu, Feng; Ren, Chao; Li, Hao; Zhou, Pingkun; Bo, Xiaochen; Shu, Wenjie |
author_facet | Liu, Feng; Ren, Chao; Li, Hao; Zhou, Pingkun; Bo, Xiaochen; Shu, Wenjie |
author_sort | Liu, Feng |
collection | PubMed |
description | Motivation: The de novo identification of the initiation and termination zones (regions that replicate earlier or later than their upstream and downstream neighbours, respectively) remains a key challenge in DNA replication. Results: Building on advances in deep learning, we developed a novel hybrid architecture combining a pre-trained deep neural network and a hidden Markov model (DNN-HMM) for the de novo identification of replication domains using replication timing profiles. Our results demonstrate that DNN-HMM can significantly outperform strong, discriminatively trained Gaussian mixture model–HMM (GMM-HMM) systems and six other reported methods that can be applied to this challenge. We applied our trained DNN-HMM to identify distinct replication domain types, namely the early replication domain (ERD), the down transition zone (DTZ), the late replication domain (LRD) and the up transition zone (UTZ), using newly replicated DNA sequencing (Repli-Seq) data across 15 human cells. A subsequent integrative analysis revealed that these replication domains harbour unique genomic and epigenetic patterns, transcriptional activity and higher-order chromosomal structure. Our findings support the ‘replication-domain’ model, which states (1) that ERDs and LRDs, connected by UTZs and DTZs, are spatially compartmentalized structural and functional units of higher-order chromosomal structure, (2) that the adjacent DTZ-UTZ pairs form chromatin loops and (3) that intra-interactions within ERDs and LRDs tend to be short-range and long-range, respectively. Our model reveals an important chromatin organizational principle of the human genome and represents a critical step towards understanding the mechanisms regulating replication timing. Availability and implementation: Our DNN-HMM method and three additional algorithms can be freely accessed at https://github.com/wenjiegroup/DNN-HMM. 
The replication domain regions identified in this study are available in GEO under the accession ID GSE53984. Contact: shuwj@bmi.ac.cn or boxc@bmi.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4795613 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4795613 2016-03-21 De novo identification of replication-timing domains in the human genome by deep learning Liu, Feng Ren, Chao Li, Hao Zhou, Pingkun Bo, Xiaochen Shu, Wenjie Bioinformatics Original Papers Oxford University Press 2016-03-01 2015-11-05 /pmc/articles/PMC4795613/ /pubmed/26545821 http://dx.doi.org/10.1093/bioinformatics/btv643 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Papers Liu, Feng Ren, Chao Li, Hao Zhou, Pingkun Bo, Xiaochen Shu, Wenjie De novo identification of replication-timing domains in the human genome by deep learning |
title | De novo identification of replication-timing domains in the human genome by deep learning |
title_sort | de novo identification of replication-timing domains in the human genome by deep learning |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4795613/ https://www.ncbi.nlm.nih.gov/pubmed/26545821 http://dx.doi.org/10.1093/bioinformatics/btv643 |