De novo identification of replication-timing domains in the human genome by deep learning

Motivation: The de novo identification of the initiation and termination zones—regions that replicate earlier or later than their upstream and downstream neighbours, respectively—remains a key challenge in DNA replication. Results: Building on advances in deep learning, we developed a novel hybrid a...

Bibliographic Details
Main authors: Liu, Feng; Ren, Chao; Li, Hao; Zhou, Pingkun; Bo, Xiaochen; Shu, Wenjie
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4795613/
https://www.ncbi.nlm.nih.gov/pubmed/26545821
http://dx.doi.org/10.1093/bioinformatics/btv643
author Liu, Feng
Ren, Chao
Li, Hao
Zhou, Pingkun
Bo, Xiaochen
Shu, Wenjie
author_sort Liu, Feng
collection PubMed
description Motivation: The de novo identification of the initiation and termination zones (regions that replicate earlier or later than their upstream and downstream neighbours, respectively) remains a key challenge in DNA replication. Results: Building on advances in deep learning, we developed a novel hybrid architecture combining a pre-trained deep neural network and a hidden Markov model (DNN-HMM) for the de novo identification of replication domains using replication timing profiles. Our results demonstrate that DNN-HMM can significantly outperform strong, discriminatively trained Gaussian mixture model–HMM (GMM-HMM) systems and six other reported methods that can be applied to this challenge. We applied our trained DNN-HMM to identify distinct replication domain types, namely the early replication domain (ERD), the down transition zone (DTZ), the late replication domain (LRD) and the up transition zone (UTZ), using newly replicated DNA sequencing (Repli-Seq) data across 15 human cells. A subsequent integrative analysis revealed that these replication domains harbour unique genomic and epigenetic patterns, transcriptional activity and higher-order chromosomal structure. Our findings support the ‘replication-domain’ model, which states (1) that ERDs and LRDs, connected by UTZs and DTZs, are spatially compartmentalized structural and functional units of higher-order chromosomal structure, (2) that adjacent DTZ-UTZ pairs form chromatin loops and (3) that intra-interactions within ERDs and LRDs tend to be short-range and long-range, respectively. Our model reveals an important chromatin organizational principle of the human genome and represents a critical step towards understanding the mechanisms regulating replication timing. Availability and implementation: Our DNN-HMM method and three additional algorithms can be freely accessed at https://github.com/wenjiegroup/DNN-HMM. The replication domain regions identified in this study are available in GEO under the accession ID GSE53984. Contact: shuwj@bmi.ac.cn or boxc@bmi.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4795613
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4795613 2016-03-21 De novo identification of replication-timing domains in the human genome by deep learning Liu, Feng Ren, Chao Li, Hao Zhou, Pingkun Bo, Xiaochen Shu, Wenjie Bioinformatics Original Papers Oxford University Press 2016-03-01 2015-11-05 /pmc/articles/PMC4795613/ /pubmed/26545821 http://dx.doi.org/10.1093/bioinformatics/btv643 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title De novo identification of replication-timing domains in the human genome by deep learning
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4795613/
https://www.ncbi.nlm.nih.gov/pubmed/26545821
http://dx.doi.org/10.1093/bioinformatics/btv643