Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding
MOTIVATION: Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handc...
Main Authors: | Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870572/ https://www.ncbi.nlm.nih.gov/pubmed/28881969 http://dx.doi.org/10.1093/bioinformatics/btx234 |
_version_ | 1783309512269103104 |
---|---|
author | Min, Xu Zeng, Wanwen Chen, Ning Chen, Ting Jiang, Rui |
author_sort | Min, Xu |
collection | PubMed |
description | MOTIVATION: Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning. RESULTS: We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. AVAILABILITY AND IMPLEMENTATION: The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm. SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-5870572 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5870572 2018-04-05 Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding Min, Xu Zeng, Wanwen Chen, Ning Chen, Ting Jiang, Rui Bioinformatics Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 Oxford University Press 2017-07-15 2017-07-12 /pmc/articles/PMC5870572/ /pubmed/28881969 http://dx.doi.org/10.1093/bioinformatics/btx234 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding |
title_sort | chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding |
topic | Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870572/ https://www.ncbi.nlm.nih.gov/pubmed/28881969 http://dx.doi.org/10.1093/bioinformatics/btx234 |
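
The pre-processing step named in the description field (splitting DNA sequences into k-mers and building a k-mer co-occurrence matrix for embedding pre-training) could be sketched roughly as below. This is a minimal illustration only and is not taken from the authors' repository: the function names `split_kmers` and `cooccurrence_counts`, the stride of 1, the default k of 6, and the symmetric context window are all assumptions made for the example.

```python
from collections import Counter


def split_kmers(seq, k=6, stride=1):
    """Slide a window of length k over seq and return the k-mers in order.

    k and stride are illustrative defaults, not values from the record.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def cooccurrence_counts(kmers, window=2):
    """Count symmetric co-occurrences of k-mers lying within `window` positions.

    Returns a Counter keyed by (kmer_a, kmer_b) pairs; each neighbouring pair
    is counted once in each direction.
    """
    counts = Counter()
    for i, center in enumerate(kmers):
        # Look only to the right and add both directions to stay symmetric.
        for j in range(i + 1, min(i + 1 + window, len(kmers))):
            counts[(center, kmers[j])] += 1
            counts[(kmers[j], center)] += 1
    return counts


kmers = split_kmers("ACGTAC", k=3)
print(kmers)  # ['ACG', 'CGT', 'GTA', 'TAC']
counts = cooccurrence_counts(kmers, window=1)
print(counts[("ACG", "CGT")])  # 1
```

Counts of this kind are the sort of input an unsupervised embedding method would consume; the resulting vectors would then initialise the embedding layer that feeds the convolutional and BLSTM layers mentioned in the description.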