
Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network

Modeling in-vivo protein-DNA binding is not only fundamental for further understanding of regulatory mechanisms, but also a challenging task in computational biology. Deep-learning-based methods have succeeded in modeling in-vivo protein-DNA binding, but they often (1) follow the fully supervised...


Bibliographic Details
Main Authors: Zhang, Qinhu, Shen, Zhen, Huang, De-Shuang
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6559991/
https://www.ncbi.nlm.nih.gov/pubmed/31186519
http://dx.doi.org/10.1038/s41598-019-44966-x
_version_ 1783425877722267648
author Zhang, Qinhu
Shen, Zhen
Huang, De-Shuang
author_facet Zhang, Qinhu
Shen, Zhen
Huang, De-Shuang
author_sort Zhang, Qinhu
collection PubMed
description Modeling in-vivo protein-DNA binding is not only fundamental for further understanding of regulatory mechanisms, but also a challenging task in computational biology. Deep-learning-based methods have succeeded in modeling in-vivo protein-DNA binding, but they often (1) follow the fully supervised learning framework and overlook the weakly supervised information in genomic sequences, namely that a bound DNA sequence may have multiple TFBS(s), and (2) use one-hot encoding to encode DNA sequences and ignore the dependencies among nucleotides. In this paper, we propose a weakly supervised framework for modeling in-vivo protein-DNA binding, which combines multiple-instance learning with a hybrid deep neural network and uses k-mer encoding to transform DNA sequences. Firstly, this framework segments sequences into multiple overlapping instances using a sliding window, and then encodes all instances into image-like inputs of high-order dependencies using k-mer encoding. Secondly, it separately computes a score for each instance in the same bag using a hybrid deep neural network that integrates convolutional and recurrent neural networks. Finally, it integrates the predicted values of all instances into the final prediction for the bag using the Noisy-and method. The experimental results on in-vivo datasets demonstrate the superior performance of the proposed framework. In addition, we explore the performance of the proposed framework when using k-mer encoding, demonstrate the performance of the Noisy-and method by comparing it with other fusion methods, and find that adding recurrent layers can improve the performance of the proposed framework.
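The three steps in the description (sliding-window segmentation, k-mer encoding, and Noisy-and fusion) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the window, stride, and k values, the slope `a`, and the threshold `b` are all assumptions, and the Noisy-and formula used is a commonly cited form that this record does not itself spell out. The hybrid convolutional and recurrent network is not modeled here; the per-instance scores are hypothetical placeholders.

```python
import math
from itertools import product


def segment(seq, window, stride):
    """Split a sequence into overlapping instances (the 'bag')."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def kmer_encode(instance, k):
    """One-hot encode each overlapping k-mer over the 4**k possible k-mers.

    Returns a (len(instance) - k + 1) x 4**k matrix as a list of rows.
    """
    index = {"".join(p): j for j, p in enumerate(product("ACGT", repeat=k))}
    rows = []
    for i in range(len(instance) - k + 1):
        row = [0] * (4 ** k)
        j = index.get(instance[i:i + k])  # None for k-mers with other symbols, e.g. N
        if j is not None:
            row[j] = 1
        rows.append(row)
    return rows


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def noisy_and(scores, a=7.5, b=0.5):
    """Fuse instance scores in [0, 1] into one bag score (assumed formula).

    a (slope) and b (threshold) are illustrative values, not taken from the paper.
    """
    mean = sum(scores) / len(scores)
    low = sigmoid(-a * b)
    high = sigmoid(a * (1.0 - b))
    return (sigmoid(a * (mean - b)) - low) / (high - low)


# Toy usage: a 10-nt sequence, window 6, stride 2, k = 2 (all hypothetical).
bag = segment("ACGTACGTAC", window=6, stride=2)
encoded = [kmer_encode(inst, k=2) for inst in bag]

# Placeholder scores standing in for the network's per-instance outputs.
instance_scores = [0.9, 0.2, 0.7]
bag_score = noisy_and(instance_scores)
```

With k = 1 the encoding reduces to ordinary one-hot encoding, which is one way to see how k-mer encoding generalizes it to capture dependencies among adjacent nucleotides.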
format Online
Article
Text
id pubmed-6559991
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6559991 2019-06-19 Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network Zhang, Qinhu Shen, Zhen Huang, De-Shuang Sci Rep Article Modeling in-vivo protein-DNA binding is not only fundamental for further understanding of regulatory mechanisms, but also a challenging task in computational biology. Deep-learning-based methods have succeeded in modeling in-vivo protein-DNA binding, but they often (1) follow the fully supervised learning framework and overlook the weakly supervised information in genomic sequences, namely that a bound DNA sequence may have multiple TFBS(s), and (2) use one-hot encoding to encode DNA sequences and ignore the dependencies among nucleotides. In this paper, we propose a weakly supervised framework for modeling in-vivo protein-DNA binding, which combines multiple-instance learning with a hybrid deep neural network and uses k-mer encoding to transform DNA sequences. Firstly, this framework segments sequences into multiple overlapping instances using a sliding window, and then encodes all instances into image-like inputs of high-order dependencies using k-mer encoding. Secondly, it separately computes a score for each instance in the same bag using a hybrid deep neural network that integrates convolutional and recurrent neural networks. Finally, it integrates the predicted values of all instances into the final prediction for the bag using the Noisy-and method. The experimental results on in-vivo datasets demonstrate the superior performance of the proposed framework. In addition, we explore the performance of the proposed framework when using k-mer encoding, demonstrate the performance of the Noisy-and method by comparing it with other fusion methods, and find that adding recurrent layers can improve the performance of the proposed framework.
Nature Publishing Group UK 2019-06-11 /pmc/articles/PMC6559991/ /pubmed/31186519 http://dx.doi.org/10.1038/s41598-019-44966-x Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Zhang, Qinhu
Shen, Zhen
Huang, De-Shuang
Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network
title Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network
title_full Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network
title_fullStr Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network
title_full_unstemmed Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network
title_short Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network
title_sort modeling in-vivo protein-dna binding by combining multiple-instance learning with a hybrid deep neural network
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6559991/
https://www.ncbi.nlm.nih.gov/pubmed/31186519
http://dx.doi.org/10.1038/s41598-019-44966-x