Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape

MOTIVATION: An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affini...

Bibliographic Details
Main authors: Dai, Hanjun, Umarov, Ramzan, Kuwahara, Hiroyuki, Li, Yu, Song, Le, Gao, Xin
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870668/
https://www.ncbi.nlm.nih.gov/pubmed/28961686
http://dx.doi.org/10.1093/bioinformatics/btx480
author Dai, Hanjun
Umarov, Ramzan
Kuwahara, Hiroyuki
Li, Yu
Song, Le
Gao, Xin
collection PubMed
description MOTIVATION: An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. RESULTS: Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. AVAILABILITY AND IMPLEMENTATION: Our program is freely available at https://github.com/ramzan1990/sequence2vec. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
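The embedding idea summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation (that is at the GitHub URL given above). It assumes a chain-structured model with one hidden embedding per sequence position, a ReLU update that combines the observed base with the neighbouring embeddings, and sum-pooling followed by a linear readout. The names `embed_sequence`, `predict_affinity`, `w_obs`, `w_msg`, `w_out`, the dimension `d = 8`, and the random weights are all hypothetical; in practice the weights would be learned from binding data.

```python
import random

ALPHABET = "ACGT"


def embed_sequence(seq, w_obs, w_msg, n_iters=3):
    """Return a fixed-length embedding for a DNA string of any length.

    w_obs: dict mapping each base to a list of d floats (hypothetical weights)
    w_msg: d x d list of lists applied to neighbour messages (hypothetical)
    """
    d = len(w_msg)
    seq = seq.upper()
    mu = [[0.0] * d for _ in seq]  # one hidden embedding per position
    for _ in range(n_iters):
        new_mu = []
        for i, base in enumerate(seq):
            # Sum the current embeddings of the chain neighbours i-1 and i+1.
            neigh = [0.0] * d
            for j in (i - 1, i + 1):
                if 0 <= j < len(seq):
                    neigh = [a + b for a, b in zip(neigh, mu[j])]
            # Nonlinear update: ReLU(observation term + transformed messages).
            new_mu.append([
                max(0.0, w_obs[base][k]
                    + sum(neigh[m] * w_msg[m][k] for m in range(d)))
                for k in range(d)
            ])
        mu = new_mu
    # Sum-pool over positions so every sequence maps to one d-vector.
    return [sum(col) for col in zip(*mu)]


def predict_affinity(seq, w_obs, w_msg, w_out, n_iters=3):
    """Linear readout on the pooled embedding (stand-in for a learned predictor)."""
    emb = embed_sequence(seq, w_obs, w_msg, n_iters)
    return float(sum(e * w for e, w in zip(emb, w_out)))


# Hypothetical, untrained weights used only to make the sketch runnable.
rng = random.Random(0)
d = 8
w_obs = {b: [rng.gauss(0, 0.1) for _ in range(d)] for b in ALPHABET}
w_msg = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]
w_out = [rng.gauss(0, 0.1) for _ in range(d)]

score = predict_affinity("ACGTTGCA", w_obs, w_msg, w_out)
```

Because the pooled vector has the same length for every input, sequences of different lengths land in one common feature space, which is the property the description attributes to the embedding step.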
format Online
Article
Text
id pubmed-5870668
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5870668 2018-04-05 Bioinformatics Original Papers Oxford University Press 2017-11-15 2017-07-27 /pmc/articles/PMC5870668/ /pubmed/28961686 http://dx.doi.org/10.1093/bioinformatics/btx480 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape
topic Original Papers