Deep neural networks identify sequence context features predictive of transcription factor binding

Transcription factors (TFs) bind DNA by recognizing specific sequence motifs, typically of length 6–12bp. A motif can occur many thousands of times in the human genome, but only a subset of those sites are actually bound. Here we present a machine learning framework leveraging existing convolutional...

Bibliographic Details
Main Authors:	Zheng, An; Lamkin, Michael; Zhao, Hanqing; Wu, Cynthia; Su, Hao; Gymrek, Melissa
Format:	Online Article Text
Language:	English
Published:	2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8009085/ https://www.ncbi.nlm.nih.gov/pubmed/33796819 http://dx.doi.org/10.1038/s42256-020-00282-y

collection	PubMed
description	Transcription factors (TFs) bind DNA by recognizing specific sequence motifs, typically of length 6–12bp. A motif can occur many thousands of times in the human genome, but only a subset of those sites are actually bound. Here we present a machine learning framework leveraging existing convolutional neural network architectures and model interpretation techniques to identify and interpret sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution, and characterize context features most predictive of binding. We find that the choice of training data heavily influences classification accuracy and the relative importance of features such as open chromatin. Overall, our framework enables novel insights into features predictive of TF binding and is likely to inform future deep learning applications to interpret non-coding genetic variants.
id	pubmed-8009085
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-8009085 2021-08-01 Nat Mach Intell Article 2021-01-18 2021-02 /pmc/articles/PMC8009085/ /pubmed/33796819 http://dx.doi.org/10.1038/s42256-020-00282-y Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
