
De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding

Transcription factors (TF) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF bindi...


Bibliographic Details
Main Authors: Alexandari, Amr M., Horton, Connor A., Shrikumar, Avanti, Shah, Nilay, Li, Eileen, Weilert, Melanie, Pufall, Miles A., Zeitlinger, Julia, Fordyce, Polly M., Kundaje, Anshul
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10197627/
https://www.ncbi.nlm.nih.gov/pubmed/37214836
http://dx.doi.org/10.1101/2023.05.11.540401
author Alexandari, Amr M.
Horton, Connor A.
Shrikumar, Avanti
Shah, Nilay
Li, Eileen
Weilert, Melanie
Pufall, Miles A.
Zeitlinger, Julia
Fordyce, Polly M.
Kundaje, Anshul
collection PubMed
description Transcription factors (TFs) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understanding gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models trained on in vivo TF binding assays effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diverse in vitro assays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.
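The abstract describes marginalizing away genomic sequence context to isolate a sequence's contribution to predicted binding. The following is a minimal sketch of that general idea only, not the authors' implementation: a query sequence is inserted into many random backgrounds and the change in model output is averaged. The scoring function `toy_model`, the motif `CACGTG`, and every parameter value here are hypothetical stand-ins chosen for illustration; a real analysis would use a trained network and the backgrounds described in the paper.

```python
import random

BASES = "ACGT"


def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a trained binding model (an assumption).

    Scores a sequence by counting a made-up motif and adds a small
    GC-content term that plays the role of sequence context.
    """
    motif_hits = seq.count("CACGTG")
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    return 2.0 * motif_hits + 0.5 * gc_fraction


def random_background(length: int, rng: random.Random) -> str:
    """Draw a uniformly random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def marginal_score(model, query: str, n_backgrounds: int = 200,
                   length: int = 100, seed: int = 0) -> float:
    """Average the change in model output caused by inserting `query`.

    Averaging over many backgrounds marginalizes out the contribution
    of any single flanking context.
    """
    rng = random.Random(seed)
    start = (length - len(query)) // 2
    total = 0.0
    for _ in range(n_backgrounds):
        background = random_background(length, rng)
        # Overwrite the central positions of the background with the query.
        inserted = background[:start] + query + background[start + len(query):]
        total += model(inserted) - model(background)
    return total / n_backgrounds


strong = marginal_score(toy_model, "CACGTG")  # matches the toy motif
weak = marginal_score(toy_model, "CACGTT")    # one-base variant
print(f"strong={strong:.3f} weak={weak:.3f}")
```

Under these assumptions the marginal score of the motif-matching query exceeds that of the single-base variant, which is the kind of relative ranking one would then compare against in vitro affinity measurements.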
format Online
Article
Text
id pubmed-10197627
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10197627 2023-05-20 bioRxiv Article Cold Spring Harbor Laboratory 2023-05-11 /pmc/articles/PMC10197627/ /pubmed/37214836 http://dx.doi.org/10.1101/2023.05.11.540401 Text en
This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding
topic Article