
Active learning of enhancer and silencer regulatory grammar in photoreceptors


Bibliographic Details
Main authors: Friedman, Ryan Z., Ramu, Avinash, Lichtarge, Sara, Myers, Connie A., Granas, David M., Gause, Maria, Corbo, Joseph C., Cohen, Barak A., White, Michael A.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10473580/
https://www.ncbi.nlm.nih.gov/pubmed/37662358
http://dx.doi.org/10.1101/2023.08.21.554146
author Friedman, Ryan Z.
Ramu, Avinash
Lichtarge, Sara
Myers, Connie A.
Granas, David M.
Gause, Maria
Corbo, Joseph C.
Cohen, Barak A.
White, Michael A.
collection PubMed
description Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome. We address this problem using active machine learning to iteratively train models on multiple rounds of synthetic DNA sequences assayed in live mammalian retinas. During each round of training the model actively selects sequence perturbations to assay, thereby efficiently generating informative training data. We iteratively trained a model that predicts the activities of sequences containing binding motifs for the photoreceptor transcription factor Cone-rod homeobox (CRX) using an order of magnitude less training data than current approaches. The model’s internal confidence estimates of its predictions are reliable guides for designing sequences with high activity. The model correctly identified critical sequence differences between active and inactive sequences with nearly identical transcription factor binding sites, and revealed order and spacing preferences for combinations of motifs. Our results establish active learning as an effective method to train accurate deep learning models of cis-regulatory function after exhausting naturally occurring training examples in the genome.
format Online
Article
Text
id pubmed-10473580
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10473580 2023-09-02 bioRxiv Article
Cold Spring Harbor Laboratory 2023-08-22 /pmc/articles/PMC10473580/ /pubmed/37662358 http://dx.doi.org/10.1101/2023.08.21.554146 Text en
https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Active learning of enhancer and silencer regulatory grammar in photoreceptors
topic Article