
Deciphering eukaryotic gene-regulatory logic with 100 million random promoters

How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences th...


Bibliographic Details
Main authors: de Boer, Carl G., Vaishnav, Eeshit Dhaval, Sadeh, Ronen, Abeyta, Esteban Luis, Friedman, Nir, Regev, Aviv
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954276/
https://www.ncbi.nlm.nih.gov/pubmed/31792407
http://dx.doi.org/10.1038/s41587-019-0315-8
author de Boer, Carl G.
Vaishnav, Eeshit Dhaval
Sadeh, Ronen
Abeyta, Esteban Luis
Friedman, Nir
Regev, Aviv
author_sort de Boer, Carl G.
collection PubMed
description How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.
format Online
Article
Text
id pubmed-6954276
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6954276 2020-06-02 Deciphering eukaryotic gene-regulatory logic with 100 million random promoters de Boer, Carl G. Vaishnav, Eeshit Dhaval Sadeh, Ronen Abeyta, Esteban Luis Friedman, Nir Regev, Aviv Nat Biotechnol Article How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation. 2019-12-02 2020-01 /pmc/articles/PMC6954276/ /pubmed/31792407 http://dx.doi.org/10.1038/s41587-019-0315-8 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title Deciphering eukaryotic gene-regulatory logic with 100 million random promoters
topic Article