Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping

Bibliographic Details
Main authors: Höllerer, Simon; Papaxanthos, Laetitia; Gumpinger, Anja Cathrin; Fischer, Katrin; Beisel, Christian; Borgwardt, Karsten; Benenson, Yaakov; Jeschek, Markus
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7363850/
https://www.ncbi.nlm.nih.gov/pubmed/32669542
http://dx.doi.org/10.1038/s41467-020-17222-4
collection PubMed
description Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific recombinase to directly record a GRE’s effect in DNA, enabling readout of both sequence and quantitative function for extremely large GRE-sets via next-generation sequencing. We record translation kinetics of over 300,000 bacterial ribosome binding sites (RBSs) in >2.7 million sequence-function pairs in a single experiment. Further, we introduce a deep learning approach employing ensembling and uncertainty modelling that predicts RBS function with high accuracy, outperforming state-of-the-art methods. DNA-based phenotypic recording combined with deep learning represents a major advance in our ability to predict function from genetic sequence.
id pubmed-7363850
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7363850 2020-07-20 Nat Commun Article Nature Publishing Group UK 2020-07-15 Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.