Effective gene expression prediction from sequence by integrating long-range interactions

Bibliographic Details
Main Authors: Avsec, Žiga, Agarwal, Vikram, Visentin, Daniel, Ledsam, Joseph R., Grabska-Barwinska, Agnieszka, Taylor, Kyle R., Assael, Yannis, Jumper, John, Kohli, Pushmeet, Kelley, David R.
Format: Online Article Text
Language: English
Published: Nature Publishing Group US 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8490152/
https://www.ncbi.nlm.nih.gov/pubmed/34608324
http://dx.doi.org/10.1038/s41592-021-01252-x
author Avsec, Žiga
Agarwal, Vikram
Visentin, Daniel
Ledsam, Joseph R.
Grabska-Barwinska, Agnieszka
Taylor, Kyle R.
Assael, Yannis
Jumper, John
Kohli, Pushmeet
Kelley, David R.
collection PubMed
description How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer–promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.
format Online
Article
Text
id pubmed-8490152
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group US
record_format MEDLINE/PubMed
spelling pubmed-8490152 2021-10-14 Effective gene expression prediction from sequence by integrating long-range interactions Avsec, Žiga Agarwal, Vikram Visentin, Daniel Ledsam, Joseph R. Grabska-Barwinska, Agnieszka Taylor, Kyle R. Assael, Yannis Jumper, John Kohli, Pushmeet Kelley, David R. Nat Methods Article How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer–promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution. Nature Publishing Group US 2021-10-04 2021 /pmc/articles/PMC8490152/ /pubmed/34608324 http://dx.doi.org/10.1038/s41592-021-01252-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Effective gene expression prediction from sequence by integrating long-range interactions
topic Article