Effective gene expression prediction from sequence by integrating long-range interactions
Main authors: | Avsec, Žiga; Agarwal, Vikram; Visentin, Daniel; Ledsam, Joseph R.; Grabska-Barwinska, Agnieszka; Taylor, Kyle R.; Assael, Yannis; Jumper, John; Kohli, Pushmeet; Kelley, David R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group US, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8490152/ https://www.ncbi.nlm.nih.gov/pubmed/34608324 http://dx.doi.org/10.1038/s41592-021-01252-x |
_version_ | 1784578468466393088 |
---|---|
author | Avsec, Žiga; Agarwal, Vikram; Visentin, Daniel; Ledsam, Joseph R.; Grabska-Barwinska, Agnieszka; Taylor, Kyle R.; Assael, Yannis; Jumper, John; Kohli, Pushmeet; Kelley, David R. |
collection | PubMed |
description | How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer–promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution. |
format | Online Article Text |
id | pubmed-8490152 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group US |
record_format | MEDLINE/PubMed |
spelling | pubmed-8490152, 2021-10-14. Nat Methods, Article, published 2021-10-04 by Nature Publishing Group US. © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third-party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Effective gene expression prediction from sequence by integrating long-range interactions |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8490152/ https://www.ncbi.nlm.nih.gov/pubmed/34608324 http://dx.doi.org/10.1038/s41592-021-01252-x |