
Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk

A key challenge for human genetics, precision medicine, and evolutionary biology is deciphering the regulatory code of gene expression, including understanding the transcriptional effects of genome variation. Yet this is extremely difficult due to the enormous scale of the noncoding mutation space....


Bibliographic Details
Main authors: Zhou, Jian; Theesfeld, Chandra L.; Yao, Kevin; Chen, Kathleen M.; Wong, Aaron K.; Troyanskaya, Olga G.
Format: Online Article Text
Language: English
Published: 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6094955/
https://www.ncbi.nlm.nih.gov/pubmed/30013180
http://dx.doi.org/10.1038/s41588-018-0160-6
collection PubMed
description A key challenge for human genetics, precision medicine, and evolutionary biology is deciphering the regulatory code of gene expression, including understanding the transcriptional effects of genome variation. Yet this is extremely difficult due to the enormous scale of the noncoding mutation space. We developed a deep-learning-based framework, ExPecto, that can accurately predict, ab initio from DNA sequence, the tissue-specific transcriptional effects of mutations, including rare or never-observed mutations. We prioritized causal variants within disease/trait-associated loci from all publicly available GWAS studies, and experimentally validated predictions for four immune-related diseases. Exploiting the scalability of ExPecto, we characterized the regulatory mutation space for all human Pol II-transcribed genes by in silico saturation mutagenesis, profiling >140 million promoter-proximal mutations. This enables probing of evolutionary constraints on gene expression and ab initio prediction of mutation disease effect, making ExPecto an end-to-end computational framework for in silico prediction of expression and disease risk.
id pubmed-6094955
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6094955 2019-01-16
Nat Genet
Article
2018-07-16 2018-08
/pmc/articles/PMC6094955/ /pubmed/30013180 http://dx.doi.org/10.1038/s41588-018-0160-6
Text en
Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
topic Article