A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules
Main authors: | Abdalla, Moustafa; Abdalla, Mohamed |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9041867/; https://www.ncbi.nlm.nih.gov/pubmed/35421087; http://dx.doi.org/10.1371/journal.pcbi.1010028 |
author | Abdalla, Moustafa; Abdalla, Mohamed |
---|---|
collection | PubMed |
description | Genome-wide association studies (GWASs) for complex traits have implicated thousands of genetic loci. Most GWAS-nominated variants lie in noncoding regions, complicating the systematic translation of these findings into functional understanding. Here, we leverage convolutional neural networks to assist in this challenge. Our computational framework, peaBrain, models the transcriptional machinery of a tissue as a two-stage process: first, predicting the mean tissue-specific abundance of all genes, and second, incorporating the transcriptomic consequences of genotype variation to predict individual abundance on a subject-by-subject basis. We demonstrate that peaBrain accounts for the majority (>50%) of variance observed in mean transcript abundance across most tissues and outperforms regularized linear models in predicting the consequences of individual genotype variation. We highlight the validity of the peaBrain model by calculating non-coding impact scores that correlate with nucleotide evolutionary constraint and are also predictive of disease-associated variation and allele-specific transcription factor binding. We further show how these tissue-specific peaBrain scores can be leveraged to pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals. We subsequently: (a) derive continuous dense embeddings of genes for downstream applications; (b) highlight the utility of the model in predicting the transcriptomic impact of small molecules and shRNA (on par with in vitro experimental replication of external test sets); (c) explore how peaBrain can be used to model difficult-to-study processes (such as neural induction); and (d) identify putatively functional eQTLs that are missed by high-throughput experimental approaches. |
id | pubmed-9041867 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | PLoS Comput Biol |
publication date | 2022-04-14 |
rights | © 2022 Abdalla, Abdalla. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
topic | Research Article |