A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules

Genome-wide association studies (GWASs) for complex traits have implicated thousands of genetic loci. Most GWAS-nominated variants lie in noncoding regions, complicating the systematic translation of these findings into functional understanding. Here, we leverage convolutional neural networks to ass...

Bibliographic Details
Main Authors: Abdalla, Moustafa; Abdalla, Mohamed
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9041867/
https://www.ncbi.nlm.nih.gov/pubmed/35421087
http://dx.doi.org/10.1371/journal.pcbi.1010028
_version_ 1784694586129514496
author Abdalla, Moustafa
Abdalla, Mohamed
author_facet Abdalla, Moustafa
Abdalla, Mohamed
author_sort Abdalla, Moustafa
collection PubMed
description Genome-wide association studies (GWASs) for complex traits have implicated thousands of genetic loci. Most GWAS-nominated variants lie in noncoding regions, complicating the systematic translation of these findings into functional understanding. Here, we leverage convolutional neural networks to assist in this challenge. Our computational framework, peaBrain, models the transcriptional machinery of a tissue as a two-stage process: first, predicting the mean tissue-specific abundance of all genes, and second, incorporating the transcriptomic consequences of genotype variation to predict individual abundance on a subject-by-subject basis. We demonstrate that peaBrain accounts for the majority (>50%) of variance observed in mean transcript abundance across most tissues and outperforms regularized linear models in predicting the consequences of individual genotype variation. We highlight the validity of the peaBrain model by calculating non-coding impact scores that correlate with nucleotide evolutionary constraint and that are also predictive of disease-associated variation and allele-specific transcription factor binding. We further show how these tissue-specific peaBrain scores can be leveraged to pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals. We subsequently: (a) derive continuous dense embeddings of genes for downstream applications; (b) highlight the utility of the model in predicting the transcriptomic impact of small molecules and shRNA (on par with in vitro experimental replication of external test sets); (c) explore how peaBrain can be used to model difficult-to-study processes (such as neural induction); and (d) identify putatively functional eQTLs that are missed by high-throughput experimental approaches.
format Online
Article
Text
id pubmed-9041867
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9041867 2022-04-27 A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules Abdalla, Moustafa; Abdalla, Mohamed PLoS Comput Biol Research Article Public Library of Science 2022-04-14 /pmc/articles/PMC9041867/ /pubmed/35421087 http://dx.doi.org/10.1371/journal.pcbi.1010028 Text en © 2022 Abdalla, Abdalla https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A general framework for predicting the transcriptomic consequences of non-coding variation and small molecules
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9041867/
https://www.ncbi.nlm.nih.gov/pubmed/35421087
http://dx.doi.org/10.1371/journal.pcbi.1010028