Telescoping bimodal latent Dirichlet allocation to identify expression QTLs across tissues

Bibliographic Details
Main authors: Gewirtz, Ariel DH; Townes, F William; Engelhardt, Barbara E
Format: Online article, text
Language: English
Published: Life Science Alliance LLC, 2022
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9387650/
https://www.ncbi.nlm.nih.gov/pubmed/35977827
http://dx.doi.org/10.26508/lsa.202101297
Collection: PubMed

Abstract

Expression quantitative trait loci (eQTLs), or single-nucleotide polymorphisms that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene–variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multimodal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure where one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA sequencing samples to correspond to a single individual’s genotype. By using raw count data, our model avoids possible adulteration via normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across 10 tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities and identify associations within and across tissue types. We identify 4,645 cis-eQTLs and 995 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. Our code is freely available at https://github.com/gewirtz/TBLDA.
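To make the contrast in the abstract concrete, here is a minimal Python sketch of the classic one-to-one baseline. It is not TBLDA and not code from the authors' repository. It assumes hypothetical inputs: an expression table keyed by gene with one value per RNA-seq sample, a genotype table keyed by variant with one allele dosage per individual, and a `sample_to_individual` index that encodes the one-to-many (telescoping) match between the two modalities. The names `pearson_r`, `one_to_one_scan`, and `sample_to_individual` are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def one_to_one_scan(
    expr: Dict[str, List[float]],
    genotypes: Dict[str, List[float]],
    sample_to_individual: List[int],
) -> Dict[Tuple[str, str], float]:
    """Hypothetical baseline: correlate every gene with every variant, one pair at a time.

    expr: gene -> expression value per RNA-seq sample (length S).
    genotypes: variant -> allele dosage (0/1/2) per individual (length I).
    sample_to_individual: length-S list giving the individual behind each sample.
    """
    n_samples = len(sample_to_individual)
    results: Dict[Tuple[str, str], float] = {}
    for variant, dosages in genotypes.items():
        # Expand the individual-level genotype to sample level: every sample
        # from the same individual reuses that individual's single dosage.
        g_per_sample = [dosages[i] for i in sample_to_individual]
        for gene, values in expr.items():
            if len(values) != n_samples:
                raise ValueError(f"{gene}: expected {n_samples} samples, got {len(values)}")
            # Each gene-variant pair is tested in isolation.
            results[(gene, variant)] = pearson_r(g_per_sample, values)
    return results


if __name__ == "__main__":
    # Toy data: three individuals, two tissue samples each.
    scores = one_to_one_scan(
        expr={"geneA": [1, 1, 3, 3, 5, 5]},
        genotypes={"rs1": [0, 1, 2]},
        sample_to_individual=[0, 0, 1, 1, 2, 2],
    )
    print(scores)
```

This naive expansion counts repeated samples from one donor as independent observations, and each pair is still scored on its own, ignoring the regulatory-network and linkage-disequilibrium correlations the abstract mentions. Those are the limitations that motivate modeling the one-to-many structure jointly.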

Record Details
Record ID: pubmed-9387650
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Life Sci Alliance (Research Articles)
Publication date: 2022-08-17
Copyright: © 2022 Gewirtz et al. This article is available under a Creative Commons License (Attribution 4.0 International, as described at https://creativecommons.org/licenses/by/4.0/).