Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data

Transcriptome-wide association studies (TWAS) can be used as a powerful method to identify and interpret the underlying biological mechanisms behind GWAS by mapping gene expression levels to phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variant...

Bibliographic Details
Main authors: Liu, Andrew E.; Kang, Hyun Min
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830793/
https://www.ncbi.nlm.nih.gov/pubmed/35100255
http://dx.doi.org/10.1371/journal.pgen.1009571
author Liu, Andrew E.
Kang, Hyun Min
collection PubMed
description Transcriptome-wide association studies (TWAS) can be used as a powerful method to identify and interpret the underlying biological mechanisms behind GWAS by mapping gene expression levels to phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to impute expression levels of a specific tissue is to use the model trained on the same tissue type. When multiple tissues are available for the same subjects, it has been demonstrated that training imputation models on multiple tissue types improves accuracy because of shared eQTLs between the tissues and an increase in effective sample size. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the abundance of expression datasets across various tissues for non-overlapping individuals. Here, we explore the optimal way to combine imputed levels across training models from multiple tissues and datasets in a flexible manner using summary-level data. Our proposed method (SWAM) combines an arbitrary number of transcriptome imputation models to linearly optimize the imputation accuracy given a target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS while requiring individual-level data from only a single reference cohort. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project as well as from a large eQTL study of the Depression Susceptibility Genes and Networks (DGN) Project and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples. We also extend our meta-imputation method to meta-TWAS to leverage multiple tissues in TWAS analysis with summary-level statistics. Our results highlight the importance of integrating multiple tissues to unravel regulatory impacts of genetic variants on complex traits.
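The idea of linearly combining imputed expression from several models, as the abstract describes, can be illustrated with a small sketch. This is not the authors' SWAM implementation: the abstract does not specify how the weights are derived, so ordinary least squares on a reference cohort is assumed here purely for illustration. The function names (`fit_weights`, `combine`, `pearson`, `solve`) and the synthetic data are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_weights(imputed, measured):
    """Least-squares coefficients [intercept, w_1, ..., w_K] (assumed weighting scheme).

    imputed:  list of K lists, one per imputation model, each of length n_samples
    measured: measured target-tissue expression in the reference cohort
    """
    cols = [[1.0] * len(measured)] + imputed
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, measured)) for ci in cols]
    return solve(xtx, xty)

def combine(imputed, coef):
    """Weighted linear combination of per-model imputed expression."""
    n = len(imputed[0])
    return [coef[0] + sum(w * col[i] for w, col in zip(coef[1:], imputed)) for i in range(n)]

# Synthetic data (hypothetical): three models that see the same signal with different noise.
rng = random.Random(0)
n_samples = 200
signal = [rng.gauss(0, 1) for _ in range(n_samples)]
imputed = [[s + rng.gauss(0, sd) for s in signal] for sd in (0.5, 1.0, 2.0)]
measured = [s + rng.gauss(0, 0.3) for s in signal]

coef = fit_weights(imputed, measured)
meta = combine(imputed, coef)
single_r = [pearson(col, measured) for col in imputed]
meta_r = pearson(meta, measured)
print("per-model r:", [round(r, 3) for r in single_r])
print("combined r:", round(meta_r, 3))
```

The correlations printed here are in-sample, where a least-squares combination can never do worse than any single model. A fair comparison needs held-out samples, in the spirit of the independent GEUVADIS evaluation mentioned in the abstract.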
format Online
Article
Text
id pubmed-8830793
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8830793 2022-02-11 PLoS Genet Research Article Public Library of Science 2022-01-31 /pmc/articles/PMC8830793/ /pubmed/35100255 http://dx.doi.org/10.1371/journal.pgen.1009571 Text en © 2022 Liu, Kang. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830793/
https://www.ncbi.nlm.nih.gov/pubmed/35100255
http://dx.doi.org/10.1371/journal.pgen.1009571