Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data
Transcriptome wide association studies (TWAS) can be used as a powerful method to identify and interpret the underlying biological mechanisms behind GWAS by mapping gene expression levels with phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variant...
Main authors: | Liu, Andrew E.; Kang, Hyun Min |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830793/ https://www.ncbi.nlm.nih.gov/pubmed/35100255 http://dx.doi.org/10.1371/journal.pgen.1009571 |
_version_ | 1784648351149457408 |
---|---|
author | Liu, Andrew E. Kang, Hyun Min |
author_facet | Liu, Andrew E. Kang, Hyun Min |
author_sort | Liu, Andrew E. |
collection | PubMed |
description | Transcriptome-wide association studies (TWAS) can be used as a powerful method to identify and interpret the underlying biological mechanisms behind GWAS by mapping gene expression levels with phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to impute expression levels of a specific tissue is to use the model trained from the same tissue type. When multiple tissues are available for the same subjects, it has been demonstrated that training imputation models from multiple tissue types improves accuracy because of eQTLs shared between tissues and an increase in effective sample size. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the abundance of expression datasets across various tissues for non-overlapping individuals. Here, we explore the optimal way to combine imputed levels across training models from multiple tissues and datasets in a flexible manner using summary-level data. Our proposed method (SWAM) combines an arbitrary number of transcriptome imputation models to linearly optimize the imputation accuracy given a target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS while only requiring individual-level data from a single reference cohort. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project as well as from a large eQTL study, the Depression Susceptibility Genes and Networks (DGN) Project, and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples. We also extend our meta-imputation method to meta-TWAS to leverage multiple tissues in TWAS analysis with summary-level statistics. Our results capitalize on the importance of integrating multiple tissues to unravel regulatory impacts of genetic variants on complex traits. |
format | Online Article Text |
id | pubmed-8830793 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8830793 2022-02-11 Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data Liu, Andrew E. Kang, Hyun Min PLoS Genet Research Article Transcriptome-wide association studies (TWAS) can be used as a powerful method to identify and interpret the underlying biological mechanisms behind GWAS by mapping gene expression levels with phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to impute expression levels of a specific tissue is to use the model trained from the same tissue type. When multiple tissues are available for the same subjects, it has been demonstrated that training imputation models from multiple tissue types improves accuracy because of eQTLs shared between tissues and an increase in effective sample size. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the abundance of expression datasets across various tissues for non-overlapping individuals. Here, we explore the optimal way to combine imputed levels across training models from multiple tissues and datasets in a flexible manner using summary-level data. Our proposed method (SWAM) combines an arbitrary number of transcriptome imputation models to linearly optimize the imputation accuracy given a target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS while only requiring individual-level data from a single reference cohort. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project as well as from a large eQTL study, the Depression Susceptibility Genes and Networks (DGN) Project, and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples. We also extend our meta-imputation method to meta-TWAS to leverage multiple tissues in TWAS analysis with summary-level statistics. Our results capitalize on the importance of integrating multiple tissues to unravel regulatory impacts of genetic variants on complex traits. Public Library of Science 2022-01-31 /pmc/articles/PMC8830793/ /pubmed/35100255 http://dx.doi.org/10.1371/journal.pgen.1009571 Text en © 2022 Liu, Kang https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Liu, Andrew E. Kang, Hyun Min Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data |
title | Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data |
title_full | Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data |
title_fullStr | Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data |
title_full_unstemmed | Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data |
title_short | Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data |
title_sort | meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830793/ https://www.ncbi.nlm.nih.gov/pubmed/35100255 http://dx.doi.org/10.1371/journal.pgen.1009571 |
work_keys_str_mv | AT liuandrewe metaimputationoftranscriptomefromgenotypesacrossmultipledatasetsbyleveragingpubliclyavailablesummaryleveldata AT kanghyunmin metaimputationoftranscriptomefromgenotypesacrossmultipledatasetsbyleveragingpubliclyavailablesummaryleveldata |
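
The abstract in this record describes combining the outputs of several transcriptome imputation models linearly to improve accuracy for a target tissue, using individual-level data from a single reference cohort. The following is a minimal sketch of that general idea only. It is not the SWAM estimator itself: the ridge-regularized least-squares fit, the function names `fit_combination_weights` and `combine`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def fit_combination_weights(imputed_ref, measured_ref, ridge=1e-3):
    """Fit linear weights for combining K imputation models (hypothetical helper).

    imputed_ref  : (n_samples, K) expression of one gene imputed by K models
                   in a reference cohort.
    measured_ref : (n_samples,) measured expression of the same gene in the
                   target tissue for that cohort.
    ridge        : small penalty (an assumption of this sketch) that keeps the
                   system stable when the models are highly correlated.
    """
    X = imputed_ref - imputed_ref.mean(axis=0)   # center each model's output
    y = measured_ref - measured_ref.mean()       # center the measured values
    k = X.shape[1]
    gram = X.T @ X + ridge * len(y) * np.eye(k)  # regularized Gram matrix
    return np.linalg.solve(gram, X.T @ y)


def combine(imputed, weights):
    """Weighted linear combination of the K imputed expression vectors."""
    return imputed @ weights


# Simulated data: three models that observe the same signal with different noise.
rng = np.random.default_rng(0)
n_ref, n_new, k = 200, 50, 3
noise_sd = np.array([0.5, 1.0, 2.0])
signal_ref = rng.normal(size=n_ref)
signal_new = rng.normal(size=n_new)
imputed_ref = signal_ref[:, None] + rng.normal(size=(n_ref, k)) * noise_sd
imputed_new = signal_new[:, None] + rng.normal(size=(n_new, k)) * noise_sd
measured_ref = signal_ref + rng.normal(scale=0.3, size=n_ref)

weights = fit_combination_weights(imputed_ref, measured_ref)
combined_new = combine(imputed_new, weights)
```

In this toy setup the least noisy model should receive the largest weight, which is the intuition behind weighting models by how well they predict the target tissue. The published method should be consulted for how the weights are actually derived from summary-level data.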