Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data


Bibliographic Details
Main authors:	Liu, Andrew E.; Kang, Hyun Min
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830793/ https://www.ncbi.nlm.nih.gov/pubmed/35100255 http://dx.doi.org/10.1371/journal.pgen.1009571

author	Liu, Andrew E.; Kang, Hyun Min
collection	PubMed
description	Transcriptome-wide association studies (TWAS) are a powerful method for identifying and interpreting the biological mechanisms underlying GWAS findings by associating gene expression levels with phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to imputing expression levels in a specific tissue is to use the model trained on the same tissue type. When multiple tissues are available for the same subjects, training imputation models on multiple tissue types has been shown to improve accuracy, because eQTLs are shared between tissues and the effective sample size increases. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the many expression datasets available across various tissues for non-overlapping individuals. Here, we explore the optimal way to combine imputed expression levels from models trained on multiple tissues and datasets in a flexible manner using summary-level data. Our proposed method, SWAM, combines an arbitrary number of transcriptome imputation models to linearly optimize imputation accuracy for a given target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS while requiring individual-level data from only a single reference cohort. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project with models from a large eQTL study, the Depression Susceptibility Genes and Networks (DGN) Project, and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples.
We also extend our meta-imputation method to meta-TWAS, which leverages multiple tissues in TWAS analysis using summary-level statistics. Our results highlight the importance of integrating multiple tissues to unravel the regulatory impacts of genetic variants on complex traits.
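The description states that SWAM linearly combines an arbitrary number of imputation models to optimize accuracy for a target tissue. The sketch below illustrates that general idea with ordinary least-squares stacking; it is not the authors' estimator. The helper names (`summary_stats`, `combine_weights`, `meta_impute`), the optional `ridge` term, and the assumption that all inputs are mean-centered are hypothetical choices made for illustration. What it shows is that the optimal linear weights depend on the data only through second-moment summaries: the covariance among the models' imputed values and each model's covariance with observed expression.

```python
"""Least-squares stacking of K expression imputation models (illustrative sketch).

Assumption: a generic technique for linearly combining model predictions;
it is NOT SWAM's published estimator. Inputs are assumed mean-centered.
"""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy [a | b] so the caller's lists are not mutated.
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: model predictions may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def summary_stats(preds: Sequence[Sequence[float]],
                  observed: Sequence[float]) -> Tuple[Matrix, List[float]]:
    """Second-moment summaries from a reference cohort.

    preds[j][i] is model j's imputed value for sample i; observed[i] is the
    measured expression in the target tissue. Returns
    sigma[j][k] = mean_i(preds[j][i] * preds[k][i]) and
    cov[j] = mean_i(preds[j][i] * observed[i]).
    """
    n = len(observed)
    num_models = len(preds)
    sigma = [[sum(preds[j][i] * preds[k][i] for i in range(n)) / n
              for k in range(num_models)] for j in range(num_models)]
    cov = [sum(preds[j][i] * observed[i] for i in range(n)) / n
           for j in range(num_models)]
    return sigma, cov


def combine_weights(sigma: Matrix, cov: List[float], ridge: float = 0.0) -> List[float]:
    """Weights w = (sigma + ridge * I)^-1 cov.

    With ridge = 0 this minimizes mean squared error of the combined
    prediction; ridge > 0 adds a hypothetical shrinkage penalty.
    """
    k = len(cov)
    reg = [[sigma[j][col] + (ridge if j == col else 0.0) for col in range(k)]
           for j in range(k)]
    return solve_linear(reg, cov)


def meta_impute(preds_new: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of each model's imputed values for new samples."""
    n = len(preds_new[0])
    return [sum(w * p[i] for w, p in zip(weights, preds_new)) for i in range(n)]


if __name__ == "__main__":
    # Toy reference cohort: 2 models, 4 centered samples; observed = 0.7*m1 + 0.3*m2.
    model_preds = [[1.0, -1.0, 2.0, -2.0], [1.0, 1.0, -1.0, -1.0]]
    observed = [1.0, -0.4, 1.1, -1.7]
    w = combine_weights(*summary_stats(model_preds, observed))
    print(w)  # weights close to 0.7 and 0.3
    print(meta_impute([[2.0], [1.0]], w))  # close to 1.7
```

Because the weights depend on the data only through `sigma` and `cov`, those quantities could in principle be obtained from summary-level sources rather than pooled individual-level data; this sketch does not reproduce how SWAM itself derives them.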
format	Online Article Text
id	pubmed-8830793
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
journal	PLoS Genet (Research Article), published 2022-01-31
rights	© 2022 Liu, Kang. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data
