
Estimating colocalization probability from limited summary statistics

BACKGROUND: Colocalization is a statistical method used in genetics to determine whether the same variant is causal for multiple phenotypes, for example, complex traits and gene expression. It provides stronger mechanistic evidence than shared significance, which can be produced through separate cau...


Bibliographic Details
Main authors: King, Emily A., Dunbar, Fengjiao, Davis, Justin Wade, Degner, Jacob F.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8130535/
https://www.ncbi.nlm.nih.gov/pubmed/34000989
http://dx.doi.org/10.1186/s12859-021-04170-z
author King, Emily A.
Dunbar, Fengjiao
Davis, Justin Wade
Degner, Jacob F.
collection PubMed
description BACKGROUND: Colocalization is a statistical method used in genetics to determine whether the same variant is causal for multiple phenotypes, for example, complex traits and gene expression. It provides stronger mechanistic evidence than shared significance, which can be produced through separate causal variants in linkage disequilibrium. Current colocalization methods require full summary statistics for both traits, limiting their use with the majority of reported GWAS associations (e.g., the GWAS Catalog). We propose a new approximation to the popular coloc method that can be applied when limited summary statistics are available. Our method (POint EstiMation of Colocalization, POEMColoc) imputes missing summary statistics for one or both traits using LD structure in a reference panel, and performs colocalization using the imputed summary statistics.
RESULTS: We evaluate the performance of POEMColoc using real (UK Biobank phenotypes and GTEx eQTL) and simulated datasets. We show good correlation between posterior probabilities of colocalization computed from imputed and observed datasets and similar accuracy in simulation. We evaluate scenarios that might reduce performance and show that multiple independent causal variants in a region and imputation from a limited subset of typed variants have a larger effect, while mismatched ancestry in the reference panel has a modest effect. Further, we find that POEMColoc is a better approximation of coloc when the imputed association statistics are from a well-powered study (e.g., relatively larger sample size or effect size). Applying POEMColoc to estimate colocalization of GWAS Catalog entries and GTEx eQTL, we find evidence for colocalization of 150,000 trait-gene-tissue triplets.
CONCLUSIONS: We find that colocalization analysis performed with full summary statistics can be closely approximated when only the summary statistics of the top SNP are available for one or both traits. When applied to the full GWAS Catalog and GTEx eQTL, we find that colocalized trait-gene pairs are enriched in tissues relevant to disease etiology and for matches to approved drug mechanisms. The POEMColoc R package is available at https://github.com/AbbVie-ComputationalGenomics/POEMColoc.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04170-z.
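The idea summarized in the abstract (impute summary statistics from the top SNP using LD, then run a coloc-style calculation) can be illustrated with a minimal Python sketch. This is not the POEMColoc implementation, which is an R package. The imputation rule (expected z-score = LD correlation with the top SNP times the top z-score), the Wakefield-style approximate Bayes factor for a quantitative trait with unit variance, the prior standard deviation 0.15, and the priors p1, p2 and p12 are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiffexp(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))

def impute_z(z_top, r_to_top):
    """Assumed imputation rule: expected z-score at each variant is its
    LD correlation with the top SNP times the top SNP's z-score."""
    return [r * z_top for r in r_to_top]

def log_abf(z, n, maf, prior_sd=0.15):
    """Wakefield-style log approximate Bayes factor, assuming a
    quantitative trait with unit variance (illustrative settings)."""
    v = 1.0 / (2.0 * n * maf * (1.0 - maf))   # approx. variance of the effect estimate
    shrink = prior_sd ** 2 / (prior_sd ** 2 + v)
    return 0.5 * (math.log(1.0 - shrink) + shrink * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 under a single-causal-variant
    model per trait. Needs at least two variants in the region."""
    s1 = logsumexp(labf1)
    s2 = logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                      # H0: no association
        math.log(p1) + s1,                                        # H1: trait 1 only
        math.log(p2) + s2,                                        # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),   # H3: distinct variants
        math.log(p12) + s12,                                      # H4: shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(x - total) for x in log_h]

# Hypothetical region: LD (r) of four variants with a shared top SNP.
r = [1.0, 0.8, 0.3, 0.1]
labf_trait1 = [log_abf(z, n=10000, maf=0.3) for z in impute_z(8.0, r)]
labf_trait2 = [log_abf(z, n=10000, maf=0.3) for z in impute_z(6.0, r)]
pp = coloc_posteriors(labf_trait1, labf_trait2)
print("PP.H4 =", pp[4])
```

In this toy region both traits share the same top SNP and the same LD pattern, so the shared-variant hypothesis (H4) receives most of the posterior mass. With real data the LD correlations would come from a reference panel, and the sample sizes and allele frequencies from the reported studies.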
format Online
Article
Text
id pubmed-8130535
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8130535 2021-05-19 Estimating colocalization probability from limited summary statistics King, Emily A. Dunbar, Fengjiao Davis, Justin Wade Degner, Jacob F. BMC Bioinformatics Methodology Article
BioMed Central 2021-05-17 /pmc/articles/PMC8130535/ /pubmed/34000989 http://dx.doi.org/10.1186/s12859-021-04170-z Text en
© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Estimating colocalization probability from limited summary statistics
topic Methodology Article