
A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework

BACKGROUND: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical chal...


Bibliographic Details
Main Authors: LeBlanc, Marissa; Zuber, Verena; Thompson, Wesley K.; Andreassen, Ole A.; Frigessi, Arnoldo; Andreassen, Bettina Kulle
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019513/
https://www.ncbi.nlm.nih.gov/pubmed/29940862
http://dx.doi.org/10.1186/s12864-018-4859-7
_version_ 1783335140154408960
author LeBlanc, Marissa
Zuber, Verena
Thompson, Wesley K.
Andreassen, Ole A.
Frigessi, Arnoldo
Andreassen, Bettina Kulle
author_sort LeBlanc, Marissa
collection PubMed
description BACKGROUND: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). RESULTS: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. CONCLUSIONS: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4859-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6019513
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6019513 2018-07-06 A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework LeBlanc, Marissa Zuber, Verena Thompson, Wesley K. Andreassen, Ole A. Frigessi, Arnoldo Andreassen, Bettina Kulle BMC Genomics Methodology Article BACKGROUND: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). RESULTS: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. CONCLUSIONS: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4859-7) contains supplementary material, which is available to authorized users. BioMed Central 2018-06-25 /pmc/articles/PMC6019513/ /pubmed/29940862 http://dx.doi.org/10.1186/s12864-018-4859-7 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019513/
https://www.ncbi.nlm.nih.gov/pubmed/29940862
http://dx.doi.org/10.1186/s12864-018-4859-7
work_keys_str_mv AT leblancmarissa acorrectionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT zuberverena acorrectionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT thompsonwesleyk acorrectionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT andreassenolea acorrectionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT acorrectionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT frigessiarnoldo acorrectionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT andreassenbettinakulle acorrectionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT leblancmarissa correctionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT zuberverena correctionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT thompsonwesleyk correctionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT andreassenolea correctionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT correctionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT frigessiarnoldo correctionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
AT andreassenbettinakulle correctionforsampleoverlapingenomewideassociationstudiesinapolygenicpleiotropyinformedframework
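Illustrative sketch (not part of the catalog record): the description field in this record mentions estimating the spurious correlation between two sets of GWAS summary statistics caused by shared subjects, and then applying a simple linear correction to the joint distribution of the test statistics. The record does not give the authors' formulas, so the Python below is only a minimal sketch under stated assumptions. It assumes quantitative traits, for which a commonly used approximation of the null correlation between z-scores is n_shared * trait_corr / sqrt(n1 * n2), and it assumes the linear correction is a symmetric whitening of each z-score pair by the inverse square root of the 2x2 correlation matrix. The function names overlap_correlation, whitening_weights and decorrelate are hypothetical and chosen for this example.

```python
import math


def overlap_correlation(n1, n2, n_shared, trait_corr=1.0):
    """Approximate null correlation between z-scores of two GWAS.

    Assumption (not taken from the record): quantitative traits, where
    the correlation induced by shared subjects is approximately
    n_shared * trait_corr / sqrt(n1 * n2).
    """
    if n_shared < 0 or n_shared > min(n1, n2):
        raise ValueError("n_shared must lie between 0 and min(n1, n2)")
    return trait_corr * n_shared / math.sqrt(n1 * n2)


def whitening_weights(rho):
    """Entries (a, b) of C^(-1/2) for C = [[1, rho], [rho, 1]].

    C has eigenvalues 1 + rho and 1 - rho with eigenvectors
    (1, 1)/sqrt(2) and (1, -1)/sqrt(2), so C^(-1/2) = [[a, b], [b, a]].
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must be strictly between -1 and 1")
    s_plus = 1.0 / math.sqrt(1.0 + rho)
    s_minus = 1.0 / math.sqrt(1.0 - rho)
    return 0.5 * (s_plus + s_minus), 0.5 * (s_plus - s_minus)


def decorrelate(z1, z2, rho):
    """Apply the linear map C^(-1/2) to each pair of z-scores."""
    a, b = whitening_weights(rho)
    new_z1 = [a * x + b * y for x, y in zip(z1, z2)]
    new_z2 = [b * x + a * y for x, y in zip(z1, z2)]
    return new_z1, new_z2


if __name__ == "__main__":
    # Hypothetical study sizes, for illustration only.
    rho = overlap_correlation(n1=10000, n2=10000, n_shared=5000)
    z1_adj, z2_adj = decorrelate([2.1, -0.4, 3.3], [1.8, 0.2, 2.9], rho)
    print(rho, z1_adj, z2_adj)
```

Under these assumptions, z-score pairs that were correlated only through sample overlap have identity covariance after the transformation, which is the property a method conditioning one trait's significance on the other's would need. Case-control designs require a different correlation estimate, which this sketch does not attempt.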