
PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics

BACKGROUND: Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to much individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. Two state-of-the-art meth...


Bibliographic Details
Main Authors: Zheng, Jie, Richardson, Tom G, Millard, Louise A C, Hemani, Gibran, Elsworth, Benjamin L, Raistrick, Christopher A, Vilhjalmsson, Bjarni, Neale, Benjamin M, Haycock, Philip C, Smith, George Davey, Gaunt, Tom R
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6109640/
https://www.ncbi.nlm.nih.gov/pubmed/30165448
http://dx.doi.org/10.1093/gigascience/giy090
collection PubMed
description BACKGROUND: Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to much individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. Two state-of-the-art methods, metaCCA and LD score regression, provide an alternative approach to estimate phenotypic correlation using only genome-wide association study (GWAS) summary results. RESULTS: Here, we present an integrated R toolkit, PhenoSpD, to use LD score regression to estimate phenotypic correlations using GWAS summary statistics and to utilize the estimated phenotypic correlations to inform correction of multiple testing for complex human traits using the spectral decomposition of matrices (SpD). The simulations suggest that it is possible to identify nonindependence of phenotypes using samples with partial overlap; as overlap decreases, the estimated phenotypic correlations will attenuate toward zero and multiple testing correction will be more stringent than in perfectly overlapping samples. Also, in contrast to LD score regression, metaCCA will provide approximate genetic correlations rather than phenotypic correlation, which limits its application for multiple testing correction. In a case study, PhenoSpD using UK Biobank GWAS results suggested 399.6 independent tests among 487 human traits, which is close to the 352.4 independent tests estimated using true phenotypic correlation. We further applied PhenoSpD to an estimated 5,618 pair-wise phenotypic correlations among 107 metabolites using GWAS summary statistics from Kettunen's publication and PhenoSpD suggested the equivalent of 33.5 independent tests for these metabolites. CONCLUSIONS: PhenoSpD extends the use of summary-level results, providing a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics. 
This is particularly valuable in the age of large-scale biobank and consortia studies, where GWAS results are much more accessible than individual-level data.
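As an illustration of the spectral decomposition step described in the abstract, the sketch below estimates the effective number of independent tests from a phenotypic correlation matrix. It is a minimal Python example, not the PhenoSpD R code. The abstract does not say which estimator the toolkit applies, so the two formulas used here, those published by Nyholt (2004) and by Li and Ji (2005), are assumptions chosen for illustration. The function name `effective_tests` and the toy matrix are hypothetical.

```python
import numpy as np


def effective_tests(corr):
    """Estimate the effective number of independent tests from a
    correlation matrix, returning (nyholt, li_ji).

    Illustrative only: assumes a symmetric correlation matrix with at
    least two traits and no missing values.
    """
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if corr.ndim != 2 or corr.shape[1] != m or m < 2:
        raise ValueError("corr must be a square matrix with at least 2 traits")

    # Eigenvalues of the symmetric matrix. Absolute value and rounding
    # guard against tiny negative or off-integer values from floating point.
    eig = np.round(np.abs(np.linalg.eigvalsh(corr)), 10)

    # Nyholt (2004): Meff = 1 + (M - 1) * (1 - Var(eigenvalues) / M),
    # using the sample variance (ddof=1).
    nyholt = 1.0 + (m - 1.0) * (1.0 - np.var(eig, ddof=1) / m)

    # Li and Ji (2005): each eigenvalue contributes 1 if it is >= 1,
    # plus its fractional part.
    li_ji = float(np.sum((eig >= 1.0) + (eig - np.floor(eig))))

    return float(nyholt), li_ji


if __name__ == "__main__":
    # Hypothetical correlations among three traits (not data from the paper).
    toy = np.array([
        [1.0, 0.5, 0.1],
        [0.5, 1.0, 0.2],
        [0.1, 0.2, 1.0],
    ])
    nyholt, li_ji = effective_tests(toy)
    print("Nyholt Meff:", nyholt)
    print("Li and Ji Meff:", li_ji)
    # A Bonferroni-style threshold divides alpha by the effective number
    # of tests instead of the raw number of traits.
    print("Corrected alpha (Nyholt):", 0.05 / nyholt)
```

Both estimators return the number of traits when the traits are uncorrelated and 1 when they are perfectly correlated, so the corrected threshold falls between the single-test level and the full Bonferroni level. For the UK Biobank case study in the abstract, the same division would use 399.6 rather than 487 in the denominator.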
format Online
Article
Text
id pubmed-6109640
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6109640 2018-08-30 Gigascience Technical Note Oxford University Press 2018-08-24 /pmc/articles/PMC6109640/ /pubmed/30165448 http://dx.doi.org/10.1093/gigascience/giy090 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics
topic Technical Note