
Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts

Evidence from both GWAS and clinical observation has suggested that certain psychiatric, metabolic, and autoimmune diseases are heterogeneous, comprising multiple subtypes with distinct genomic etiologies and Polygenic Risk Scores (PRS). However, the presence of subtypes within many phenotypes is fr...


Bibliographic Details
Main authors: Yuan, Jie, Xing, Henry, Lamy, Alexandre Louis, Lencz, Todd, Pe’er, Itsik
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7529195/
https://www.ncbi.nlm.nih.gov/pubmed/32956347
http://dx.doi.org/10.1371/journal.pgen.1009015
_version_ 1783589385067823104
author Yuan, Jie
Xing, Henry
Lamy, Alexandre Louis
Lencz, Todd
Pe’er, Itsik
author_facet Yuan, Jie
Xing, Henry
Lamy, Alexandre Louis
Lencz, Todd
Pe’er, Itsik
author_sort Yuan, Jie
collection PubMed
description Evidence from both GWAS and clinical observation has suggested that certain psychiatric, metabolic, and autoimmune diseases are heterogeneous, comprising multiple subtypes with distinct genomic etiologies and Polygenic Risk Scores (PRS). However, the presence of subtypes within many phenotypes is frequently unknown. We present CLiP (Correlated Liability Predictors), a method to detect heterogeneity in single GWAS cohorts. CLiP calculates a weighted sum of correlations between SNPs contributing to a PRS on the case/control liability scale. We demonstrate mathematically and through simulation that among i.i.d. homogeneous cases generated by a liability threshold model, significant anti-correlations are expected between otherwise independent predictors due to ascertainment on the hidden liability score. In the presence of heterogeneity from distinct etiologies, confounding by covariates, or mislabeling, these correlation patterns are altered predictably. We further extend our method to two additional association study designs: CLiP-X for quantitative predictors in applications such as transcriptome-wide association, and CLiP-Y for quantitative phenotypes, where there is no clear distinction between cases and controls. Through simulations, we demonstrate that CLiP and its extensions reliably distinguish between homogeneous and heterogeneous cohorts when the PRS explains as little as 3% of variance on the liability scale and cohorts comprise 50,000 to 100,000 samples, an increasingly practical size for modern GWAS. We apply CLiP to heterogeneity detection in schizophrenia cohorts totaling more than 50,000 cases and controls collected by the Psychiatric Genomics Consortium. We observe significant heterogeneity in mega-analysis of the combined PGC data (p-value 8.54 × 10^-4), as well as in individual cohorts meta-analyzed using Fisher’s method (p-value 0.03), based on significantly associated variants.
We also apply CLiP-Y to detect heterogeneity in neuroticism in over 10,000 individuals from the UK Biobank and detect heterogeneity with a p-value of 1.68 × 10^-9. Scores were not significantly reduced when partitioning by known subclusters (“Depression” and “Worry”), suggesting that these factors are not the primary source of observed heterogeneity.
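The anti-correlation that the abstract attributes to ascertainment on the hidden liability score can be illustrated with a small simulation. The sketch below is not the authors' CLiP statistic (which is a weighted sum of correlations); it only draws independent SNPs under a liability threshold model, selects cases above the threshold, and compares the unweighted mean pairwise correlation among cases with that in the full sample. All function names and parameter values (10 SNPs, 50% of liability variance explained, 10% prevalence) are illustrative assumptions chosen so the effect is visible in a small sample.

```python
import math
import random


def simulate_cohort(n=20000, m=10, h2=0.5, maf=0.3, seed=0):
    """Draw n individuals with m independent, standardized SNPs and a liability.

    Assumption for illustration: every SNP has the same allele frequency and
    the same effect size, and the SNPs jointly explain h2 of liability variance.
    """
    rng = random.Random(seed)
    beta = math.sqrt(h2 / m)                # per-SNP effect on the liability scale
    sd_g = math.sqrt(2 * maf * (1 - maf))   # std. dev. of a 0/1/2 genotype
    noise_sd = math.sqrt(1 - h2)
    genos, liab = [], []
    for _ in range(n):
        # Each genotype is the sum of two Bernoulli(maf) alleles, then standardized.
        g = [((rng.random() < maf) + (rng.random() < maf) - 2 * maf) / sd_g
             for _ in range(m)]
        genos.append(g)
        liab.append(beta * sum(g) + rng.gauss(0.0, noise_sd))
    return genos, liab


def select_cases(genos, liab, prevalence=0.1):
    """Keep individuals whose liability exceeds the (1 - prevalence) quantile."""
    threshold = sorted(liab)[int((1 - prevalence) * len(liab))]
    return [g for g, l in zip(genos, liab) if l > threshold]


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_corr(rows):
    """Unweighted mean of correlations over all distinct SNP pairs."""
    m = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(m)]
    corrs = [pearson(cols[a], cols[b])
             for a in range(m) for b in range(a + 1, m)]
    return sum(corrs) / len(corrs)


if __name__ == "__main__":
    genos, liab = simulate_cohort()
    cases = select_cases(genos, liab)
    print("mean pairwise correlation, all individuals:", mean_pairwise_corr(genos))
    print("mean pairwise correlation, cases only:     ", mean_pairwise_corr(cases))
```

Under these assumptions the full-sample mean correlation should sit near zero while the cases-only value should be negative; a mixture of cases with distinct etiologies, or mislabeled controls among the cases, would be expected to pull the cases-only value back toward zero, which is the pattern the method exploits.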
format Online
Article
Text
id pubmed-7529195
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7529195 2020-10-02 Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts Yuan, Jie Xing, Henry Lamy, Alexandre Louis Lencz, Todd Pe’er, Itsik PLoS Genet Research Article Evidence from both GWAS and clinical observation has suggested that certain psychiatric, metabolic, and autoimmune diseases are heterogeneous, comprising multiple subtypes with distinct genomic etiologies and Polygenic Risk Scores (PRS). However, the presence of subtypes within many phenotypes is frequently unknown. We present CLiP (Correlated Liability Predictors), a method to detect heterogeneity in single GWAS cohorts. CLiP calculates a weighted sum of correlations between SNPs contributing to a PRS on the case/control liability scale. We demonstrate mathematically and through simulation that among i.i.d. homogeneous cases generated by a liability threshold model, significant anti-correlations are expected between otherwise independent predictors due to ascertainment on the hidden liability score. In the presence of heterogeneity from distinct etiologies, confounding by covariates, or mislabeling, these correlation patterns are altered predictably. We further extend our method to two additional association study designs: CLiP-X for quantitative predictors in applications such as transcriptome-wide association, and CLiP-Y for quantitative phenotypes, where there is no clear distinction between cases and controls. Through simulations, we demonstrate that CLiP and its extensions reliably distinguish between homogeneous and heterogeneous cohorts when the PRS explains as little as 3% of variance on the liability scale and cohorts comprise 50,000 to 100,000 samples, an increasingly practical size for modern GWAS. We apply CLiP to heterogeneity detection in schizophrenia cohorts totaling more than 50,000 cases and controls collected by the Psychiatric Genomics Consortium.
We observe significant heterogeneity in mega-analysis of the combined PGC data (p-value 8.54 × 10^-4), as well as in individual cohorts meta-analyzed using Fisher’s method (p-value 0.03), based on significantly associated variants. We also apply CLiP-Y to detect heterogeneity in neuroticism in over 10,000 individuals from the UK Biobank and detect heterogeneity with a p-value of 1.68 × 10^-9. Scores were not significantly reduced when partitioning by known subclusters (“Depression” and “Worry”), suggesting that these factors are not the primary source of observed heterogeneity. Public Library of Science 2020-09-21 /pmc/articles/PMC7529195/ /pubmed/32956347 http://dx.doi.org/10.1371/journal.pgen.1009015 Text en © 2020 Yuan et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
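The abstract mentions meta-analyzing per-cohort results with Fisher’s method. As a minimal sketch of that standard technique (not the authors' pipeline), the function below combines k independent p-values via X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null. The function name is hypothetical, and the example inputs in the usage note are made up for illustration; they are not the per-cohort values from the study.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form survival function of a chi-square distribution with
    an even number of degrees of freedom (2k), so no external library is needed:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i                       # (x/2)^i / i!
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Hypothetical per-cohort p-values, for illustration only.
    print(fisher_combined_pvalue([0.20, 0.08, 0.35, 0.04]))
```

The combination assumes the cohorts are independent; overlapping samples would make the combined p-value too small.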
spellingShingle Research Article
Yuan, Jie
Xing, Henry
Lamy, Alexandre Louis
Lencz, Todd
Pe’er, Itsik
Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts
title Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts
title_full Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts
title_fullStr Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts
title_full_unstemmed Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts
title_short Leveraging correlations between variants in polygenic risk scores to detect heterogeneity in GWAS cohorts
title_sort leveraging correlations between variants in polygenic risk scores to detect heterogeneity in gwas cohorts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7529195/
https://www.ncbi.nlm.nih.gov/pubmed/32956347
http://dx.doi.org/10.1371/journal.pgen.1009015
work_keys_str_mv AT yuanjie leveragingcorrelationsbetweenvariantsinpolygenicriskscorestodetectheterogeneityingwascohorts
AT xinghenry leveragingcorrelationsbetweenvariantsinpolygenicriskscorestodetectheterogeneityingwascohorts
AT lamyalexandrelouis leveragingcorrelationsbetweenvariantsinpolygenicriskscorestodetectheterogeneityingwascohorts
AT leveragingcorrelationsbetweenvariantsinpolygenicriskscorestodetectheterogeneityingwascohorts
AT lencztodd leveragingcorrelationsbetweenvariantsinpolygenicriskscorestodetectheterogeneityingwascohorts
AT peeritsik leveragingcorrelationsbetweenvariantsinpolygenicriskscorestodetectheterogeneityingwascohorts