
Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores

Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analys...


Bibliographic Details
Main Authors: Privé, Florian; Arbel, Julyan; Aschard, Hugues; Vilhjálmsson, Bjarni J.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9465343/
https://www.ncbi.nlm.nih.gov/pubmed/36105883
http://dx.doi.org/10.1016/j.xhgg.2022.100136
_version_ 1784787775513427968
author Privé, Florian
Arbel, Julyan
Aschard, Hugues
Vilhjálmsson, Bjarni J.
collection PubMed
description Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power.
format Online
Article
Text
id pubmed-9465343
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9465343 2022-09-13 Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores Privé, Florian Arbel, Julyan Aschard, Hugues Vilhjálmsson, Bjarni J. HGG Adv Article Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power. Elsevier 2022-08-18 /pmc/articles/PMC9465343/ /pubmed/36105883 http://dx.doi.org/10.1016/j.xhgg.2022.100136 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9465343/
https://www.ncbi.nlm.nih.gov/pubmed/36105883
http://dx.doi.org/10.1016/j.xhgg.2022.100136