EraSOR: a software tool to eliminate inflation caused by sample overlap in polygenic score analyses

Bibliographic Details
Main authors: Choi, Shing Wan; Mak, Timothy Shin Heng; Hoggart, Clive J; O'Reilly, Paul F
Format: Online; Article; Text
Language: English
Published: Oxford University Press 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10273836/
https://www.ncbi.nlm.nih.gov/pubmed/37326441
http://dx.doi.org/10.1093/gigascience/giad043
collection PubMed
description BACKGROUND: Polygenic risk score (PRS) analyses are now routinely applied across biomedical research. However, as PRS studies grow in size, there is an increased risk of sample overlap between the genome-wide association study (GWAS) from which the PRS is derived and the “target sample,” in which PRSs are computed and hypotheses are tested. Despite the wide recognition of the sample overlap problem, its potential impact on the results from PRS studies has not yet been quantified, and no analytical solution has been provided.
FINDINGS: Here, we first conduct a comprehensive investigation into the scale of the sample overlap problem, finding that PRS results can be substantially inflated even in the presence of minimal overlap. Next, we introduce a method and software, EraSOR (Erase Sample Overlap and Relatedness), which eliminates the inflation caused by sample overlap (and close relatedness) in almost all settings tested here.
CONCLUSIONS: EraSOR could be useful in PRS studies (with target sample >1,000) similar to those investigated here, either (i) to mitigate the potential effects of known or unknown intercohort overlap and close relatedness or (ii) as a sensitivity tool to highlight the possible presence of sample overlap before its direct removal, when possible, or else to provide a lower bound on PRS analysis results after accounting for potential sample overlap.
id pubmed-10273836
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Gigascience (Technical Note), published 2023-06-16
rights © The Author(s) 2023. Published by Oxford University Press GigaScience. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Technical Note