PRSice-2: Polygenic Risk Score software for biobank-scale data

BACKGROUND: Polygenic risk score (PRS) analyses have become an integral part of biomedical research, exploited to gain insights into shared aetiology among traits, to control for genomic profile in experimental studies, and to strengthen causal inference, among a range of applications. Substantial e...

Bibliographic Details
Main authors: Choi, Shing Wan; O'Reilly, Paul F
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6629542/
https://www.ncbi.nlm.nih.gov/pubmed/31307061
http://dx.doi.org/10.1093/gigascience/giz082
author Choi, Shing Wan
O'Reilly, Paul F
collection PubMed
description BACKGROUND: Polygenic risk score (PRS) analyses have become an integral part of biomedical research, exploited to gain insights into shared aetiology among traits, to control for genomic profile in experimental studies, and to strengthen causal inference, among a range of applications. Substantial efforts are now devoted to biobank projects to collect large genetic and phenotypic data, providing unprecedented opportunity for genetic discovery and applications. To process the large-scale data provided by such biobank resources, highly efficient and scalable methods and software are required.
RESULTS: Here we introduce PRSice-2, an efficient and scalable software program for automating and simplifying PRS analyses on large-scale data. PRSice-2 handles both genotyped and imputed data, provides empirical association P-values free from inflation due to overfitting, supports different inheritance models, and can evaluate multiple continuous and binary target traits simultaneously. We demonstrate that PRSice-2 is dramatically faster and more memory-efficient than PRSice-1 and alternative PRS software, LDpred and lassosum, while having comparable predictive power.
CONCLUSION: PRSice-2's combination of efficiency and power will be increasingly important as data sizes grow and as the applications of PRS become more sophisticated, e.g., when incorporated into high-dimensional or gene set–based analyses. PRSice-2 is written in C++, with an R script for plotting, and is freely available for download from http://PRSice.info.
format Online
Article
Text
id pubmed-6629542
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6629542 2019-07-18
PRSice-2: Polygenic Risk Score software for biobank-scale data
Choi, Shing Wan
O'Reilly, Paul F
Gigascience
Technical Note
Oxford University Press 2019-07-15
/pmc/articles/PMC6629542/
/pubmed/31307061
http://dx.doi.org/10.1093/gigascience/giz082
Text en
© The Author(s) 2019. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PRSice-2: Polygenic Risk Score software for biobank-scale data
topic Technical Note