
PRS-on-Spark (PRSoS): a novel, efficient and flexible approach for generating polygenic risk scores

BACKGROUND: Polygenic risk scores (PRS) describe the genomic contribution to complex phenotypes and consistently account for a larger proportion of variance in outcome than single nucleotide polymorphisms (SNPs) alone. However, there is little consensus on the optimal data input for generating PRS,...

Bibliographic Details
Main Authors: Chen, Lawrence M., Yao, Nelson, Garg, Elika, Zhu, Yuecai, Nguyen, Thao T. T., Pokhvisneva, Irina, Hari Dass, Shantala A., Unternaehrer, Eva, Gaudreau, Hélène, Forest, Marie, McEwen, Lisa M., MacIsaac, Julia L., Kobor, Michael S., Greenwood, Celia M. T., Silveira, Patricia P., Meaney, Michael J., O’Donnell, Kieran J.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6083617/
https://www.ncbi.nlm.nih.gov/pubmed/30089455
http://dx.doi.org/10.1186/s12859-018-2289-9
author Chen, Lawrence M.
Yao, Nelson
Garg, Elika
Zhu, Yuecai
Nguyen, Thao T. T.
Pokhvisneva, Irina
Hari Dass, Shantala A.
Unternaehrer, Eva
Gaudreau, Hélène
Forest, Marie
McEwen, Lisa M.
MacIsaac, Julia L.
Kobor, Michael S.
Greenwood, Celia M. T.
Silveira, Patricia P.
Meaney, Michael J.
O’Donnell, Kieran J.
author_sort Chen, Lawrence M.
collection PubMed
description BACKGROUND: Polygenic risk scores (PRS) describe the genomic contribution to complex phenotypes and consistently account for a larger proportion of variance in outcome than single nucleotide polymorphisms (SNPs) alone. However, there is little consensus on the optimal data input for generating PRS, and existing approaches largely preclude the use of imputed posterior probabilities and strand-ambiguous SNPs (i.e., A/T or C/G polymorphisms). Our ability to predict complex traits that arise from the additive effects of a large number of SNPs would likely benefit from a more inclusive approach. RESULTS: We developed PRS-on-Spark (PRSoS), software implemented in Apache Spark and Python that accommodates different data inputs and strand-ambiguous SNPs to calculate PRS. We compared performance between PRSoS and existing software (PRSice v1.25) for generating PRS for major depressive disorder using a community cohort (N = 264). We found PRSoS to perform faster than PRSice v1.25 when PRS were generated for a large number of SNPs (~17 million SNPs; t = 42.865, p = 5.43E-04). We also show that the use of imputed posterior probabilities and the inclusion of strand-ambiguous SNPs increase the proportion of variance explained by a PRS for major depressive disorder (from 4.3% to 4.8%). CONCLUSIONS: PRSoS provides the user with the ability to generate PRS using an inclusive and efficient approach that considers a larger number of SNPs than conventional approaches. We show that a PRS for major depressive disorder that includes strand-ambiguous SNPs, calculated using PRSoS, accounts for the largest proportion of variance in symptoms of depression in a community cohort, demonstrating the utility of this approach. The availability of this software will help users develop more informative PRS for a variety of complex phenotypes.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2289-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6083617
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-60836172018-08-16 PRS-on-Spark (PRSoS): a novel, efficient and flexible approach for generating polygenic risk scores Chen, Lawrence M. Yao, Nelson Garg, Elika Zhu, Yuecai Nguyen, Thao T. T. Pokhvisneva, Irina Hari Dass, Shantala A. Unternaehrer, Eva Gaudreau, Hélène Forest, Marie McEwen, Lisa M. MacIsaac, Julia L. Kobor, Michael S. Greenwood, Celia M. T. Silveira, Patricia P. Meaney, Michael J. O’Donnell, Kieran J. BMC Bioinformatics Software BACKGROUND: Polygenic risk scores (PRS) describe the genomic contribution to complex phenotypes and consistently account for a larger proportion of variance in outcome than single nucleotide polymorphisms (SNPs) alone. However, there is little consensus on the optimal data input for generating PRS, and existing approaches largely preclude the use of imputed posterior probabilities and strand-ambiguous SNPs i.e., A/T or C/G polymorphisms. Our ability to predict complex traits that arise from the additive effects of a large number of SNPs would likely benefit from a more inclusive approach. RESULTS: We developed PRS-on-Spark (PRSoS), a software implemented in Apache Spark and Python that accommodates different data inputs and strand-ambiguous SNPs to calculate PRS. We compared performance between PRSoS and an existing software (PRSice v1.25) for generating PRS for major depressive disorder using a community cohort (N = 264). We found PRSoS to perform faster than PRSice v1.25 when PRS were generated for a large number of SNPs (~ 17 million SNPs; t = 42.865, p = 5.43E-04). We also show that the use of imputed posterior probabilities and the inclusion of strand-ambiguous SNPs increase the proportion of variance explained by a PRS for major depressive disorder (from 4.3% to 4.8%). CONCLUSIONS: PRSoS provides the user with the ability to generate PRS using an inclusive and efficient approach that considers a larger number of SNPs than conventional approaches. 
We show that a PRS for major depressive disorder that includes strand-ambiguous SNPs, calculated using PRSoS, accounts for the largest proportion of variance in symptoms of depression in a community cohort, demonstrating the utility of this approach. The availability of this software will help users develop more informative PRS for a variety of complex phenotypes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2289-9) contains supplementary material, which is available to authorized users. BioMed Central 2018-08-08 /pmc/articles/PMC6083617/ /pubmed/30089455 http://dx.doi.org/10.1186/s12859-018-2289-9 Text en © The Author(s). 2018 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Chen, Lawrence M.
Yao, Nelson
Garg, Elika
Zhu, Yuecai
Nguyen, Thao T. T.
Pokhvisneva, Irina
Hari Dass, Shantala A.
Unternaehrer, Eva
Gaudreau, Hélène
Forest, Marie
McEwen, Lisa M.
MacIsaac, Julia L.
Kobor, Michael S.
Greenwood, Celia M. T.
Silveira, Patricia P.
Meaney, Michael J.
O’Donnell, Kieran J.
PRS-on-Spark (PRSoS): a novel, efficient and flexible approach for generating polygenic risk scores
title PRS-on-Spark (PRSoS): a novel, efficient and flexible approach for generating polygenic risk scores
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6083617/
https://www.ncbi.nlm.nih.gov/pubmed/30089455
http://dx.doi.org/10.1186/s12859-018-2289-9