
Inflated expectations: Rare-variant association analysis using public controls


Bibliographic Details
Main authors: Kim, Jung; Karyadi, Danielle M.; Hartley, Stephen W.; Zhu, Bin; Wang, Mingyi; Wu, Dongjing; Song, Lei; Armstrong, Gregory T.; Bhatia, Smita; Robison, Leslie L.; Yasui, Yutaka; Carter, Brian; Sampson, Joshua N.; Freedman, Neal D.; Goldstein, Alisa M.; Mirabello, Lisa; Chanock, Stephen J.; Morton, Lindsay M.; Savage, Sharon A.; Stewart, Douglas R.
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876209/
https://www.ncbi.nlm.nih.gov/pubmed/36696392
http://dx.doi.org/10.1371/journal.pone.0280951
author Kim, Jung
Karyadi, Danielle M.
Hartley, Stephen W.
Zhu, Bin
Wang, Mingyi
Wu, Dongjing
Song, Lei
Armstrong, Gregory T.
Bhatia, Smita
Robison, Leslie L.
Yasui, Yutaka
Carter, Brian
Sampson, Joshua N.
Freedman, Neal D.
Goldstein, Alisa M.
Mirabello, Lisa
Chanock, Stephen J.
Morton, Lindsay M.
Savage, Sharon A.
Stewart, Douglas R.
collection PubMed
description The use of publicly available sequencing datasets as controls (hereafter, “public controls”) in studies of rare-variant disease associations has great promise but can increase the risk of false-positive discovery. The specific factors that could contribute to an inflated distribution of test statistics have not been systematically examined. Here, we leveraged both public controls (gnomAD v2.1) and several datasets sequenced in our laboratory to systematically investigate factors that could contribute to false-positive discovery, as measured by λ(Δ95), a metric that quantifies the degree of inflation in statistical significance. Analyses of datasets in this investigation found that 1) the significantly inflated distribution of test statistics decreased substantially when the same variant caller and filtering pipelines were applied to cases and controls, 2) differences in library prep kits and sequencers did not affect the false-positive discovery rate, and 3) joint vs. separate variant calling of cases and controls did not contribute to the inflation of test statistics. Currently available methods do not adequately adjust for this high rate of false-positive discovery. These results, especially if replicated, emphasize the risks of using public controls for rare-variant association tests when individual-level data and the computational pipeline are not readily accessible, because this prevents the same variant-calling and filtering pipelines from being applied to both cases and controls. Cloud-based computing offers a plausible solution: it makes it possible to bring containerized analytical pipelines to the data (rather than the data to the pipeline), which could avert or minimize these issues. We suggest that future reports account for this issue and acknowledge it as a limitation when reporting new findings from studies that cannot practically analyze all data on a single pipeline.
format Online
Article
Text
id pubmed-9876209
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9876209; 2023-01-26; PLoS One; Research Article; Public Library of Science; 2023-01-25; /pmc/articles/PMC9876209/; /pubmed/36696392; http://dx.doi.org/10.1371/journal.pone.0280951; Text; en; https://creativecommons.org/publicdomain/zero/1.0/; This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Inflated expectations: Rare-variant association analysis using public controls
topic Research Article