Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates

Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets public...

Bibliographic Details
Main authors: Gadbury, Gary L.; Xiang, Qinfang; Yang, Lin; Barnes, Stephen; Page, Grier P.; Allison, David B.
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2409977/
https://www.ncbi.nlm.nih.gov/pubmed/18566659
http://dx.doi.org/10.1371/journal.pgen.1000098
author Gadbury, Gary L.
Xiang, Qinfang
Yang, Lin
Barnes, Stephen
Page, Grier P.
Allison, David B.
collection PubMed
description Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by consideration of a common task when analyzing high dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates for the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
format Text
id pubmed-2409977
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2409977 2008-06-20 PLoS Genet Research Article Public Library of Science 2008-06-20 Text en Gadbury et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2409977/
https://www.ncbi.nlm.nih.gov/pubmed/18566659
http://dx.doi.org/10.1371/journal.pgen.1000098