
A statistical approach to selecting and confirming validation targets in -omics experiments


Bibliographic Details
Main authors: Leek, Jeffrey T; Taub, Margaret A; Rasgon, Jason L
Format: Online article, text
Language: English
Published: BioMed Central, 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3568710/
https://www.ncbi.nlm.nih.gov/pubmed/22738145
http://dx.doi.org/10.1186/1471-2105-13-150
Collection: PubMed
Abstract:
BACKGROUND: Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets.
RESULTS: Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result.
CONCLUSIONS: For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results.
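The abstract does not state the estimator the authors use, so the following is only a generic sketch of the idea behind validating a whole list from a random subset, not the method of Leek et al. It assumes that each validation target is drawn uniformly at random from the list of significant results and that the bench assay labels each target as confirmed or not. Under those assumptions the number of confirmations is treated as binomial (a reasonable approximation when the sample is small relative to the list), and an exact one-sided Clopper-Pearson bound gives a lower confidence limit on the fraction of the entire list that would confirm. The function names, the 500-gene list, and the 18-of-20 outcome are hypothetical.

```python
import random
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lower_bound_validation_rate(confirmed: int, sampled: int, alpha: float = 0.05) -> float:
    """Hypothetical helper: one-sided exact (Clopper-Pearson) lower confidence
    bound on the fraction of the full list that would confirm, given that
    `confirmed` of `sampled` randomly chosen targets confirmed."""
    if not 0 <= confirmed <= sampled or sampled == 0:
        raise ValueError("need 0 <= confirmed <= sampled and sampled > 0")
    if confirmed == 0:
        return 0.0  # no confirmations: the bound is trivially zero
    lo, hi = 0.0, 1.0
    # P(X >= confirmed | p) increases with p, so bisect for the p where it equals alpha.
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_sf(confirmed, sampled, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Hypothetical list of 500 significant genes from an -omics experiment.
    significant = [f"gene_{i}" for i in range(500)]
    rng = random.Random(2012)
    # Choose validation targets uniformly at random, not the top-ranked hits.
    targets = rng.sample(significant, 20)
    n_confirmed = 18  # hypothetical bench-validation outcome
    bound = lower_bound_validation_rate(n_confirmed, len(targets))
    print(f"{n_confirmed}/{len(targets)} confirmed; 95% lower bound on list-wide rate: {bound:.3f}")
```

When every sampled target confirms, the bound reduces to alpha**(1/n), so 20 of 20 confirmations at alpha = 0.05 support a list-wide confirmation rate of at least about 0.86. The same calculation applied to the top-ranked hits would not carry this interpretation, since those hits are not a random draw from the list.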
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics
Article type: Methodology Article
Index record: pubmed-3568710, 2013-02-12
Published: BioMed Central, 2012-06-27
Copyright ©2012 Leek et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.