Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets
Massive data sets are often regarded as a panacea to the underpowered studies of the past. At the same time, it is becoming clear that in many of these data sets in which thousands of variables are measured across hundreds of thousands or millions of individuals, almost any desired relationship can...
Main authors: | Manrai, Arjun K; Ioannidis, John P A; Patel, Chirag J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Commentary |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6494664/ https://www.ncbi.nlm.nih.gov/pubmed/30877292 http://dx.doi.org/10.1093/aje/kwz031 |
_version_ | 1783415262482006016 |
---|---|
author | Manrai, Arjun K Ioannidis, John P A Patel, Chirag J |
author_facet | Manrai, Arjun K Ioannidis, John P A Patel, Chirag J |
author_sort | Manrai, Arjun K |
collection | PubMed |
description | Massive data sets are often regarded as a panacea to the underpowered studies of the past. At the same time, it is becoming clear that in many of these data sets in which thousands of variables are measured across hundreds of thousands or millions of individuals, almost any desired relationship can be inferred with a suitable combination of covariates or analytic choices. Inspired by the genome-wide association study analysis paradigm that has transformed human genetics, X-wide association studies or “XWAS” have emerged as a popular approach to systematically analyzing nongenetic data sets and guarding against false positives. However, these studies often yield hundreds or thousands of associations characterized by modest effect sizes and minuscule P values. Many of these associations will be spurious and emerge due to confounding and other biases. One way of characterizing confounding in the genomics paradigm is the genomic inflation factor. An analogous “X-wide inflation factor,” denoted λ(X), can be defined and applied to published XWAS. Effects that arise in XWAS may be prioritized using replication, triangulation, quantification of measurement error, contextualization of each effect in the distribution of all effect sizes within a field, and pre-registration. Criteria like those of Bradford Hill need to be reconsidered in light of exposure-wide epidemiology to prioritize signals among signals. |
format | Online Article Text |
id | pubmed-6494664 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-64946642019-05-07 Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets Manrai, Arjun K Ioannidis, John P A Patel, Chirag J Am J Epidemiol Commentary Massive data sets are often regarded as a panacea to the underpowered studies of the past. At the same time, it is becoming clear that in many of these data sets in which thousands of variables are measured across hundreds of thousands or millions of individuals, almost any desired relationship can be inferred with a suitable combination of covariates or analytic choices. Inspired by the genome-wide association study analysis paradigm that has transformed human genetics, X-wide association studies or “XWAS” have emerged as a popular approach to systematically analyzing nongenetic data sets and guarding against false positives. However, these studies often yield hundreds or thousands of associations characterized by modest effect sizes and minuscule P values. Many of these associations will be spurious and emerge due to confounding and other biases. One way of characterizing confounding in the genomics paradigm is the genomic inflation factor. An analogous “X-wide inflation factor,” denoted λ(X), can be defined and applied to published XWAS. Effects that arise in XWAS may be prioritized using replication, triangulation, quantification of measurement error, contextualization of each effect in the distribution of all effect sizes within a field, and pre-registration. Criteria like those of Bradford Hill need to be reconsidered in light of exposure-wide epidemiology to prioritize signals among signals. Oxford University Press 2019-05 2019-03-16 /pmc/articles/PMC6494664/ /pubmed/30877292 http://dx.doi.org/10.1093/aje/kwz031 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. 
http://creativecommons.org/licenses/by-nc/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journalpermissions@oup.com. |
spellingShingle | Commentary Manrai, Arjun K Ioannidis, John P A Patel, Chirag J Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets |
title | Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets |
title_full | Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets |
title_fullStr | Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets |
title_full_unstemmed | Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets |
title_short | Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets |
title_sort | signals among signals: prioritizing nongenetic associations in massive data sets |
topic | Commentary |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6494664/ https://www.ncbi.nlm.nih.gov/pubmed/30877292 http://dx.doi.org/10.1093/aje/kwz031 |
work_keys_str_mv | AT manraiarjunk signalsamongsignalsprioritizingnongeneticassociationsinmassivedatasets AT ioannidisjohnpa signalsamongsignalsprioritizingnongeneticassociationsinmassivedatasets AT patelchiragj signalsamongsignalsprioritizingnongeneticassociationsinmassivedatasets |
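
The abstract in this record mentions the genomic inflation factor and an analogous X-wide inflation factor, λ(X), without giving a formula. As a rough illustration only, the sketch below uses the conventional genomic-control definition: the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.4549). Treating λ(X) as the same ratio computed over the P values of an XWAS is an assumption made here, not a definition taken from the article, and the function names `p_to_chi2` and `inflation_factor` are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# obtained by squaring the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p: float) -> float:
    """Convert a two-sided P value to the equivalent 1-df chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P values must lie in (0, 1]")
    # Use the lower tail (p / 2) so that very small P values stay representable.
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def inflation_factor(p_values) -> float:
    """Median observed chi-square divided by the median expected under the null.

    Assumption: lambda(X) is computed like the genomic inflation factor,
    but over the P values of all exposures tested in an XWAS.
    """
    stats = [p_to_chi2(p) for p in p_values]
    if not stats:
        raise ValueError("at least one P value is required")
    return median(stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical P values, evenly spread over (0, 1) as expected under the null.
    null_like = [(i + 0.5) / 1000 for i in range(1000)]
    # The same values squared: systematically smaller P values, mimicking inflation.
    inflated = [p * p for p in null_like]
    print("lambda, null-like:", round(inflation_factor(null_like), 3))
    print("lambda, inflated: ", round(inflation_factor(inflated), 3))
```

Under this definition a value near 1 suggests that the bulk of test statistics behave as expected under the null, while values well above 1 suggest systematic inflation, for example from confounding. The example P values are synthetic and carry no empirical meaning.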