Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain
There is growing concern in the scientific community that many published scientific findings may represent spurious patterns that are not reproducible in independent data sets. A reason for this is that significance levels or confidence intervals are often applied to secondary variables or sub-sampl...
Main authors: | Dahl, Fredrik A.; Grotle, Margreth; Šaltytė Benth, Jūratė; Natvig, Bård |
---|---|
Format: | Text |
Language: | English |
Published: | Springer Netherlands 2008 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2270357/ https://www.ncbi.nlm.nih.gov/pubmed/18288577 http://dx.doi.org/10.1007/s10654-008-9230-x |
_version_ | 1782151735539662848 |
---|---|
author | Dahl, Fredrik A.; Grotle, Margreth; Šaltytė Benth, Jūratė; Natvig, Bård |
author_facet | Dahl, Fredrik A.; Grotle, Margreth; Šaltytė Benth, Jūratė; Natvig, Bård |
author_sort | Dahl, Fredrik A. |
collection | PubMed |
description | There is growing concern in the scientific community that many published scientific findings may represent spurious patterns that are not reproducible in independent data sets. A reason for this is that significance levels or confidence intervals are often applied to secondary variables or sub-samples within the trial, in addition to the primary hypotheses (multiple hypotheses). This problem is likely to be extensive for population-based surveys, in which epidemiological hypotheses are derived after seeing the data set (hypothesis fishing). We recommend a data-splitting procedure to counteract this methodological problem, in which one part of the data set is used for identifying hypotheses, and the other is used for hypothesis testing. The procedure is similar to two-stage analysis of microarray data. We illustrate the process using a real data set related to predictors of low back pain at 14-year follow-up in a population initially free of low back pain. “Widespreadness” of pain (pain reported in several other places than the low back) was a statistically significant predictor, while smoking was not, despite its strong association with low back pain in the first half of the data set. We argue that the application of data splitting, in which an independent party handles the data set, will achieve for epidemiological surveys what pre-registration has done for clinical studies. |
format | Text |
id | pubmed-2270357 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | Springer Netherlands |
record_format | MEDLINE/PubMed |
spelling | pubmed-22703572008-03-21 Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain Dahl, Fredrik A. Grotle, Margreth Šaltytė Benth, Jūratė Natvig, Bård Eur J Epidemiol Methods There is growing concern in the scientific community that many published scientific findings may represent spurious patterns that are not reproducible in independent data sets. A reason for this is that significance levels or confidence intervals are often applied to secondary variables or sub-samples within the trial, in addition to the primary hypotheses (multiple hypotheses). This problem is likely to be extensive for population-based surveys, in which epidemiological hypotheses are derived after seeing the data set (hypothesis fishing). We recommend a data-splitting procedure to counteract this methodological problem, in which one part of the data set is used for identifying hypotheses, and the other is used for hypothesis testing. The procedure is similar to two-stage analysis of microarray data. We illustrate the process using a real data set related to predictors of low back pain at 14-year follow-up in a population initially free of low back pain. “Widespreadness” of pain (pain reported in several other places than the low back) was a statistically significant predictor, while smoking was not, despite its strong association with low back pain in the first half of the data set. We argue that the application of data splitting, in which an independent party handles the data set, will achieve for epidemiological surveys what pre-registration has done for clinical studies. Springer Netherlands 2008-02-21 2008-04 /pmc/articles/PMC2270357/ /pubmed/18288577 http://dx.doi.org/10.1007/s10654-008-9230-x Text en © The Author(s) 2008 |
spellingShingle | Methods Dahl, Fredrik A. Grotle, Margreth Šaltytė Benth, Jūratė Natvig, Bård Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain |
title | Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain |
title_full | Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain |
title_fullStr | Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain |
title_full_unstemmed | Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain |
title_short | Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain |
title_sort | data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2270357/ https://www.ncbi.nlm.nih.gov/pubmed/18288577 http://dx.doi.org/10.1007/s10654-008-9230-x |
work_keys_str_mv | AT dahlfredrika datasplittingasacountermeasureagainsthypothesisfishingwithacasestudyofpredictorsforlowbackpain AT grotlemargreth datasplittingasacountermeasureagainsthypothesisfishingwithacasestudyofpredictorsforlowbackpain AT saltytebenthjurate datasplittingasacountermeasureagainsthypothesisfishingwithacasestudyofpredictorsforlowbackpain AT natvigbard datasplittingasacountermeasureagainsthypothesisfishingwithacasestudyofpredictorsforlowbackpain |
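
The description field summarizes a data-splitting procedure: one part of the data set is used to identify hypotheses, the other to test them. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the synthetic data, the 50/50 random split, the pooled two-proportion z-test, and the Bonferroni threshold applied in the confirmation half. It also does not model the independent party that the description says should handle the data set.

```python
import math
import random


def two_proportion_p(x1, n1, x0, n0):
    """Two-sided p-value of a pooled two-proportion z-test (illustrative choice of test)."""
    if n1 == 0 or n0 == 0:
        return 1.0
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x0 / n0) / se
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def p_value(rows, predictor, outcome="outcome"):
    """Compare the outcome rate between rows with and without a binary predictor."""
    exposed = [r for r in rows if r[predictor]]
    unexposed = [r for r in rows if not r[predictor]]
    return two_proportion_p(
        sum(r[outcome] for r in exposed), len(exposed),
        sum(r[outcome] for r in unexposed), len(unexposed),
    )


def split_data(rows, seed):
    """Randomly split the rows into an exploration half and a confirmation half."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def data_splitting_analysis(rows, predictors, seed=0, alpha=0.05):
    """Fish for hypotheses in one half, then test only those in the other half."""
    explore, confirm = split_data(rows, seed)
    # Stage 1: unrestricted screening; no error-rate claim is made here.
    candidates = [p for p in predictors if p_value(explore, p) < alpha]
    # Stage 2: test only the pre-selected hypotheses on untouched data,
    # correcting for the (now small) number of tests.
    threshold = alpha / len(candidates) if candidates else alpha
    confirmed = [p for p in candidates if p_value(confirm, p) < threshold]
    return candidates, confirmed


def make_synthetic(n, seed):
    """Synthetic rows: one predictor with a real effect, twenty pure-noise predictors."""
    rng = random.Random(seed)
    noise_names = [f"noise_{i}" for i in range(20)]
    rows = []
    for _ in range(n):
        row = {name: rng.random() < 0.5 for name in noise_names}
        row["real_effect"] = rng.random() < 0.5
        risk = 0.5 if row["real_effect"] else 0.2
        row["outcome"] = rng.random() < risk
        rows.append(row)
    return rows, ["real_effect"] + noise_names


if __name__ == "__main__":
    rows, predictors = make_synthetic(2000, seed=1)
    candidates, confirmed = data_splitting_analysis(rows, predictors, seed=2)
    print("candidates from exploration half:", candidates)
    print("confirmed in confirmation half:  ", confirmed)
```

A noise predictor can pass the screening stage by chance, which is the fishing problem the description refers to. Because the confirmation half played no part in choosing the hypotheses, a p-value computed on it keeps its usual interpretation.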