
Data splitting as a countermeasure against hypothesis fishing: with a case study of predictors for low back pain



Bibliographic details
Main authors: Dahl, Fredrik A., Grotle, Margreth, Šaltytė Benth, Jūratė, Natvig, Bård
Format: Text
Language: English
Published: Springer Netherlands 2008
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2270357/
https://www.ncbi.nlm.nih.gov/pubmed/18288577
http://dx.doi.org/10.1007/s10654-008-9230-x
Collection: PubMed
Description: There is growing concern in the scientific community that many published scientific findings may represent spurious patterns that are not reproducible in independent data sets. One reason for this is that significance levels or confidence intervals are often applied to secondary variables or sub-samples within the trial, in addition to the primary hypotheses (multiple hypotheses). This problem is likely to be extensive for population-based surveys, in which epidemiological hypotheses are derived after seeing the data set (hypothesis fishing). We recommend a data-splitting procedure to counteract this methodological problem, in which one part of the data set is used for identifying hypotheses, and the other is used for hypothesis testing. The procedure is similar to two-stage analysis of microarray data. We illustrate the process using a real data set related to predictors of low back pain at 14-year follow-up in a population initially free of low back pain. “Widespreadness” of pain (pain reported in several places other than the low back) was a statistically significant predictor, while smoking was not, despite its strong association with low back pain in the first half of the data set. We argue that the application of data splitting, in which an independent party handles the data set, will achieve for epidemiological surveys what pre-registration has done for clinical studies.
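The abstract outlines the procedure only in words. The Python sketch below illustrates the general idea on synthetic data and is not the authors' analysis: one random half of the records is screened freely for candidate predictors, and only the predictors selected there are tested on the other, untouched half. The function names (`two_proportion_p_value`, `split_and_test`, `simulate_survey`), the binary predictors and outcome, the two-proportion z-test, the 0.05 level, and the Bonferroni correction in the second stage are illustrative assumptions; the simulated records contain one hypothetical risk factor (`signal`) and twenty pure-noise predictors.

```python
import math
import random


def two_proportion_p_value(x, y):
    """Two-sided p-value of a two-proportion z-test: does the outcome rate
    differ between subjects with predictor x == 1 and x == 0?"""
    exposed = [yi for xi, yi in zip(x, y) if xi]
    unexposed = [yi for xi, yi in zip(x, y) if not xi]
    n1, n0 = len(exposed), len(unexposed)
    if n1 == 0 or n0 == 0:
        return 1.0  # predictor is constant in this subset; nothing to test
    pooled = (sum(exposed) + sum(unexposed)) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0  # outcome is constant in this subset
    z = (sum(exposed) / n1 - sum(unexposed) / n0) / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


def split_and_test(rows, outcome, predictors, alpha=0.05, seed=0):
    """Two-stage data splitting: explore one random half, confirm in the other."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # random split, fixed seed for reproducibility
    half = len(order) // 2
    explore, confirm = order[:half], order[half:]

    def p_value(subset, name):
        return two_proportion_p_value([rows[i][name] for i in subset],
                                      [rows[i][outcome] for i in subset])

    # Stage 1 ("fishing" is allowed): screen every candidate on the exploration half.
    explore_p = {name: p_value(explore, name) for name in predictors}
    selected = [name for name in predictors if explore_p[name] < alpha]

    # Stage 2: test only the selected hypotheses on the untouched confirmation half.
    # Assumption: Bonferroni correction over the hypotheses carried forward.
    threshold = alpha / max(len(selected), 1)
    confirm_p = {name: p_value(confirm, name) for name in selected}
    confirmed = [name for name in selected if confirm_p[name] < threshold]
    return {"explore": explore, "confirm": confirm, "explore_p": explore_p,
            "selected": selected, "confirm_p": confirm_p, "confirmed": confirmed}


def simulate_survey(n=2000, n_noise=20, seed=1):
    """Hypothetical survey: one real risk factor ('signal') plus pure-noise predictors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {f"noise_{j:02d}": int(rng.random() < 0.3) for j in range(n_noise)}
        row["signal"] = int(rng.random() < 0.3)
        risk = 0.5 if row["signal"] else 0.2  # outcome depends on 'signal' only
        row["outcome"] = int(rng.random() < risk)
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = simulate_survey()
    candidates = ["signal"] + [f"noise_{j:02d}" for j in range(20)]
    result = split_and_test(data, "outcome", candidates)
    print("selected in exploration half:", result["selected"])
    print("confirmed in confirmation half:", result["confirmed"])
```

Because the confirmation half plays no part in choosing which predictors are tested, its p-values keep their nominal meaning; a noise predictor that looks strong in the exploration half only by chance has to pass again on fresh data, which it will rarely do. The price is power, since each stage uses only half of the sample.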
Record ID: pubmed-2270357
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Eur J Epidemiol (section: Methods); published online 2008-02-21, issue 2008-04. © The Author(s) 2008.