Variable selection under multiple imputation using the bootstrap in a prognostic study

BACKGROUND: Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable s...

Bibliographic Details
Main authors: Heymans, Martijn W, van Buuren, Stef, Knol, Dirk L, van Mechelen, Willem, de Vet, Henrica CW
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1945032/
https://www.ncbi.nlm.nih.gov/pubmed/17629912
http://dx.doi.org/10.1186/1471-2288-7-33
author Heymans, Martijn W
van Buuren, Stef
Knol, Dirk L
van Mechelen, Willem
de Vet, Henrica CW
author_sort Heymans, Martijn W
collection PubMed
description BACKGROUND: Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. METHOD: In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables data were missing in the range of 0 and 48.1%. We used four methods to investigate the influence of respectively sampling and imputation variation: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. RESULTS: We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined at the range of 0% (full model) to 90% of variable selection, bootstrap corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. CONCLUSION: We recommend to account for both imputation and sampling variation in sets of missing data. The new procedure of combining MI with bootstrapping for variable selection, results in multivariable prognostic models with good performance and is therefore attractive to apply on data sets with missing values.
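The abstract defines the inclusion frequency as the proportion of times a variable appeared in the model. A minimal sketch of that calculation follows. It assumes the variable sets selected in each imputed or bootstrapped data set are already available. The function names, variable names, and example selections are hypothetical and are not taken from the article.

```python
from collections import Counter


def inclusion_frequencies(selected_sets, candidates):
    """Return, for each candidate, the proportion of models that include it."""
    n_models = len(selected_sets)
    if n_models == 0:
        raise ValueError("need at least one fitted model")
    # Count each variable at most once per model.
    counts = Counter(v for s in selected_sets for v in set(s))
    return {v: counts[v] / n_models for v in candidates}


def select_variables(frequencies, threshold):
    """Keep variables whose inclusion frequency is at least the threshold."""
    return [v for v, f in frequencies.items() if f >= threshold]


# Hypothetical selections, one set per imputed/bootstrapped data set.
selections = [
    {"age", "pain_intensity"},
    {"age"},
    {"age", "pain_intensity", "sex"},
    {"age", "sex"},
]
candidates = ["age", "pain_intensity", "sex", "bmi"]

freqs = inclusion_frequencies(selections, candidates)
kept = select_variables(freqs, threshold=0.5)
```

With a threshold of 0 every candidate is kept, which corresponds to the full model mentioned in the abstract. Raising the threshold yields progressively smaller models.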
format Text
id pubmed-1945032
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1945032 2007-08-11 BMC Med Res Methodol Research Article
BioMed Central 2007-07-13 /pmc/articles/PMC1945032/ /pubmed/17629912 http://dx.doi.org/10.1186/1471-2288-7-33 Text en Copyright © 2007 Heymans et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Variable selection under multiple imputation using the bootstrap in a prognostic study
topic Research Article