A comparison of model selection methods for prediction in the presence of multiply imputed data

Many approaches for variable selection with multiply imputed data in the development of a prognostic model have been proposed. However, no method prevails as uniformly best. We conducted a simulation study with a binary outcome and a logistic regression model to compare two classes of variable selection methods in the presence of MI data: (I) Model selection on bootstrap data, using backward elimination based on AIC or lasso, and fit the final model based on the most frequently (e.g. [Formula: see text]) selected variables over all MI and bootstrap data sets; (II) Model selection on original MI data, using lasso. The final model is obtained by (i) averaging estimates of variables that were selected in any MI data set or (ii) in 50% of the MI data; (iii) performing lasso on the stacked MI data, and (iv) as in (iii) but using individual weights as determined by the fraction of missingness. In all lasso models, we used both the optimal penalty and the 1‐se rule. We considered recalibrating models to correct for overshrinkage due to the suboptimal penalty by refitting the linear predictor or all individual variables. We applied the methods on a real dataset of 951 adult patients with tuberculous meningitis to predict mortality within nine months. Overall, applying lasso selection with the 1‐se penalty shows the best performance, both in approach I and II. Stacking MI data is an attractive approach because it does not require choosing a selection threshold when combining results from separate MI data sets.

Bibliographic Details
Main Authors: Thao, Le Thi Phuong, Geskus, Ronald
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6492211/
https://www.ncbi.nlm.nih.gov/pubmed/30353591
http://dx.doi.org/10.1002/bimj.201700232
_version_ 1783415105900249088
author Thao, Le Thi Phuong
Geskus, Ronald
author_facet Thao, Le Thi Phuong
Geskus, Ronald
author_sort Thao, Le Thi Phuong
collection PubMed
description Many approaches for variable selection with multiply imputed data in the development of a prognostic model have been proposed. However, no method prevails as uniformly best. We conducted a simulation study with a binary outcome and a logistic regression model to compare two classes of variable selection methods in the presence of MI data: (I) Model selection on bootstrap data, using backward elimination based on AIC or lasso, and fit the final model based on the most frequently (e.g. [Formula: see text]) selected variables over all MI and bootstrap data sets; (II) Model selection on original MI data, using lasso. The final model is obtained by (i) averaging estimates of variables that were selected in any MI data set or (ii) in 50% of the MI data; (iii) performing lasso on the stacked MI data, and (iv) as in (iii) but using individual weights as determined by the fraction of missingness. In all lasso models, we used both the optimal penalty and the 1‐se rule. We considered recalibrating models to correct for overshrinkage due to the suboptimal penalty by refitting the linear predictor or all individual variables. We applied the methods on a real dataset of 951 adult patients with tuberculous meningitis to predict mortality within nine months. Overall, applying lasso selection with the 1‐se penalty shows the best performance, both in approach I and II. Stacking MI data is an attractive approach because it does not require choosing a selection threshold when combining results from separate MI data sets.
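The stacking variants (iii)–(iv) described in the abstract can be illustrated with a minimal, hypothetical sketch: stack the M imputed data sets, give each subject a weight reflecting how much of their data was observed (here the common choice w = (1 − f)/M, with f the subject's fraction of missing values, is assumed), and run a single L1-penalised logistic fit on the stacked data. The toy imputation (independent N(0, 1) draws) and the fixed penalty stand in for a proper MI procedure and the paper's cross-validated / 1-se penalty choice; none of this code is from the paper itself.

```python
# Hypothetical sketch of lasso on stacked multiply imputed data with
# subject-level weights w_i = (1 - f_i) / M (an assumed weighting scheme).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, M = 200, 5, 5  # subjects, predictors, number of imputations

# Toy complete data and a binary outcome from a logistic model.
X_true = rng.normal(size=(n, p))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X_true @ beta))))

mask = rng.random((n, p)) < 0.2   # ~20% of entries missing at random
f = mask.mean(axis=1)             # fraction of missing values per subject

# Crude stand-in for multiple imputation: M data sets, each missing
# entry replaced by an independent N(0, 1) draw.
stacked_X, stacked_y, stacked_w = [], [], []
for m in range(M):
    X_m = np.where(mask, rng.normal(size=(n, p)), X_true)
    stacked_X.append(X_m)
    stacked_y.append(y)
    stacked_w.append((1.0 - f) / M)   # down-weight incomplete subjects

X_s = np.vstack(stacked_X)            # shape (n * M, p)
y_s = np.concatenate(stacked_y)
w_s = np.concatenate(stacked_w)

# One weighted L1-penalised fit on the stacked data yields a single
# coefficient vector, so no per-imputation selection threshold is needed.
fit = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
fit.fit(X_s, y_s, sample_weight=w_s)
selected = np.flatnonzero(fit.coef_[0] != 0)
print("selected predictor indices:", selected)
```

This illustrates why the abstract calls stacking attractive: approaches (i)–(ii) produce one selected model per imputed data set and must then choose an inclusion threshold, whereas the stacked fit selects variables once, directly.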
format Online
Article
Text
id pubmed-6492211
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-6492211 2019-05-07
A comparison of model selection methods for prediction in the presence of multiply imputed data
Thao, Le Thi Phuong; Geskus, Ronald
Biom J, Statistical Advances for Clinical Trials
John Wiley and Sons Inc. Published online 2018-10-23; issue date 2019-03
/pmc/articles/PMC6492211/ /pubmed/30353591 http://dx.doi.org/10.1002/bimj.201700232
Text en © 2018 The Authors. Biometrical Journal published by WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim.
This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle Statistical Advances for Clinical Trials
Thao, Le Thi Phuong
Geskus, Ronald
A comparison of model selection methods for prediction in the presence of multiply imputed data
title A comparison of model selection methods for prediction in the presence of multiply imputed data
title_full A comparison of model selection methods for prediction in the presence of multiply imputed data
title_fullStr A comparison of model selection methods for prediction in the presence of multiply imputed data
title_full_unstemmed A comparison of model selection methods for prediction in the presence of multiply imputed data
title_short A comparison of model selection methods for prediction in the presence of multiply imputed data
title_sort comparison of model selection methods for prediction in the presence of multiply imputed data
topic Statistical Advances for Clinical Trials
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6492211/
https://www.ncbi.nlm.nih.gov/pubmed/30353591
http://dx.doi.org/10.1002/bimj.201700232
work_keys_str_mv AT thaolethiphuong acomparisonofmodelselectionmethodsforpredictioninthepresenceofmultiplyimputeddata
AT geskusronald acomparisonofmodelselectionmethodsforpredictioninthepresenceofmultiplyimputeddata
AT thaolethiphuong comparisonofmodelselectionmethodsforpredictioninthepresenceofmultiplyimputeddata
AT geskusronald comparisonofmodelselectionmethodsforpredictioninthepresenceofmultiplyimputeddata