Effect of Variable Selection Strategy on the Performance of Prognostic Models When Using Multiple Imputation
Main authors: | Austin, Peter C.; Lee, Douglas S.; Ko, Dennis T.; White, Ian R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Lippincott Williams & Wilkins 2019 |
Subjects: | Methods Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7665277/ https://www.ncbi.nlm.nih.gov/pubmed/31718298 http://dx.doi.org/10.1161/CIRCOUTCOMES.119.005927 |
_version_ | 1783609988615241728 |
---|---|
author | Austin, Peter C.; Lee, Douglas S.; Ko, Dennis T.; White, Ian R. |
author_sort | Austin, Peter C. |
collection | PubMed |
description | BACKGROUND: Variable selection is an important issue when developing prognostic models. Missing data occur frequently in clinical research. Multiple imputation is increasingly used to address the presence of missing data in clinical research. The effect of different variable selection strategies with multiply imputed data on the external performance of derived prognostic models has not been well examined. METHODS AND RESULTS: We used backward variable selection with 9 different ways to handle multiply imputed data in a derivation sample to develop logistic regression models for predicting death within 1 year of hospitalization with an acute myocardial infarction. We assessed the prognostic accuracy of each derived model in a temporally distinct validation sample. The derivation and validation samples consisted of 11 524 patients hospitalized between 1999 and 2001 and 7889 patients hospitalized between 2004 and 2005, respectively. We considered 41 candidate predictor variables. Missing data occurred frequently, with only 13% of patients in the derivation sample and 31% of patients in the validation sample having complete data. Regardless of the significance level for variable selection, the prognostic model developed using only the complete cases in the derivation sample had substantially worse performance in the validation sample than did the models for which variables were selected using the multiply imputed versions of the derivation sample. The other 8 approaches to handling multiply imputed data resulted in prognostic models with performance similar to one another. CONCLUSIONS: Ignoring missing data and using only subjects with complete data can result in the derivation of prognostic models with poor performance. Multiple imputation should be used to account for missing data when developing prognostic models. |
format | Online Article Text |
id | pubmed-7665277 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Lippincott Williams & Wilkins |
record_format | MEDLINE/PubMed |
spelling | pubmed-7665277 2020-11-16 Effect of Variable Selection Strategy on the Performance of Prognostic Models When Using Multiple Imputation Austin, Peter C. Lee, Douglas S. Ko, Dennis T. White, Ian R. Circ Cardiovasc Qual Outcomes Methods Paper Lippincott Williams & Wilkins 2019-11-13 /pmc/articles/PMC7665277/ /pubmed/31718298 http://dx.doi.org/10.1161/CIRCOUTCOMES.119.005927 Text en © 2019 The Authors. Circulation: Cardiovascular Quality and Outcomes is published on behalf of the American Heart Association, Inc., by Wolters Kluwer Health, Inc. This is an open access article under the terms of the Creative Commons Attribution (https://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution, and reproduction in any medium, provided that the original work is properly cited. |
title | Effect of Variable Selection Strategy on the Performance of Prognostic Models When Using Multiple Imputation |
topic | Methods Paper |