
Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study

Background: In clinical prediction modelling, missing data can occur at any stage of the model pipeline: development, validation or deployment. Multiple imputation is often recommended yet challenging to apply at deployment; for example, the outcome cannot be in the imputation model, as recommended...


Bibliographic Details
Main authors: Sisk, Rose; Sperrin, Matthew; Peek, Niels; van Smeden, Maarten; Martin, Glen Philip
Format: Online Article Text
Language: English
Published: SAGE Publications 2023
Subjects: Original Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515473/
https://www.ncbi.nlm.nih.gov/pubmed/37105540
http://dx.doi.org/10.1177/09622802231165001
author Sisk, Rose
Sperrin, Matthew
Peek, Niels
van Smeden, Maarten
Martin, Glen Philip
collection PubMed
description Background: In clinical prediction modelling, missing data can occur at any stage of the model pipeline: development, validation or deployment. Multiple imputation is often recommended yet challenging to apply at deployment; for example, the outcome is unknown at deployment and so cannot be included in the imputation model, although multiple imputation guidance recommends including it. Regression imputation uses a fitted model to impute the predicted value of missing predictors from observed data, and could offer a pragmatic alternative at deployment. Moreover, the use of missing indicators has been proposed to handle informative missingness, but it is currently unknown how well this method performs in the context of clinical prediction models.
Methods: We simulated data under various missing data mechanisms to compare the predictive performance of clinical prediction models developed using both imputation methods. We consider deployment scenarios where missing data are permitted or prohibited, imputation models that use or omit the outcome, and clinical prediction models that include or omit missing indicators. We assume that the missingness mechanism remains constant across the model pipeline. We also apply the proposed strategies to critical care data.
Results: With complete data available at deployment, our findings were in line with existing recommendations: the outcome should be used to impute development data under multiple imputation and omitted under regression imputation. When missingness is allowed at deployment, omitting the outcome from the imputation model at development was preferred. Missing indicators improved model performance in many cases but can be harmful under outcome-dependent missingness.
Conclusion: We provide evidence that commonly taught principles of handling missing data via multiple imputation may not apply to clinical prediction models, particularly when data can be missing at deployment. We observed comparable predictive performance under multiple imputation and regression imputation. The performance of the missing data handling method must be evaluated on a study-by-study basis, and the most appropriate strategy for handling missing data at development should consider whether missing data are allowed at deployment. Some guidance is provided.
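The abstract describes regression imputation as fitting a model that predicts a missing predictor from observed data, optionally paired with a missing indicator. A minimal sketch of that idea follows, assuming a single fully observed predictor `x1`, a partially missing predictor `x2`, and a plain least-squares imputation model that omits the outcome. The function names and the toy data are hypothetical illustrations, not the authors' simulation code.

```python
import numpy as np


def fit_regression_imputer(x_obs, x_target):
    """Fit x_target ~ intercept + slope * x_obs on rows where x_target is observed."""
    seen = ~np.isnan(x_target)
    design = np.column_stack([np.ones(seen.sum()), x_obs[seen]])
    coef, *_ = np.linalg.lstsq(design, x_target[seen], rcond=None)
    return coef  # [intercept, slope]


def impute_with_indicator(x_obs, x_target, coef):
    """Replace missing x_target values with model predictions and
    return a 0/1 missing indicator alongside the filled values."""
    missing = np.isnan(x_target)
    predicted = coef[0] + coef[1] * x_obs
    filled = np.where(missing, predicted, x_target)
    return filled, missing.astype(int)


# Hypothetical development data: x2 depends on x1 and is missing for ~30% of rows.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.5 + 2.0 * x1 + rng.normal(scale=0.1, size=200)
x2_dev = x2.copy()
x2_dev[rng.random(200) < 0.3] = np.nan

# The imputation model is fitted once at development, without the outcome.
coef = fit_regression_imputer(x1, x2_dev)

# At deployment the stored coefficients impute new records one at a time.
new_x1 = np.array([1.0, -0.5])
new_x2 = np.array([np.nan, 0.2])
filled, indicator = impute_with_indicator(new_x1, new_x2, coef)
```

Because the imputation model uses only predictors, the same stored coefficients can be applied at deployment, where the outcome is not yet known. The indicator column could then be passed to the prediction model as an extra covariate.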
format Online
Article
Text
id pubmed-10515473
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-10515473 2023-09-23 Stat Methods Med Res Original Research Articles SAGE Publications 2023-04-27 2023-08 /pmc/articles/PMC10515473/ /pubmed/37105540 http://dx.doi.org/10.1177/09622802231165001 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study
topic Original Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515473/
https://www.ncbi.nlm.nih.gov/pubmed/37105540
http://dx.doi.org/10.1177/09622802231165001