Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation

BACKGROUND: Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pool...

Bibliographic Details
Main Authors: Wahl, Simone; Boulesteix, Anne-Laure; Zierer, Astrid; Thorand, Barbara; van de Wiel, Mark A.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5080703/
https://www.ncbi.nlm.nih.gov/pubmed/27782817
http://dx.doi.org/10.1186/s12874-016-0239-7
author Wahl, Simone
Boulesteix, Anne-Laure
Zierer, Astrid
Thorand, Barbara
van de Wiel, Mark A.
collection PubMed
description BACKGROUND: Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation.
METHODS: In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI (internal validation followed by MI on the training and test parts separately), MI-Val (MI on the full data set followed by internal validation), and MI(-y)-Val (MI on the full data set omitting the outcome, followed by internal validation). Different validation strategies, including bootstrap and cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adapt a strategy for confidence interval construction to incomplete data.
RESULTS: Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias as the true effect size and the number of covariates increase and as the sample size decreases. In Val-MI, the accuracy of the estimate is improved more by increasing the number of bootstrap draws than by increasing the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained.
CONCLUSIONS: When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12874-016-0239-7) contains supplementary material, which is available to authorized users.
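The order of operations in Val-MI (resample first, then impute the training and test parts separately) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy data generator, the helper names (`impute_hot_deck`, `fit_direction`, `val_mi_auc`), the one-variable "model", and the random hot-deck draw that stands in for a proper MI procedure. It reports a plain out-of-bag bootstrap AUC averaged over resamples and imputations, not the 0.632+ estimate.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def impute_hot_deck(rows, rng):
    """Stand-in for one imputation draw (hypothetical helper): fill a missing x
    with a random observed x from rows of the same part with the same outcome.
    Assumes the part contains at least one observed x."""
    filled = []
    for x, y in rows:
        if x is None:
            donors = [xd for xd, yd in rows if xd is not None and yd == y]
            donors = donors or [xd for xd, _ in rows if xd is not None]
            x = rng.choice(donors)
        filled.append((x, y))
    return filled

def fit_direction(train):
    """Toy 'model': learn only whether higher x points to y == 1.
    Assumes both outcome classes occur in the training part."""
    m1 = mean(x for x, y in train if y == 1)
    m0 = mean(x for x, y in train if y == 0)
    return 1.0 if m1 >= m0 else -1.0

def val_mi_auc(data, n_boot=50, n_imp=5, seed=0):
    """Val-MI ordering: bootstrap split first, then impute each part separately."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        train_raw = [data[i] for i in idx]                             # bootstrap sample
        test_raw = [data[i] for i in range(n) if i not in in_bag]      # out-of-bag rows
        for _ in range(n_imp):
            train = impute_hot_deck(train_raw, rng)  # uses training rows only
            test = impute_hot_deck(test_raw, rng)    # uses test rows only
            sign = fit_direction(train)
            a = auc([sign * x for x, _ in test], [y for _, y in test])
            if a is not None:
                estimates.append(a)
    return mean(estimates)

def make_toy_data(n=200, effect=1.0, p_missing=0.3, seed=1):
    """Binary outcome y, one predictor x shifted by `effect` when y == 1,
    with values of x set to None completely at random."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(effect * y, 1.0)
        data.append((None if rng.random() < p_missing else x, y))
    return data

data = make_toy_data()
estimate = val_mi_auc(data)
print(round(estimate, 3))
```

Under the same assumptions, an MI-Val variant would call the imputation step once on the full data set before drawing bootstrap samples, so that test rows could serve as donors for training rows; that information sharing is the kind of leakage the abstract associates with optimistic bias.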
format Online
Article
Text
id pubmed-5080703
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5080703 2016-10-31 Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation. Wahl, Simone; Boulesteix, Anne-Laure; Zierer, Astrid; Thorand, Barbara; van de Wiel, Mark A. BMC Med Res Methodol. Research Article. BioMed Central 2016-10-26 /pmc/articles/PMC5080703/ /pubmed/27782817 http://dx.doi.org/10.1186/s12874-016-0239-7 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation
topic Research Article