Cargando…

Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa

BACKGROUND: Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods, however studies have suggested that simple methods for filling missing data can be just as accurate as com...

Descripción completa

Detalles Bibliográficos
Autores principales:	Masconi, Katya L., Matsha, Tandi E., Erasmus, Rajiv T., Kengne, Andre P.
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Public Library of Science 2015
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4583496/ https://www.ncbi.nlm.nih.gov/pubmed/26406594 http://dx.doi.org/10.1371/journal.pone.0139210

_version_	1782391861844901888
author	Masconi, Katya L. Matsha, Tandi E. Erasmus, Rajiv T. Kengne, Andre P.
author_facet	Masconi, Katya L. Matsha, Tandi E. Erasmus, Rajiv T. Kengne, Andre P.
author_sort	Masconi, Katya L.
collection	PubMed
description	BACKGROUND: Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods, however studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. METHODS: Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models’ discrimination was assessed and compared using C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. RESULTS: The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which were matched by simpler imputation methods. CONCLUSIONS: Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation.
format	Online Article Text
id	pubmed-4583496
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-45834962015-10-02 Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa Masconi, Katya L. Matsha, Tandi E. Erasmus, Rajiv T. Kengne, Andre P. PLoS One Research Article BACKGROUND: Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods, however studies have suggested that simple methods for filling missing data can be just as accurate as complex methods. The objective of this study was to implement a number of simple and more complex imputation methods, and assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation. METHODS: Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models’ discrimination was assessed and compared using C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment. RESULTS: The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest in stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performances while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation only yielded the highest C-statistic for the Rotterdam Predictive model, which were matched by simpler imputation methods. CONCLUSIONS: Deletion was confirmed as a poor technique for handling missing data. However, despite the emphasized disadvantages of simpler imputation methods, this study showed that implementing these methods results in similar predictive utility for undiagnosed diabetes when compared to multiple imputation. Public Library of Science 2015-09-25 /pmc/articles/PMC4583496/ /pubmed/26406594 http://dx.doi.org/10.1371/journal.pone.0139210 Text en © 2015 Masconi et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle	Research Article Masconi, Katya L. Matsha, Tandi E. Erasmus, Rajiv T. Kengne, Andre P. Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa
title	Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa
title_full	Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa
title_fullStr	Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa
title_full_unstemmed	Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa
title_short	Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa
title_sort	effects of different missing data imputation techniques on the performance of undiagnosed diabetes risk prediction models in a mixed-ancestry population of south africa
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4583496/ https://www.ncbi.nlm.nih.gov/pubmed/26406594 http://dx.doi.org/10.1371/journal.pone.0139210
work_keys_str_mv	AT masconikatyal effectsofdifferentmissingdataimputationtechniquesontheperformanceofundiagnoseddiabetesriskpredictionmodelsinamixedancestrypopulationofsouthafrica AT matshatandie effectsofdifferentmissingdataimputationtechniquesontheperformanceofundiagnoseddiabetesriskpredictionmodelsinamixedancestrypopulationofsouthafrica AT erasmusrajivt effectsofdifferentmissingdataimputationtechniquesontheperformanceofundiagnoseddiabetesriskpredictionmodelsinamixedancestrypopulationofsouthafrica AT kengneandrep effectsofdifferentmissingdataimputationtechniquesontheperformanceofundiagnoseddiabetesriskpredictionmodelsinamixedancestrypopulationofsouthafrica

Effects of Different Missing Data Imputation Techniques on the Performance of Undiagnosed Diabetes Risk Prediction Models in a Mixed-Ancestry Population of South Africa

Ejemplares similares