
Does the Missing Data Imputation Method Affect the Composition and Performance of Prognostic Models?



Bibliographic Details
Main authors: Baneshi, M R; Talei, A R
Format: Online article (text)
Language: English
Published: Kowsar, 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3372019/
https://www.ncbi.nlm.nih.gov/pubmed/22737551

Abstract

BACKGROUND: We previously showed that imputing missing data with the Multivariate Imputation by Chained Equations (MICE) method is superior to excluding cases with missing data; however, the MICE methodology is complicated, and simpler imputation methods are available. The aim of this study was to compare these methods in terms of model composition and performance.

METHODS: Three hundred and ten breast cancer patients were recruited. Four approaches were applied to impute missing data. First, we adopted an ad hoc method in which the missing values of each variable were replaced by the median of its observed values. Then three likelihood-based approaches were used. In regression imputation, a regression model related the variable with missing data to the rest of the variables, and the fitted regression equation was used to fill in the missing values. The Expectation-Maximization (E-M) algorithm was implemented, in which the missing data and the regression parameters were estimated iteratively until the regression parameters converged. Finally, the MICE method was applied. The resulting models were compared in terms of the variables that contributed significantly to the multifactorial analysis, sensitivity, and specificity.

RESULTS: All candidate variables contributed significantly to the MICE model; however, grade of disease lost its effect in the other three models. The MICE model showed the best performance, followed by the E-M model.

CONCLUSION: The final models differed across imputation methods in both composition and performance. Therefore, modern imputation methods are recommended to recover the information in incomplete records.
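The imputation approaches in the METHODS paragraph differ mainly in how each missing value is predicted. The Python sketch below is a hypothetical illustration, not the authors' code: it assumes numeric variables stored as a list of rows with `None` marking a missing value, and every helper name (`median_impute`, `ols`, `predict`, `chained_impute`) is invented for this example. `median_impute` corresponds to the ad hoc median method. `chained_impute` with `stochastic=False` repeatedly regresses each incomplete column on all the other columns and fills in the predictions, that is, deterministic regression imputation iterated until the fills settle; with `stochastic=True` it adds residual noise to each fill, giving one draw of a simplified chained-equations scheme.

```python
import random
import statistics


def median_impute(rows):
    """Ad hoc method: replace each missing value (None) with the median of the observed values in its column."""
    n_cols = len(rows[0])
    medians = [statistics.median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def ols(X, y):
    """Least-squares coefficients (intercept first), solved from the normal equations by Gauss-Jordan elimination."""
    A = [[1.0] + list(x) for x in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[k] for a in A) for k in range(p)] + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(beta, x):
    """Linear prediction: intercept plus coefficient-weighted predictors."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def chained_impute(rows, n_iter=10, stochastic=False, seed=0):
    """Cycle through incomplete columns, regressing each on all other columns and refilling its missing cells.

    stochastic=False: deterministic regression imputation, iterated.
    stochastic=True: adds N(0, residual SD^2) noise to each fill (one simplified chained-equations draw).
    """
    rng = random.Random(seed)
    missing = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v is None]
    incomplete_cols = sorted({j for _, j in missing})
    data = median_impute(rows)  # starting values; a new list, so `rows` is not modified
    for _ in range(n_iter):
        for j in incomplete_cols:
            # Fit on the rows where column j was originally observed.
            observed = [i for i, r in enumerate(rows) if r[j] is not None]
            X = [[v for k, v in enumerate(data[i]) if k != j] for i in observed]
            y = [data[i][j] for i in observed]
            beta = ols(X, y)
            resid = [t - predict(beta, x) for x, t in zip(X, y)]
            sd = (sum(e * e for e in resid) / max(len(resid) - len(beta), 1)) ** 0.5
            # Refill only the cells that were originally missing in column j.
            for i, col in missing:
                if col == j:
                    x = [v for k, v in enumerate(data[i]) if k != j]
                    noise = rng.gauss(0.0, sd) if stochastic else 0.0
                    data[i][j] = predict(beta, x) + noise
    return data


if __name__ == "__main__":
    # Toy data (hypothetical): column 1 equals 2 * column 0 + 1 wherever both are observed.
    rows = [[1.0, 3.0], [2.0, 5.0], [3.0, None], [4.0, 9.0], [None, 11.0]]
    print(median_impute(rows))  # [[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0], [2.5, 11.0]]
    det = chained_impute(rows, n_iter=5)
    print(round(det[4][0], 6), round(det[2][1], 6))  # 5.0 7.0
```

The toy data show why the choice matters: the median fill for the missing first-column value (2.5) ignores the strong relationship with the second column, while the regression-based fill (5.0) uses it. This sketch is deliberately simplified. Full MICE (for example the R package `mice`) also draws the regression coefficients from their posterior, uses appropriate models for categorical variables, and produces several imputed data sets whose analyses are pooled with Rubin's rules; the E-M algorithm for multivariate normal data additionally accounts for the conditional covariance of the missing values in its E-step. None of this is shown here.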

Published in: Iran Red Crescent Med J, January 2012 (Original Article).
Copyright © 2012, Kowsar Corp. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.5/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.