
Comparison of regression imputation methods of baseline covariates that predict survival outcomes

INTRODUCTION: Missing data are inevitable in medical research and appropriate handling of missing data is critical for statistical estimation and making inferences. Imputation is often employed in order to maximize the amount of data available for statistical analysis and is preferred over the typic...


Bibliographic Details
Main Authors: Solomon, Nicole; Lokhnygina, Yuliya; Halabi, Susan
Format: Online Article Text
Language: English
Published: Cambridge University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057424/
https://www.ncbi.nlm.nih.gov/pubmed/33948262
http://dx.doi.org/10.1017/cts.2020.533
author Solomon, Nicole
Lokhnygina, Yuliya
Halabi, Susan
collection PubMed
description INTRODUCTION: Missing data are inevitable in medical research, and appropriate handling of missing data is critical for statistical estimation and inference. Imputation is often employed to maximize the amount of data available for statistical analysis and is preferred over complete case analysis, which typically yields biased results. This article examines several types of regression imputation of missing covariates in the prediction of time-to-event outcomes subject to right censoring. METHODS: We evaluated the performance of five regression methods in the imputation of missing covariates for the proportional hazards model via summary statistics, including proportional bias and proportional mean squared error. The primary objective was to determine which among the parametric generalized linear models (GLMs) and least absolute shrinkage and selection operator (LASSO), and the nonparametric multivariate adaptive regression splines (MARS), support vector machine (SVM), and random forest (RF), provides the “best” imputation model for baseline missing covariates in predicting a survival outcome. RESULTS: On average, LASSO yielded the smallest bias, mean square error, mean square prediction error, and median absolute deviation (MAD) of the final analysis model’s parameters among all five methods considered. SVM performed second best, while GLM and MARS showed the lowest relative performance. CONCLUSION: LASSO and SVM outperform GLM, MARS, and RF in the context of regression imputation for prediction of a time-to-event outcome.
format Online
Article
Text
id pubmed-8057424
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cambridge University Press
record_format MEDLINE/PubMed
spelling pubmed-8057424 2021-05-03 Comparison of regression imputation methods of baseline covariates that predict survival outcomes Solomon, Nicole Lokhnygina, Yuliya Halabi, Susan J Clin Transl Sci Research Article
Cambridge University Press 2020-09-04 /pmc/articles/PMC8057424/ /pubmed/33948262 http://dx.doi.org/10.1017/cts.2020.533 Text en © The Association for Clinical and Translational Science 2020 https://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparison of regression imputation methods of baseline covariates that predict survival outcomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057424/
https://www.ncbi.nlm.nih.gov/pubmed/33948262
http://dx.doi.org/10.1017/cts.2020.533