Comparison of regression imputation methods of baseline covariates that predict survival outcomes
INTRODUCTION: Missing data are inevitable in medical research and appropriate handling of missing data is critical for statistical estimation and making inferences. Imputation is often employed in order to maximize the amount of data available for statistical analysis and is preferred over the typic...
Main authors: | Solomon, Nicole; Lokhnygina, Yuliya; Halabi, Susan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cambridge University Press 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057424/ https://www.ncbi.nlm.nih.gov/pubmed/33948262 http://dx.doi.org/10.1017/cts.2020.533 |
_version_ | 1783680832838303744 |
---|---|
author | Solomon, Nicole Lokhnygina, Yuliya Halabi, Susan |
author_facet | Solomon, Nicole Lokhnygina, Yuliya Halabi, Susan |
author_sort | Solomon, Nicole |
collection | PubMed |
description | INTRODUCTION: Missing data are inevitable in medical research and appropriate handling of missing data is critical for statistical estimation and making inferences. Imputation is often employed in order to maximize the amount of data available for statistical analysis and is preferred over the typically biased output of complete case analysis. This article examines several types of regression imputation of missing covariates in the prediction of time-to-event outcomes subject to right censoring. METHODS: We evaluated the performance of five regression methods in the imputation of missing covariates for the proportional hazards model via summary statistics, including proportional bias and proportional mean squared error. The primary objective was to determine which among the parametric generalized linear models (GLMs) and least absolute shrinkage and selection operator (LASSO), and nonparametric multivariate adaptive regression splines (MARS), support vector machine (SVM), and random forest (RF), provides the “best” imputation model for baseline missing covariates in predicting a survival outcome. RESULTS: LASSO on average yielded the smallest bias, mean square error, mean square prediction error, and median absolute deviation (MAD) of the final analysis model’s parameters among all five methods considered. SVM performed the second best while GLM and MARS exhibited the lowest relative performance. CONCLUSION: LASSO and SVM outperform GLM, MARS, and RF in the context of regression imputation for prediction of a time-to-event outcome. |
format | Online Article Text |
id | pubmed-8057424 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Cambridge University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8057424 2021-05-03 Comparison of regression imputation methods of baseline covariates that predict survival outcomes Solomon, Nicole Lokhnygina, Yuliya Halabi, Susan J Clin Transl Sci Research Article
Cambridge University Press 2020-09-04 /pmc/articles/PMC8057424/ /pubmed/33948262 http://dx.doi.org/10.1017/cts.2020.533 Text en © The Association for Clinical and Translational Science 2020 https://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Solomon, Nicole Lokhnygina, Yuliya Halabi, Susan Comparison of regression imputation methods of baseline covariates that predict survival outcomes |
title | Comparison of regression imputation methods of baseline covariates that predict survival outcomes |
title_full | Comparison of regression imputation methods of baseline covariates that predict survival outcomes |
title_fullStr | Comparison of regression imputation methods of baseline covariates that predict survival outcomes |
title_full_unstemmed | Comparison of regression imputation methods of baseline covariates that predict survival outcomes |
title_short | Comparison of regression imputation methods of baseline covariates that predict survival outcomes |
title_sort | comparison of regression imputation methods of baseline covariates that predict survival outcomes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057424/ https://www.ncbi.nlm.nih.gov/pubmed/33948262 http://dx.doi.org/10.1017/cts.2020.533 |
work_keys_str_mv | AT solomonnicole comparisonofregressionimputationmethodsofbaselinecovariatesthatpredictsurvivaloutcomes AT lokhnyginayuliya comparisonofregressionimputationmethodsofbaselinecovariatesthatpredictsurvivaloutcomes AT halabisusan comparisonofregressionimputationmethodsofbaselinecovariatesthatpredictsurvivaloutcomes |