Variable Importance and Prediction Methods for Longitudinal Problems with Missing Variables

We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice inv...

Bibliographic Details
Main Authors:	Díaz, Iván; Hubbard, Alan; Decker, Anna; Cohen, Mitchell
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376910/ https://www.ncbi.nlm.nih.gov/pubmed/25815719 http://dx.doi.org/10.1371/journal.pone.0120031

author	Díaz, Iván Hubbard, Alan Decker, Anna Cohen, Mitchell
collection	PubMed
description	We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that use only a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can provide a tool to make care decisions informed by the patient’s high-dimensional physiological and clinical history. Our VIM parameters are analogous to slope coefficients in adjusted regressions, but they do not depend on a specific statistical model, nor do they require a certain functional form of the prediction regression to be estimated. In addition, they can be causally interpreted under causal and statistical assumptions as the expected outcome under time-specific clinical interventions, related to changes in the mean of the outcome if each individual experiences a specified change in the variable (keeping other variables in the model fixed). Better yet, the targeted MLE used is doubly robust and locally efficient. Because the proposed VIM does not constrain the prediction model fit, we use a very flexible ensemble learner (the SuperLearner), which returns a linear combination of a list of user-given algorithms. Not only is such a prediction algorithm intuitively appealing, it also has theoretical justification as being asymptotically equivalent to the oracle selector. The results of the analysis show effects whose size and significance would not have been found using a parametric approach (such as stepwise regression or LASSO). In addition, the procedure is even more compelling because the predictor on which it is based showed significant improvements in cross-validated fit, for instance in the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Thus, given that our VIM 1) applies to any model fitting procedure, 2) has meaningful clinical (causal) interpretations under assumptions, and 3) has robust asymptotic (influence-curve-based) inference, it provides a compelling alternative to existing methods for estimating variable importance in high-dimensional clinical (or other) data.
format	Online Article Text
id	pubmed-4376910
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4376910 2015-04-04 Variable Importance and Prediction Methods for Longitudinal Problems with Missing Variables Díaz, Iván Hubbard, Alan Decker, Anna Cohen, Mitchell PLoS One Research Article Public Library of Science 2015-03-27 /pmc/articles/PMC4376910/ /pubmed/25815719 http://dx.doi.org/10.1371/journal.pone.0120031 Text en © 2015 Díaz et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Variable Importance and Prediction Methods for Longitudinal Problems with Missing Variables
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376910/ https://www.ncbi.nlm.nih.gov/pubmed/25815719 http://dx.doi.org/10.1371/journal.pone.0120031
