
Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis

BACKGROUND: Clinical research and medical practice can be advanced through the prediction of an individual’s health state, trajectory, and responses to treatments. However, the majority of current clinical risk prediction models are based on regression approaches or machine learning algorithms that...


Bibliographic Details
Main authors: Wongvibulsin, Shannon; Wu, Katherine C.; Zeger, Scott L.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937754/
https://www.ncbi.nlm.nih.gov/pubmed/31888507
http://dx.doi.org/10.1186/s12874-019-0863-0
author Wongvibulsin, Shannon
Wu, Katherine C.
Zeger, Scott L.
author_facet Wongvibulsin, Shannon
Wu, Katherine C.
Zeger, Scott L.
author_sort Wongvibulsin, Shannon
collection PubMed
description BACKGROUND: Clinical research and medical practice can be advanced through the prediction of an individual’s health state, trajectory, and responses to treatments. However, the majority of current clinical risk prediction models are based on regression approaches or machine learning algorithms that are static, rather than dynamic. To benefit from the increasing emergence of large, heterogeneous data sets, such as electronic health records (EHRs), novel tools to support improved clinical decision making through methods for individual-level risk prediction that can handle multiple variables, their interactions, and time-varying values are necessary. METHODS: We introduce a novel dynamic approach to clinical risk prediction for survival, longitudinal, and multivariate (SLAM) outcomes, called random forest for SLAM data analysis (RF-SLAM). RF-SLAM is a continuous-time, random forest method for survival analysis that combines the strengths of existing statistical and machine learning methods to produce individualized Bayes estimates of piecewise-constant hazard rates. We also present a method-agnostic approach for time-varying evaluation of model performance. RESULTS: We derive and illustrate the method by predicting sudden cardiac arrest (SCA) in the Left Ventricular Structural (LV) Predictors of Sudden Cardiac Death (SCD) Registry. We demonstrate superior performance relative to standard random forest methods for survival data. We illustrate the importance of the number of preceding heart failure hospitalizations as a time-dependent predictor in SCA risk assessment. CONCLUSIONS: RF-SLAM is a novel statistical and machine learning method that improves risk prediction by incorporating time-varying information and accommodating a large number of predictors, their interactions, and missing values. 
RF-SLAM is designed to easily extend to simultaneous predictions of multiple, possibly competing, events and/or repeated measurements of discrete or continuous variables over time. Trial registration: LV Structural Predictors of SCD Registry (clinicaltrials.gov, NCT01076660), retrospectively registered 25 February 2010
format Online
Article
Text
id pubmed-6937754
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-69377542019-12-31 Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis Wongvibulsin, Shannon Wu, Katherine C. Zeger, Scott L. BMC Med Res Methodol Technical Advance BACKGROUND: Clinical research and medical practice can be advanced through the prediction of an individual’s health state, trajectory, and responses to treatments. However, the majority of current clinical risk prediction models are based on regression approaches or machine learning algorithms that are static, rather than dynamic. To benefit from the increasing emergence of large, heterogeneous data sets, such as electronic health records (EHRs), novel tools to support improved clinical decision making through methods for individual-level risk prediction that can handle multiple variables, their interactions, and time-varying values are necessary. METHODS: We introduce a novel dynamic approach to clinical risk prediction for survival, longitudinal, and multivariate (SLAM) outcomes, called random forest for SLAM data analysis (RF-SLAM). RF-SLAM is a continuous-time, random forest method for survival analysis that combines the strengths of existing statistical and machine learning methods to produce individualized Bayes estimates of piecewise-constant hazard rates. We also present a method-agnostic approach for time-varying evaluation of model performance. RESULTS: We derive and illustrate the method by predicting sudden cardiac arrest (SCA) in the Left Ventricular Structural (LV) Predictors of Sudden Cardiac Death (SCD) Registry. We demonstrate superior performance relative to standard random forest methods for survival data. We illustrate the importance of the number of preceding heart failure hospitalizations as a time-dependent predictor in SCA risk assessment. 
CONCLUSIONS: RF-SLAM is a novel statistical and machine learning method that improves risk prediction by incorporating time-varying information and accommodating a large number of predictors, their interactions, and missing values. RF-SLAM is designed to easily extend to simultaneous predictions of multiple, possibly competing, events and/or repeated measurements of discrete or continuous variables over time. Trial registration: LV Structural Predictors of SCD Registry (clinicaltrials.gov, NCT01076660), retrospectively registered 25 February 2010 BioMed Central 2019-12-31 /pmc/articles/PMC6937754/ /pubmed/31888507 http://dx.doi.org/10.1186/s12874-019-0863-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Technical Advance
Wongvibulsin, Shannon
Wu, Katherine C.
Zeger, Scott L.
Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis
title Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis
title_full Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis
title_fullStr Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis
title_full_unstemmed Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis
title_short Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis
title_sort clinical risk prediction with random forests for survival, longitudinal, and multivariate (rf-slam) data analysis
topic Technical Advance
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937754/
https://www.ncbi.nlm.nih.gov/pubmed/31888507
http://dx.doi.org/10.1186/s12874-019-0863-0