
Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed...


Bibliographic Details
Main authors: Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
Format: Online Article Text
Language: English
Published: Blackwell Publishing Ltd 2012
Subjects: Performance of Prediction
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3470596/
https://www.ncbi.nlm.nih.gov/pubmed/22777999
http://dx.doi.org/10.1002/bimj.201100251
collection PubMed
description In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease.
format Online
Article
Text
id pubmed-3470596
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Blackwell Publishing Ltd
record_format MEDLINE/PubMed
spelling pubmed-3470596 2012-10-12 Biom J Performance of Prediction Blackwell Publishing Ltd 2012-09 2012-07-06 /pmc/articles/PMC3470596/ /pubmed/22777999 http://dx.doi.org/10.1002/bimj.201100251 Text en © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim http://creativecommons.org/licenses/by/2.5/ Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.
topic Performance of Prediction