
Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series

Tree-based machine learning methods have gained traction in the statistical and data science fields. They have been shown to provide better solutions to various research questions than traditional analysis approaches. To encourage the uptake of tree-based methods in health research, we review the me...

Bibliographic Details
Main authors: Hu, Liangyuan; Li, Lihua
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9736500/
https://www.ncbi.nlm.nih.gov/pubmed/36498153
http://dx.doi.org/10.3390/ijerph192316080
_version_ 1784847044750344192
author Hu, Liangyuan
Li, Lihua
author_facet Hu, Liangyuan
Li, Lihua
author_sort Hu, Liangyuan
collection PubMed
description Tree-based machine learning methods have gained traction in the statistical and data science fields. They have been shown to provide better solutions to various research questions than traditional analysis approaches. To encourage the uptake of tree-based methods in health research, we review the methodological fundamentals of three key tree-based machine learning methods: random forests, extreme gradient boosting and Bayesian additive regression trees. We further conduct a series of case studies to illustrate how these methods can be properly used to solve important health research problems in four domains: variable selection, estimation of causal effects, propensity score weighting and missing data. We explain that the central idea of using ensemble tree methods for these research questions is accurate prediction via flexible modeling. We applied ensemble tree methods to select important predictors for the presence of postoperative respiratory complications among early-stage lung cancer patients with resectable tumors. We then demonstrated how to use these methods to estimate the causal effects of popular surgical approaches on postoperative respiratory complications among lung cancer patients. Using the same data, we further implemented the methods to accurately estimate the inverse probability weights for a propensity score analysis of the comparative effectiveness of the surgical approaches. Finally, we demonstrated how random forests can be used to impute missing data using the Study of Women’s Health Across the Nation data set. To conclude, tree-based methods are a flexible tool and should be properly used for health investigations.
format Online
Article
Text
id pubmed-9736500
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9736500 2022-12-11 Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series Hu, Liangyuan Li, Lihua Int J Environ Res Public Health Review MDPI 2022-12-01 /pmc/articles/PMC9736500/ /pubmed/36498153 http://dx.doi.org/10.3390/ijerph192316080 Text en © 2022 by the authors.
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Review
Hu, Liangyuan
Li, Lihua
Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series
title Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9736500/
https://www.ncbi.nlm.nih.gov/pubmed/36498153
http://dx.doi.org/10.3390/ijerph192316080