Alternative stopping rules to limit tree expansion for random forest models

Bibliographic Details
Main Authors: Little, Mark P., Rosenberg, Philip S., Arsham, Aryana
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9448733/
https://www.ncbi.nlm.nih.gov/pubmed/36068261
http://dx.doi.org/10.1038/s41598-022-19281-7
author Little, Mark P.
Rosenberg, Philip S.
Arsham, Aryana
collection PubMed
description Random forests are a popular type of machine learning model; they are relatively robust to overfitting, unlike some other machine learning models, and they adequately capture non-linear relationships between an outcome of interest and multiple independent variables. There are relatively few adjustable hyperparameters in standard random forest models, among them the minimum size of the terminal nodes on each tree. The usual stopping rule, as proposed by Breiman, stops tree expansion by limiting the size of the parent nodes, so that a node cannot be split if it has fewer than a specified number of observations. Recently, an alternative stopping criterion has been proposed that stops tree expansion so that all terminal nodes have at least a minimum number of observations. The present paper proposes three generalisations of this idea, limiting growth in regression random forests based on the variance, range, or inter-centile range. The new approaches are applied to diabetes data obtained from the National Health and Nutrition Examination Survey and four other datasets (Tasmanian Abalone data, Boston Housing crime rate data, Los Angeles ozone concentration data, MIT servo data). The empirical analysis presented herein demonstrates that the new stopping rules yield mean square prediction error competitive with that of standard random forest models. In general, use of the inter-centile range statistic to control tree expansion yields much less variation in mean square prediction error, and mean square prediction error is also closer to the optimum. The Fortran code developed is provided in the Supplementary Material.
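
The stopping rules described above amount to a per-node check carried out before each candidate split. The sketch below is a minimal Python illustration of how such a check could look; it is not the authors' Fortran implementation from the Supplementary Material, and the function name, parameter names, threshold values, and the 10th/90th centile choice are assumptions made for this example.

# Minimal sketch (not the authors' code) of node-level stopping checks
# for a regression tree: the standard minimum-node-size rule plus the
# variance, range, and inter-centile range variants described above.
import numpy as np

def stop_expansion(y_node,
                   rule="intercentile",
                   min_node_size=5,       # illustrative default
                   var_threshold=0.01,    # illustrative default
                   range_threshold=0.1,   # illustrative default
                   icr_threshold=0.1,     # illustrative default
                   centiles=(10, 90)):    # assumed centile pair
    """Return True if the node should be made terminal (not split)."""
    y_node = np.asarray(y_node, dtype=float)

    # Breiman-style rule: never split a node with fewer than
    # min_node_size observations.
    if y_node.size < min_node_size:
        return True

    if rule == "variance":
        # Stop once the outcome variance within the node is small enough.
        return np.var(y_node) <= var_threshold
    if rule == "range":
        # Stop once the outcome spread (max minus min) is small enough.
        return np.ptp(y_node) <= range_threshold
    if rule == "intercentile":
        # Stop once the inter-centile range (here, 90th minus 10th
        # percentile) of the outcome is small enough.
        lo, hi = np.percentile(y_node, centiles)
        return (hi - lo) <= icr_threshold

    # Default: only the minimum-node-size rule applies.
    return False

Under the standard rule only the minimum-node-size check applies; the variance, range, and inter-centile variants additionally declare a node terminal once the outcome values within it are sufficiently homogeneous by the chosen statistic.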
format Online
Article
Text
id pubmed-9448733
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9448733 2022-09-08 Alternative stopping rules to limit tree expansion for random forest models. Little, Mark P.; Rosenberg, Philip S.; Arsham, Aryana. Sci Rep, Article (abstract as in the description field above). Nature Publishing Group UK, 2022-09-06. /pmc/articles/PMC9448733/ /pubmed/36068261 http://dx.doi.org/10.1038/s41598-022-19281-7 Text en. © This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply, 2022. Open Access (https://creativecommons.org/licenses/by/4.0/): This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Alternative stopping rules to limit tree expansion for random forest models
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9448733/
https://www.ncbi.nlm.nih.gov/pubmed/36068261
http://dx.doi.org/10.1038/s41598-022-19281-7