Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests

The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can hel...

Bibliographic Details
Main Author: Hornung, Roman
Format: Online Article Text
Language: English
Published: Springer Singapore 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8533673/
https://www.ncbi.nlm.nih.gov/pubmed/34723205
http://dx.doi.org/10.1007/s42979-021-00920-1
author Hornung, Roman
author_facet Hornung, Roman
author_sort Hornung, Roman
collection PubMed
description The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can help tackle practically important issues. For example, interactions between features can be exploited effectively by bivariable splitting. With diversity forests, each split is selected from a candidate split set that is sampled in the following way: for $l = 1, \dots, \mathit{nsplits}$: (1) sample one split problem; (2) sample a single split or a few splits from the split problem sampled in (1) and add them to the candidate split set. The split problems are specifically structured collections of splits that depend on the respective split procedure considered. This sampling scheme makes innovative complex split procedures computationally tangible while avoiding overfitting. Important general properties of the diversity forest algorithm are evaluated empirically using univariable, binary splitting. Based on 220 data sets with binary outcomes, diversity forests are compared with conventional random forests and random forests using extremely randomized trees. The results show that the split sampling scheme of diversity forests does not impair the predictive performance of random forests and that the performance is quite robust with regard to the specified nsplits value. The recently developed interaction forests are the first diversity forest method that uses a complex split procedure. Interaction forests allow modeling and detecting interactions between features effectively. Further potential complex split procedures are discussed as an outlook. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s42979-021-00920-1.
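The two-step candidate sampling described in the abstract can be illustrated with a minimal Python sketch for the univariable, binary case. This is not the author's implementation. It assumes that a "split problem" is the set of all split points of one feature, that exactly one split is drawn per split problem, that split points are midpoints between adjacent observed values, and that the best candidate is the one with the lowest weighted Gini impurity. All function names are hypothetical.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(x_col, y, threshold):
    """Size-weighted Gini impurity of the two children of a binary split."""
    left = [yi for xi, yi in zip(x_col, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x_col, y) if xi > threshold]
    n = len(y)
    return (len(left) * gini_impurity(left)
            + len(right) * gini_impurity(right)) / n


def sample_candidate_splits(X, nsplits, rng):
    """Sample up to nsplits candidate splits as (feature_index, threshold)."""
    n_features = len(X[0])
    candidates = []
    for _ in range(nsplits):
        # (1) Sample one split problem (assumption: one feature).
        j = rng.randrange(n_features)
        values = sorted({row[j] for row in X})
        if len(values) < 2:
            continue  # constant feature in this node: no valid split
        # (2) Sample a single split from that split problem.
        k = rng.randrange(len(values) - 1)
        candidates.append((j, (values[k] + values[k + 1]) / 2))
    return candidates


def select_split(X, y, nsplits, rng):
    """Return the sampled candidate with the lowest weighted Gini, or None."""
    candidates = sample_candidate_splits(X, nsplits, rng)
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: weighted_gini([row[c[0]] for row in X], y, c[1]),
    )
```

A call such as `select_split(X, y, nsplits=30, rng=random.Random(1))` evaluates at most 30 sampled splits per node. For a complex split procedure, only the body of step (1) and step (2) would change, since the split problem would then cover, for example, a pair of features instead of a single one.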
format Online
Article
Text
id pubmed-8533673
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer Singapore
record_format MEDLINE/PubMed
spelling pubmed-8533673 2021-10-25 Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests Hornung, Roman SN Comput Sci Original Research Springer Singapore 2021-10-21 2022 /pmc/articles/PMC8533673/ /pubmed/34723205 http://dx.doi.org/10.1007/s42979-021-00920-1 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Original Research
Hornung, Roman
Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests
title Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests
title_full Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests
title_fullStr Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests
title_full_unstemmed Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests
title_short Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests
title_sort diversity forests: using split sampling to enable innovative complex split procedures in random forests
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8533673/
https://www.ncbi.nlm.nih.gov/pubmed/34723205
http://dx.doi.org/10.1007/s42979-021-00920-1
work_keys_str_mv AT hornungroman diversityforestsusingsplitsamplingtoenableinnovativecomplexsplitproceduresinrandomforests