Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests
The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can hel...
Main author: | Hornung, Roman |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Singapore 2021 |
Subjects: | Original Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8533673/ https://www.ncbi.nlm.nih.gov/pubmed/34723205 http://dx.doi.org/10.1007/s42979-021-00920-1 |
_version_ | 1784587370185621504 |
---|---|
author | Hornung, Roman |
author_facet | Hornung, Roman |
author_sort | Hornung, Roman |
collection | PubMed |
description | The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can help tackle practically important issues. For example, interactions between features can be exploited effectively by bivariable splitting. With diversity forests, each split is selected from a candidate split set that is sampled in the following way: for $l = 1, \dots, \mathit{nsplits}$: (1) sample one split problem; (2) sample a single or few splits from the split problem sampled in (1) and add this or these splits to the candidate split set. The split problems are specifically structured collections of splits that depend on the respective split procedure considered. This sampling scheme makes innovative complex split procedures computationally tangible while avoiding overfitting. Important general properties of the diversity forest algorithm are evaluated empirically using univariable, binary splitting. Based on 220 data sets with binary outcomes, diversity forests are compared with conventional random forests and random forests using extremely randomized trees. It is seen that the split sampling scheme of diversity forests does not impair the predictive performance of random forests and that the performance is quite robust with regard to the specified nsplits value. The recently developed interaction forests are the first diversity forest method that uses a complex split procedure. Interaction forests allow modeling and detecting interactions between features effectively. Further potential complex split procedures are discussed as an outlook. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s42979-021-00920-1. |
format | Online Article Text |
id | pubmed-8533673 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer Singapore |
record_format | MEDLINE/PubMed |
spelling | pubmed-8533673 2021-10-25 Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests Hornung, Roman SN Comput Sci Original Research Springer Singapore 2021-10-21 2022 /pmc/articles/PMC8533673/ /pubmed/34723205 http://dx.doi.org/10.1007/s42979-021-00920-1 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Original Research Hornung, Roman Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests |
title | Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests |
title_full | Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests |
title_fullStr | Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests |
title_full_unstemmed | Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests |
title_short | Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests |
title_sort | diversity forests: using split sampling to enable innovative complex split procedures in random forests |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8533673/ https://www.ncbi.nlm.nih.gov/pubmed/34723205 http://dx.doi.org/10.1007/s42979-021-00920-1 |
work_keys_str_mv | AT hornungroman diversityforestsusingsplitsamplingtoenableinnovativecomplexsplitproceduresinrandomforests |
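
The candidate split sampling described in the description field can be illustrated with a minimal Python sketch for the univariable, binary case. This is not the author's implementation. It assumes that each split problem is the set of midpoint thresholds of a single feature, that exactly one split is drawn per sampled split problem, and that candidates are scored by weighted Gini impurity for a 0/1 outcome. The function names `sample_candidate_splits` and `best_split` are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def sample_candidate_splits(X, nsplits, rng):
    """Sample up to nsplits candidate splits as (feature, threshold) pairs.

    Assumption: a "split problem" is one feature, and its splits are the
    midpoints between adjacent distinct observed values of that feature.
    """
    n_features = len(X[0])
    candidates = []
    for _ in range(nsplits):
        j = rng.randrange(n_features)          # (1) sample one split problem
        values = sorted({row[j] for row in X})
        if len(values) < 2:
            continue                           # constant feature: no split
        i = rng.randrange(len(values) - 1)     # (2) sample a single split
        candidates.append((j, (values[i] + values[i + 1]) / 2.0))
    return candidates


def best_split(X, y, nsplits, rng):
    """Return the sampled candidate with the lowest weighted Gini impurity."""
    best, best_score = None, float("inf")
    n = len(y)
    for j, t in sample_candidate_splits(X, nsplits, rng):
        left = [y[k] for k in range(n) if X[k][j] <= t]
        right = [y[k] for k in range(n) if X[k][j] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best, best_score = (j, t), score
    return best, best_score


if __name__ == "__main__":
    X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
    y = [0, 0, 1, 1]
    print(best_split(X, y, nsplits=200, rng=random.Random(0)))
```

In this sketch, `nsplits` bounds the size of the candidate set, so the cost of choosing a split at a node depends on `nsplits` rather than on the total number of possible splits. That is the property the description credits with keeping complex split procedures computationally feasible.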