Random forest of perfect trees: concept, performance, applications and perspectives

Bibliographic Details
Main Authors: Nguyen, Jean-Michel, Jézéquel, Pascal, Gillois, Pierre, Silva, Luisa, Ben Azzouz, Faouda, Lambert-Lacroix, Sophie, Juin, Philippe, Campone, Mario, Gaultier, Aurélie, Moreau-Gaudry, Alexandre, Antonioli, Daniel
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8352507/
https://www.ncbi.nlm.nih.gov/pubmed/33523112
http://dx.doi.org/10.1093/bioinformatics/btab074
Description
Summary: MOTIVATION: The principle of Breiman's random forest (RF) is to build and assemble complementary classification trees in a way that maximizes their variability. We propose a new type of random forest that disobeys Breiman's principles and involves building, in very large quantities, trees with no classification errors. We used a new type of decision tree that places a neuron at each node, together with an innovative half Christmas tree structure. With these new RFs, we developed a score, based on a family of ten new statistical information criteria called the Nguyen information criteria (NICs), to evaluate the predictive qualities of features in three dimensions.

RESULTS: The first NIC allowed the Akaike information criterion to be minimized more quickly than rankings obtained with the Gini index when the features were introduced into a logistic regression model. The features selected with the NIC score showed a slight advantage over the support vector machine-recursive feature elimination (SVM-RFE) method. We demonstrate that including artificial neurons in tree nodes allows a large number of classifiers at the same node to be taken into account simultaneously, which results in perfect trees without classification errors.

AVAILABILITY AND IMPLEMENTATION: The methods used to build the perfect trees in this article were implemented in the 'ROP' R package, archived at https://cran.r-project.org/web/packages/ROP/index.html.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
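
As a rough, hypothetical illustration of the "neuron at each node" idea described in the summary (plain R, not the ROP package API; names such as node_neuron are made up for the sketch), the snippet below fits a small logistic "neuron" over several candidate features at a single node, so that many single-feature classifiers are weighed jointly in one split rather than one at a time:

set.seed(1)
n <- 200
## Toy data: three candidate features and a binary class label.
x <- data.frame(f1 = rnorm(n), f2 = rnorm(n), f3 = rnorm(n))
y <- as.integer(x$f1 + 0.5 * x$f2 - x$f3 + rnorm(n, sd = 0.3) > 0)
dat <- cbind(x, y = y)

## One node: a logistic "neuron" combining all candidate features at once,
## instead of a single-feature threshold split.
node_neuron <- glm(y ~ f1 + f2 + f3, data = dat, family = binomial)
p <- predict(node_neuron, type = "response")

## Partition implied by this node and its training error; a "perfect tree"
## would recurse on each partition until the error reaches zero.
split_left <- p < 0.5
node_error <- mean((p > 0.5) != y)
node_error

In the authors' method, many such error-free trees are assembled into a forest and scored with the NICs; the sketch above only illustrates the single-node neuron idea under these toy assumptions.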