Contrast trees and distribution boosting

Bibliographic details
Main author: Friedman, Jerome H.
Format: Online Article Text
Language: English
Published: National Academy of Sciences, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7474603/
https://www.ncbi.nlm.nih.gov/pubmed/32817416
http://dx.doi.org/10.1073/pnas.1921562117
Description
Summary: A method for decision tree induction is presented. Given a set of predictor variables x and two outcome variables y and z associated with each x, the goal is to identify those values of x for which the respective distributions of y and z, or selected properties of those distributions such as means or quantiles, are most different. Contrast trees provide a lack-of-fit measure for statistical models of such statistics, or for the complete conditional distribution p(y | x), as a function of x. They are easily interpreted and can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method. A corresponding contrast-boosting strategy is described for remedying any uncovered errors, thereby producing potentially more accurate predictions. This leads to a distribution-boosting strategy for directly estimating the full conditional distribution of y at each x under no assumptions concerning its shape, form, or parametric representation.
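To make the core idea concrete, the following is a minimal illustrative sketch, not Friedman's implementation: it searches a single split on one predictor x, choosing the threshold that maximizes the discrepancy between the distributions of the two outcomes y and z within a child node. The discrepancy here is the absolute difference of means (one of the simple node statistics the abstract mentions); the function names and the synthetic data are assumptions for illustration only.

```python
import random

def node_discrepancy(y, z):
    """Discrepancy between outcomes y and z in a node: |mean(y) - mean(z)|."""
    if not y:
        return 0.0
    return abs(sum(y) / len(y) - sum(z) / len(z))

def best_contrast_split(x, y, z, min_leaf=5):
    """Threshold on x whose larger child discrepancy is maximal.

    Greedy one-level analogue of a contrast-tree split: try every
    midpoint between consecutive sorted x values (respecting a minimum
    leaf size) and keep the split exposing the biggest y-vs-z contrast.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_t, best_d = None, -1.0
    for k in range(min_leaf, len(x) - min_leaf):
        t = (x[order[k - 1]] + x[order[k]]) / 2
        left, right = order[:k], order[k:]
        d = max(node_discrepancy([y[i] for i in left], [z[i] for i in left]),
                node_discrepancy([y[i] for i in right], [z[i] for i in right]))
        if d > best_d:
            best_t, best_d = t, d
    return best_t, best_d

# Synthetic check: y and z agree for x < 0 but z is shifted by 2 for
# x >= 0, so the best split should land near 0 and expose a
# discrepancy near 2 in the right child.
random.seed(0)
x = [random.uniform(-1, 1) for _ in range(200)]
y = [xi + random.gauss(0, 0.1) for xi in x]
z = [yi + (2.0 if xi >= 0 else 0.0) for xi, yi in zip(x, y)]
t, d = best_contrast_split(x, y, z)
print(t, d)
```

A full contrast tree would apply this split search recursively within each child and support other node statistics (quantiles, or a distance between the whole conditional distributions); this sketch only shows the contrast criterion that drives the splitting.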