
Random generalized linear model: a highly accurate and interpretable ensemble predictor

BACKGROUND: Ensemble predictors such as the random forest are known to have superior accuracy but their black-box predictions are difficult to interpret. In contrast, a generalized linear model (GLM) is very interpretable especially when forward feature selection is used to construct the model. Howe...


Bibliographic Details
Main authors:	Song, Lin; Langfelder, Peter; Horvath, Steve
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3645958/ https://www.ncbi.nlm.nih.gov/pubmed/23323760 http://dx.doi.org/10.1186/1471-2105-14-5

author	Song, Lin; Langfelder, Peter; Horvath, Steve
author_facet	Song, Lin; Langfelder, Peter; Horvath, Steve
author_sort	Song, Lin
collection	PubMed
description	BACKGROUND: Ensemble predictors such as the random forest are known to have superior accuracy, but their black-box predictions are difficult to interpret. In contrast, a generalized linear model (GLM) is very interpretable, especially when forward feature selection is used to construct the model. However, forward feature selection tends to overfit the data and leads to low predictive accuracy. Therefore, it remains an important research goal to combine the advantages of ensemble predictors (high accuracy) with the advantages of forward regression modeling (interpretability). To address this goal, several articles have explored GLM-based ensemble predictors. Since limited evaluations suggested that these ensemble predictors were less accurate than alternative predictors, they have received little attention in the literature. RESULTS: Comprehensive evaluations involving hundreds of genomic data sets, the UCI machine learning benchmark data, and simulations are used to give GLM-based ensemble predictors a new and careful look. A novel bootstrap aggregated (bagged) GLM predictor that incorporates several elements of randomness and instability (random subspace method, optional interaction terms, forward variable selection) often outperforms a host of alternative prediction methods, including random forests and penalized regression models (ridge regression, elastic net, lasso). This random generalized linear model (RGLM) predictor provides variable importance measures that can be used to define a “thinned” ensemble predictor (involving few features) that retains excellent predictive accuracy. CONCLUSION: RGLM is a state-of-the-art predictor that shares the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward selected generalized linear model (interpretability). These methods are implemented in the freely available R software package randomGLM.
format	Online Article Text
id	pubmed-3645958
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3645958 2013-05-09 Random generalized linear model: a highly accurate and interpretable ensemble predictor. Song, Lin; Langfelder, Peter; Horvath, Steve. BMC Bioinformatics. Methodology Article. BioMed Central 2013-01-16 /pmc/articles/PMC3645958/ /pubmed/23323760 http://dx.doi.org/10.1186/1471-2105-14-5 Text en Copyright © 2013 Song et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Random generalized linear model: a highly accurate and interpretable ensemble predictor
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3645958/ https://www.ncbi.nlm.nih.gov/pubmed/23323760 http://dx.doi.org/10.1186/1471-2105-14-5
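
The abstract above describes the predictor as a bagged GLM that combines bootstrap sampling, the random subspace method, and forward variable selection. The following Python sketch illustrates that general idea only; it is not the randomGLM package and does not reproduce its API. Everything here is an assumption made for illustration: the function names are hypothetical, only a Gaussian (ordinary least squares) model is fitted, AIC is used as the stopping rule for forward selection, interaction terms are omitted, and variable importance is taken to be the number of bags in which a feature was selected.

```python
import numpy as np


def design(X, cols):
    """Design matrix with an intercept column followed by the chosen columns."""
    return np.column_stack([np.ones(X.shape[0]), X[:, np.asarray(cols, dtype=int)]])


def fit_ols(X, y, cols):
    """Least-squares coefficients (intercept first) for the chosen columns."""
    coef, *_ = np.linalg.lstsq(design(X, cols), y, rcond=None)
    return coef


def aic(X, y, cols):
    """Gaussian AIC, up to an additive constant (assumed selection criterion)."""
    n = len(y)
    resid = y - design(X, cols) @ fit_ols(X, y, cols)
    rss = max(float(resid @ resid), 1e-12)
    return n * np.log(rss / n) + 2 * (len(cols) + 1)


def forward_select(X, y, candidates, max_terms):
    """Greedy forward selection: add the best candidate while AIC improves."""
    chosen, best = [], aic(X, y, [])
    remaining = list(candidates)
    while remaining and len(chosen) < max_terms:
        scores = [aic(X, y, chosen + [j]) for j in remaining]
        k = int(np.argmin(scores))
        if scores[k] >= best:
            break
        best = scores[k]
        chosen.append(remaining.pop(k))
    return chosen


def fit_bagged_glm(X, y, n_bags=50, n_subspace=5, max_terms=5, seed=0):
    """Fit one forward-selected linear model per bootstrap sample and feature subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    models, importance = [], np.zeros(p, dtype=int)
    for _ in range(n_bags):
        rows = rng.integers(0, n, size=n)                        # bootstrap sample
        feats = rng.choice(p, size=n_subspace, replace=False)    # random subspace
        Xb, yb = X[rows], y[rows]
        cols = forward_select(Xb, yb, feats.tolist(), max_terms)
        models.append((cols, fit_ols(Xb, yb, cols)))
        for j in cols:                                           # selection counts
            importance[j] += 1
    return models, importance


def predict(models, X):
    """Average the predictions of all bagged models."""
    preds = [design(X, cols) @ coef for cols, coef in models]
    return np.mean(preds, axis=0)


# Synthetic demonstration: only features 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)

models, importance = fit_bagged_glm(X, y)
pred = predict(models, X)
print("selection counts per feature:", importance)
```

In this sketch the selection counts play the role of a variable importance measure. Refitting with only the most frequently selected features would be one way to approximate the "thinned" predictor mentioned in the abstract.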
