
An AUC-based permutation variable importance measure for random forests

BACKGROUND: The random forest (RF) method is a commonly used tool for classification with high-dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal...


Bibliographic Details
Main authors:	Janitza, Silke; Strobl, Carolin; Boulesteix, Anne-Laure
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3626572/ https://www.ncbi.nlm.nih.gov/pubmed/23560875 http://dx.doi.org/10.1186/1471-2105-14-119

_version_	1782266205067804672
author	Janitza, Silke; Strobl, Carolin; Boulesteix, Anne-Laure
author_facet	Janitza, Silke; Strobl, Carolin; Boulesteix, Anne-Laure
author_sort	Janitza, Silke
collection	PubMed
description	BACKGROUND: The random forest (RF) method is a commonly used tool for classification with high-dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in the case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions were made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge, the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. RESULTS: We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings, while both permutation VIMs have equal performance for balanced data settings. CONCLUSIONS: The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response as class imbalance increases. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The code implementing our study is available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html.
format	Online Article Text
id	pubmed-3626572
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3626572 2013-04-23 An AUC-based permutation variable importance measure for random forests Janitza, Silke; Strobl, Carolin; Boulesteix, Anne-Laure BMC Bioinformatics Methodology Article BioMed Central 2013-04-05 /pmc/articles/PMC3626572/ /pubmed/23560875 http://dx.doi.org/10.1186/1471-2105-14-119 Text en Copyright © 2013 Janitza et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Janitza, Silke Strobl, Carolin Boulesteix, Anne-Laure An AUC-based permutation variable importance measure for random forests
title	An AUC-based permutation variable importance measure for random forests
title_sort	auc-based permutation variable importance measure for random forests
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3626572/ https://www.ncbi.nlm.nih.gov/pubmed/23560875 http://dx.doi.org/10.1186/1471-2105-14-119
work_keys_str_mv	AT janitzasilke anaucbasedpermutationvariableimportancemeasureforrandomforests AT stroblcarolin anaucbasedpermutationvariableimportancemeasureforrandomforests AT boulesteixannelaure anaucbasedpermutationvariableimportancemeasureforrandomforests AT janitzasilke aucbasedpermutationvariableimportancemeasureforrandomforests AT stroblcarolin aucbasedpermutationvariableimportancemeasureforrandomforests AT boulesteixannelaure aucbasedpermutationvariableimportancemeasureforrandomforests
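
The abstract describes the AUC-based permutation VIM only in words, so the following is a minimal Python sketch of the general idea, not the implementation in the R package party. It assumes that each tree exposes a scoring function and a set of out-of-bag (OOB) row indices, that the importance of a predictor is the average drop in OOB AUC after permuting that predictor, and that trees whose OOB sample contains only one class are skipped. The names auc, auc_permutation_importance, score_fn and the toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; ties count as 1/2.

    Returns None when one of the two classes is absent.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_permutation_importance(trees, X, y, j, rng):
    """Average drop in OOB AUC after permuting predictor j.

    trees: list of (score_fn, oob_indices) pairs (an assumed interface);
    score_fn maps one feature row (a list) to a score for class 1.
    """
    drops = []
    for score_fn, oob in trees:
        rows = [X[i] for i in oob]
        labels = [y[i] for i in oob]
        base = auc([score_fn(r) for r in rows], labels)
        if base is None:
            continue  # OOB sample holds a single class: AUC undefined
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between predictor j and y
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(base - auc([score_fn(r) for r in permuted], labels))
    return sum(drops) / len(drops) if drops else 0.0


# Toy unbalanced data: 3 positives, 17 negatives.
# Column 0 separates the classes; column 1 is pure noise.
rng = random.Random(0)
y = [1] * 3 + [0] * 17
X = [[label + 0.5 * rng.random(), rng.random()] for label in y]

# Stand-in "trees": each scores by column 0 and has its own OOB subset.
trees = [(lambda r: r[0], rng.sample(range(len(y)), 8)) for _ in range(25)]

imp_signal = auc_permutation_importance(trees, X, y, 0, rng)
imp_noise = auc_permutation_importance(trees, X, y, 1, rng)
print("importance of informative predictor:", imp_signal)
print("importance of noise predictor:", imp_noise)
```

Because the AUC compares every positive with every negative, its value does not depend on the class proportions, which is the intuition behind the robustness to class imbalance that the abstract reports.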
