Cargando…

GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles

SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomia...

Descripción completa

Detalles Bibliográficos
Autores principales:	Mitchell, Rory, Frank, Eibe, Holmes, Geoffrey
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	PeerJ Inc. 2022
Materias:	Algorithms and Analysis of Algorithms
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044362/ https://www.ncbi.nlm.nih.gov/pubmed/35494875 http://dx.doi.org/10.7717/peerj-cs.880

_version_	1784695089346379776
author	Mitchell, Rory Frank, Eibe Holmes, Geoffrey
author_facet	Mitchell, Rory Frank, Eibe Holmes, Geoffrey
author_sort	Mitchell, Rory
collection	PubMed
description	SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second—equivalent CPU-based performance is estimated to require 6850 CPU cores.
format	Online Article Text
id	pubmed-9044362
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-90443622022-04-28 GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles Mitchell, Rory Frank, Eibe Holmes, Geoffrey PeerJ Comput Sci Algorithms and Analysis of Algorithms SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second—equivalent CPU-based performance is estimated to require 6850 CPU cores. PeerJ Inc. 2022-04-05 /pmc/articles/PMC9044362/ /pubmed/35494875 http://dx.doi.org/10.7717/peerj-cs.880 Text en © 2022 Mitchell et al. https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle	Algorithms and Analysis of Algorithms Mitchell, Rory Frank, Eibe Holmes, Geoffrey GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
title	GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
title_full	GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
title_fullStr	GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
title_full_unstemmed	GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
title_short	GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles
title_sort	gputreeshap: massively parallel exact calculation of shap scores for tree ensembles
topic	Algorithms and Analysis of Algorithms
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044362/ https://www.ncbi.nlm.nih.gov/pubmed/35494875 http://dx.doi.org/10.7717/peerj-cs.880
work_keys_str_mv	AT mitchellrory gputreeshapmassivelyparallelexactcalculationofshapscoresfortreeensembles AT frankeibe gputreeshapmassivelyparallelexactcalculationofshapscoresfortreeensembles AT holmesgeoffrey gputreeshapmassivelyparallelexactcalculationofshapscoresfortreeensembles

GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles

Ejemplares similares