Model independent feature attributions: Shapley values that uncover non-linear dependencies

Shapley values have become increasingly popular in the machine learning literature, thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of ‘fairness’. The flexibility arises from the myriad potential forms of the Shapley value game formulation. Amongs...

Bibliographic Details
Main Authors: Fryer, Daniel Vidali; Strumke, Inga; Nguyen, Hien
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8189022/
https://www.ncbi.nlm.nih.gov/pubmed/34151001
http://dx.doi.org/10.7717/peerj-cs.582
author Fryer, Daniel Vidali
Strumke, Inga
Nguyen, Hien
collection PubMed
description Shapley values have become increasingly popular in the machine learning literature, thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of ‘fairness’. The flexibility arises from the myriad potential forms of the Shapley value game formulation. Amongst the consequences of this flexibility is that there are now many types of Shapley values being discussed, with such variety being a source of potential misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category, which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a measure of non-linear dependence as the characteristic function. The strength of these Shapley values is in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlations, affine-invariant distance correlation, and Hilbert–Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a medical survey data set.
format Online
Article
Text
id pubmed-8189022
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8189022; 2021-06-17; Model independent feature attributions: Shapley values that uncover non-linear dependencies; Fryer, Daniel Vidali; Strumke, Inga; Nguyen, Hien; PeerJ Comput Sci; Artificial Intelligence; PeerJ Inc.; 2021-06-02; /pmc/articles/PMC8189022/; /pubmed/34151001; http://dx.doi.org/10.7717/peerj-cs.582; Text; en; ©2021 Fryer et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Model independent feature attributions: Shapley values that uncover non-linear dependencies
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8189022/
https://www.ncbi.nlm.nih.gov/pubmed/34151001
http://dx.doi.org/10.7717/peerj-cs.582