
Interpretable machine learning with tree-based shapley additive explanations: Application to metabolomics datasets for binary classification

Machine learning (ML) models are used in clinical metabolomics studies most notably for biomarker discoveries, to identify metabolites that discriminate between a case and control group. To improve understanding of the underlying biomedical problem and to bolster confidence in these discoveries, model interpretability is germane. In metabolomics, partial least squares discriminant analysis (PLS-DA) and its variants are widely used, partly due to the model's interpretability with the Variable Influence in Projection (VIP) scores, a global interpretability method. Herein, Tree-based Shapley Additive explanations (SHAP), an interpretable ML method grounded in game theory, was used to explain ML models with local explanation properties. In this study, ML experiments (binary classification) were conducted for three published metabolomics datasets using PLS-DA, random forests, gradient boosting, and extreme gradient boosting (XGBoost). Using one of the datasets, a PLS-DA model was explained using VIP scores, while one of the best-performing models, a random forest model, was interpreted using Tree SHAP. The results show that SHAP offers greater explanation depth than PLS-DA's VIP scores, making it a powerful method for rationalizing machine learning predictions from metabolomics studies.
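
The workflow described in the abstract (fitting a tree-ensemble classifier to a metabolite feature table and explaining it with Tree SHAP) can be illustrated with a short Python sketch using scikit-learn and the shap package. This is a minimal sketch under stated assumptions, not the paper's actual code: the file name metabolomics_features.csv, the "group" label column, and the hyperparameters are hypothetical.

```python
# Minimal, illustrative sketch (not the study's exact pipeline): train a
# random forest binary classifier on a metabolite feature table and explain
# it with Tree SHAP. File name, column names, and hyperparameters are
# assumptions for illustration only.
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature table: one row per sample, metabolite intensities as
# columns, and a binary "group" column (1 = case, 0 = control).
df = pd.read_csv("metabolomics_features.csv")
X = df.drop(columns=["group"])
y = df["group"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Random forest was among the best-performing models in the study.
model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

# Tree SHAP computes exact Shapley values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# For binary classifiers the return shape depends on the shap version:
# older versions return a list with one array per class, newer versions a
# (samples, features, classes) array. Keep the values for the case class.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif shap_values.ndim == 3:
    shap_values = shap_values[..., 1]

# Global summary: which metabolites drive predictions across the test set.
shap.summary_plot(shap_values, X_test)

# Local explanation: per-metabolite contributions for a single test sample.
contribs = pd.Series(shap_values[0], index=X_test.columns)
print(contribs.sort_values(key=abs, ascending=False).head(10))
```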


Bibliographic Details
Main Author: Bifarin, Olatomiwa O.
Format: Online Article Text
Language: English
Published: Public Library of Science, 2023
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10159207/
https://www.ncbi.nlm.nih.gov/pubmed/37141218
http://dx.doi.org/10.1371/journal.pone.0284315
Collection: PubMed
Record ID: pubmed-10159207
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Publication Date: 2023-05-04
Rights: © 2023 Olatomiwa O. Bifarin. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.