Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations
Machine learning methods are widely used within the medical field. However, the reliability and efficacy of these models are difficult to assess, making it difficult for researchers to identify which machine-learning model to apply to their dataset. We assessed whether variance calculations of model...
Main Authors: | Huang, Alexander A., Huang, Samuel Y. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9949629/ https://www.ncbi.nlm.nih.gov/pubmed/36821544 http://dx.doi.org/10.1371/journal.pone.0281922 |
_version_ | 1784892985536675840 |
---|---|
author | Huang, Alexander A. Huang, Samuel Y. |
author_facet | Huang, Alexander A. Huang, Samuel Y. |
author_sort | Huang, Alexander A. |
collection | PubMed |
description | Machine learning methods are widely used within the medical field. However, the reliability and efficacy of these models are difficult to assess, making it difficult for researchers to identify which machine-learning model to apply to their dataset. We assessed whether variance calculations of model metrics (e.g., AUROC, sensitivity, specificity) through bootstrap simulation and SHapley Additive exPlanations (SHAP) could increase model transparency and improve model selection. Data from the England National Health Services Heart Disease Prediction Cohort was used. After comparison of model metrics for XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boosting, XGBoost was chosen as the machine-learning model for this study. Bootstrap simulation (N = 10,000) was used to empirically derive the distribution of model metrics and covariate Gain statistics. SHAP was used to provide explanations of the machine-learning output, and simulation was used to evaluate the variance of model accuracy metrics. Across the 10,000 completed simulations for the XGBoost model, the AUROC ranged from 0.771 to 0.947 (a difference of 0.176), the balanced accuracy ranged from 0.688 to 0.894 (a difference of 0.205), the sensitivity ranged from 0.632 to 0.939 (a difference of 0.307), and the specificity ranged from 0.595 to 0.944 (a difference of 0.394). Across the same 10,000 simulations, the Gain for Angina ranged from 0.225 to 0.456 (a difference of 0.231), for Cholesterol from 0.148 to 0.326 (a difference of 0.178), for maximum heart rate (MaxHR) from 0.081 to 0.200 (a difference of 0.119), and for Age from 0.059 to 0.157 (a difference of 0.098). Simulations that empirically evaluate the variability of model metrics, together with explanatory algorithms that check whether influential covariates match the literature, are necessary for increased transparency, reliability, and utility of machine-learning methods. These variance statistics, combined with model accuracy statistics, can help researchers identify the best model for a given dataset. (A minimal code sketch of this bootstrap-and-SHAP workflow appears after the full record below.) |
format | Online Article Text |
id | pubmed-9949629 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9949629 2023-02-24 Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations Huang, Alexander A. Huang, Samuel Y. PLoS One Research Article Machine learning methods are widely used within the medical field. However, the reliability and efficacy of these models are difficult to assess, making it difficult for researchers to identify which machine-learning model to apply to their dataset. We assessed whether variance calculations of model metrics (e.g., AUROC, sensitivity, specificity) through bootstrap simulation and SHapley Additive exPlanations (SHAP) could increase model transparency and improve model selection. Data from the England National Health Services Heart Disease Prediction Cohort was used. After comparison of model metrics for XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boosting, XGBoost was chosen as the machine-learning model for this study. Bootstrap simulation (N = 10,000) was used to empirically derive the distribution of model metrics and covariate Gain statistics. SHAP was used to provide explanations of the machine-learning output, and simulation was used to evaluate the variance of model accuracy metrics. Across the 10,000 completed simulations for the XGBoost model, the AUROC ranged from 0.771 to 0.947 (a difference of 0.176), the balanced accuracy ranged from 0.688 to 0.894 (a difference of 0.205), the sensitivity ranged from 0.632 to 0.939 (a difference of 0.307), and the specificity ranged from 0.595 to 0.944 (a difference of 0.394). Across the same 10,000 simulations, the Gain for Angina ranged from 0.225 to 0.456 (a difference of 0.231), for Cholesterol from 0.148 to 0.326 (a difference of 0.178), for maximum heart rate (MaxHR) from 0.081 to 0.200 (a difference of 0.119), and for Age from 0.059 to 0.157 (a difference of 0.098). Simulations that empirically evaluate the variability of model metrics, together with explanatory algorithms that check whether influential covariates match the literature, are necessary for increased transparency, reliability, and utility of machine-learning methods. These variance statistics, combined with model accuracy statistics, can help researchers identify the best model for a given dataset. Public Library of Science 2023-02-23 /pmc/articles/PMC9949629/ /pubmed/36821544 http://dx.doi.org/10.1371/journal.pone.0281922 Text en © 2023 Huang, Huang https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Huang, Alexander A. Huang, Samuel Y. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations |
title | Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations |
title_full | Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations |
title_fullStr | Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations |
title_full_unstemmed | Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations |
title_short | Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations |
title_sort | increasing transparency in machine learning through bootstrap simulation and shapely additive explanations |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9949629/ https://www.ncbi.nlm.nih.gov/pubmed/36821544 http://dx.doi.org/10.1371/journal.pone.0281922 |
work_keys_str_mv | AT huangalexandera increasingtransparencyinmachinelearningthroughbootstrapsimulationandshapelyadditiveexplanations AT huangsamuely increasingtransparencyinmachinelearningthroughbootstrapsimulationandshapelyadditiveexplanations |
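
The abstract above describes refitting XGBoost on repeated bootstrap resamples to obtain empirical distributions of AUROC, balanced accuracy, and covariate Gain, with SHAP used to explain the fitted model. The sketch below illustrates that general workflow under stated assumptions; it is not the authors' implementation, and the column names (e.g., a binary HeartDisease outcome), train/test split, and XGBoost hyperparameters are hypothetical.

```python
# Minimal sketch of the bootstrap-simulation and SHAP workflow described in the
# abstract; not the authors' code. Requires numpy, pandas, scikit-learn,
# xgboost, and shap. Categorical covariates (e.g., Angina) are assumed to be
# numerically encoded already.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap
from sklearn.metrics import roc_auc_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split


def bootstrap_model_metrics(df: pd.DataFrame, outcome: str = "HeartDisease",
                            n_sims: int = 10_000, seed: int = 0):
    """Refit XGBoost on bootstrap resamples and record metric/Gain distributions."""
    rng = np.random.default_rng(seed)
    X, y = df.drop(columns=[outcome]), df[outcome]
    aurocs, bal_accs, gain_records = [], [], []
    for _ in range(n_sims):
        # Draw a bootstrap replicate (rows sampled with replacement),
        # then hold out part of the replicate for evaluation.
        idx = rng.integers(0, len(df), size=len(df))
        X_tr, X_te, y_tr, y_te = train_test_split(
            X.iloc[idx], y.iloc[idx], test_size=0.3, stratify=y.iloc[idx])
        model = xgb.XGBClassifier(n_estimators=200, max_depth=3,
                                  eval_metric="logloss")
        model.fit(X_tr, y_tr)
        prob = model.predict_proba(X_te)[:, 1]
        aurocs.append(roc_auc_score(y_te, prob))
        bal_accs.append(balanced_accuracy_score(y_te, (prob >= 0.5).astype(int)))
        # Per-covariate Gain for this replicate, normalized to proportions so it
        # is comparable to the Gain ranges reported in the abstract. Covariates
        # never used in a split are absent from the dict (NaN in the DataFrame).
        raw_gain = model.get_booster().get_score(importance_type="gain")
        total = sum(raw_gain.values()) or 1.0
        gain_records.append({k: v / total for k, v in raw_gain.items()})
    return np.array(aurocs), np.array(bal_accs), pd.DataFrame(gain_records)


def explain_with_shap(model: xgb.XGBClassifier, X: pd.DataFrame):
    """Per-prediction SHAP values for one fitted tree model (not per bootstrap)."""
    explainer = shap.TreeExplainer(model)
    return explainer.shap_values(X)
```

Quantiles of the returned arrays (for example, their minima and maxima or the 2.5th and 97.5th percentiles) give empirical ranges analogous to the AUROC, balanced-accuracy, and Gain ranges reported in the abstract.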