
Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP

In landscape epidemiology, machine learning (ML) offers a good alternative for modeling epidemiological risk scenarios. This study aims to break with the "black box" paradigm that underlies the application of machine learning techniques by using SHAP...


Bibliographic Details
Main authors: Scavuzzo, Carlos Matias, Scavuzzo, Juan Manuel, Campero, Micaela Natalia, Anegagrie, Melaku, Aramendia, Aranzazu Amor, Benito, Agustín, Periago, Victoria
Format: Online Article Text
Language: English
Published: KeAi Publishing 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8844643/
https://www.ncbi.nlm.nih.gov/pubmed/35224316
http://dx.doi.org/10.1016/j.idm.2022.01.004
_version_ 1784651515867168768
author Scavuzzo, Carlos Matias
Scavuzzo, Juan Manuel
Campero, Micaela Natalia
Anegagrie, Melaku
Aramendia, Aranzazu Amor
Benito, Agustín
Periago, Victoria
author_sort Scavuzzo, Carlos Matias
collection PubMed
description In landscape epidemiology, machine learning (ML) offers a good alternative for modeling epidemiological risk scenarios. This study aims to break with the "black box" paradigm that underlies the application of machine learning techniques by using SHAP to determine the contribution of each variable in ML models applied to geospatial health. The case study is the prevalence of hookworms, a group of intestinal parasites, in Ethiopia, where they are widely distributed; the country bears the third-highest burden of hookworm in Sub-Saharan Africa. XGBoost, a widely used ML library, was used to fit and analyze the data. The Python SHAP library was used to understand the importance of each variable for the trained model's predictions. The contribution of these variables to a particular prediction was described using several types of plots. The results show that the ML models are superior to the classical statistical models: they not only achieve similar results but also explain, through the SHAP package, the influence of the variables and the interactions between them in the generated models. This analysis provides information to help understand the epidemiological problem presented and provides a tool for similar studies.
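The per-variable contributions mentioned in the description are Shapley values, which the SHAP library estimates efficiently for tree models such as XGBoost. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values by brute force for one prediction. The toy model, its coefficients, the feature names (temperature, rainfall, elevation), and the baseline are all hypothetical and are not taken from the study; the function does not use or reproduce the SHAP or XGBoost APIs.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition are set to their baseline value,
    which is one common (assumed) way to "remove" a feature.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    temperature, rainfall, elevation = features
    return (0.02 * temperature + 0.01 * rainfall - 0.005 * elevation
            + 0.001 * temperature * rainfall)


baseline = [20.0, 50.0, 10.0]  # hypothetical reference location
sample = [25.0, 80.0, 5.0]     # hypothetical location to explain

phi = shapley_values(toy_model, sample, baseline)
for name, value in zip(["temperature", "rainfall", "elevation"], phi):
    print(f"{name}: {value:+.3f}")
```

The contributions add up to the difference between the prediction for the sample and the prediction for the baseline, and the temperature-rainfall interaction is split between the two features involved. Brute-force enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.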
format Online
Article
Text
id pubmed-8844643
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher KeAi Publishing
record_format MEDLINE/PubMed
spelling pubmed-8844643 2022-02-25 Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP Scavuzzo, Carlos Matias Scavuzzo, Juan Manuel Campero, Micaela Natalia Anegagrie, Melaku Aramendia, Aranzazu Amor Benito, Agustín Periago, Victoria Infect Dis Model Original Research Article KeAi Publishing 2022-02-03 /pmc/articles/PMC8844643/ /pubmed/35224316 http://dx.doi.org/10.1016/j.idm.2022.01.004 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP
topic Original Research Article