
Machine learning-based risk factor analysis and prevalence prediction of intestinal parasitic infections using epidemiological survey data

BACKGROUND: Previous epidemiological studies have examined the prevalence and risk factors for a variety of parasitic illnesses, including protozoan and soil-transmitted helminth (STH, e.g., hookworms and roundworms) infections. Despite advancements in machine learning for data analysis, the majorit...


Bibliographic Details
Main authors: Zafar, Aziz; Attia, Ziad; Tesfaye, Mehret; Walelign, Sosina; Wordofa, Moges; Abera, Dessie; Desta, Kassu; Tsegaye, Aster; Ay, Ahmet; Taye, Bineyam
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9236253/
https://www.ncbi.nlm.nih.gov/pubmed/35700192
http://dx.doi.org/10.1371/journal.pntd.0010517
author Zafar, Aziz
Attia, Ziad
Tesfaye, Mehret
Walelign, Sosina
Wordofa, Moges
Abera, Dessie
Desta, Kassu
Tsegaye, Aster
Ay, Ahmet
Taye, Bineyam
collection PubMed
description BACKGROUND: Previous epidemiological studies have examined the prevalence and risk factors for a variety of parasitic illnesses, including protozoan and soil-transmitted helminth (STH, e.g., hookworms and roundworms) infections. Despite advancements in machine learning for data analysis, the majority of these studies use traditional logistic regression to identify significant risk factors.
METHODS: In this study, we used data from a survey of 54 risk factors for intestinal parasitosis in 954 Ethiopian schoolchildren. We investigated whether machine learning approaches can supplement traditional logistic regression in identifying risk factors for intestinal parasite infection. We used feature selection methods such as InfoGain (IG), ReliefF (ReF), Joint Mutual Information (JMI), and Minimum Redundancy Maximum Relevance (MRMR). Additionally, we predicted children’s parasitic infection status using classifiers such as Logistic Regression (LR), Support Vector Machines (SVM), Random Forests (RF), and XGBoost (XGB), and compared their accuracy and area under the receiver operating characteristic curve (AUROC) scores. For optimal model training, we performed tenfold cross-validation and tuned the classifier hyperparameters. We balanced our dataset using the Synthetic Minority Oversampling Technique (SMOTE). Additionally, we used association rule learning to establish a link between risk factors and parasitic infections.
KEY FINDINGS: Our study demonstrated that machine learning could be used in conjunction with logistic regression. Using machine learning, we developed models that accurately predicted four infection outcomes: any parasitic infection at 79.9% accuracy, helminth infection at 84.9%, any STH infection at 95.9%, and protozoan infection at 94.2%. The Random Forests (RF) and Support Vector Machines (SVM) classifiers achieved the highest accuracy either when the top 20 risk factors selected by Joint Mutual Information (JMI) were used or when all features were used. The best predictors of infection were socioeconomic, demographic, and hematological characteristics.
CONCLUSIONS: We demonstrated that feature selection and association rule learning are useful strategies for detecting risk factors for parasite infection. Additionally, we showed that advanced classifiers might be utilized to predict children’s parasitic infection status. When combined with standard logistic regression models, machine learning techniques can identify novel risk factors and predict infection risk.
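The abstract names information gain (InfoGain) as one feature selection method and association rule learning as a way to link risk factors to infection. The following is a minimal sketch of both ideas, not the authors' implementation: it uses only the Python standard library, and the records, the feature names `no_handwash` and `wears_shoes`, and the helper functions are hypothetical and made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def rule_metrics(antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Both arguments are equal-length lists of 0/1 indicators.
    """
    total = len(antecedent)
    both = sum(1 for a, c in zip(antecedent, consequent) if a and c)
    support = both / total
    confidence = both / sum(antecedent)
    lift = confidence / (sum(consequent) / total)
    return support, confidence, lift


# Made-up toy records (NOT data from the study): one entry per child.
infected = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "no_handwash": [1, 1, 1, 0, 0, 0, 0, 1],  # hypothetical risk factor
    "wears_shoes": [1, 0, 1, 0, 1, 0, 1, 0],  # hypothetical risk factor
}

# Rank the features by information gain, highest first.
ranking = sorted(features,
                 key=lambda name: information_gain(features[name], infected),
                 reverse=True)

# Evaluate the candidate rule "no_handwash -> infected".
support, confidence, lift = rule_metrics(features["no_handwash"], infected)

if __name__ == "__main__":
    for name in ranking:
        print(name, round(information_gain(features[name], infected), 4))
    print("support", support, "confidence", confidence, "lift", lift)
```

On this toy data the first feature carries some information about the label while the second carries none, and a lift above 1 indicates that the rule's antecedent co-occurs with infection more often than chance would suggest. A real analysis would, as the abstract describes, apply such rankings to the surveyed risk factors and validate classifiers with cross-validation.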
format Online
Article
Text
id pubmed-9236253
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9236253 2022-06-28 PLoS Negl Trop Dis Research Article Public Library of Science 2022-06-14 /pmc/articles/PMC9236253/ /pubmed/35700192 http://dx.doi.org/10.1371/journal.pntd.0010517 Text en © 2022 Zafar et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Machine learning-based risk factor analysis and prevalence prediction of intestinal parasitic infections using epidemiological survey data
topic Research Article