Predicting HIV infection in the decade (2005–2015) pre-COVID-19 in Zimbabwe: A supervised classification-based machine learning approach
The burden of HIV and related diseases has been an area of great concern before and after the emergence of COVID-19 in Zimbabwe. Machine learning models have been used to accurately predict the risk of diseases, including HIV. Therefore, this paper aimed to determine common risk factors of HIV positivity...
Main authors: | Birri Makota, Rutendo Beauty; Musenge, Eustasius |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10246851/ https://www.ncbi.nlm.nih.gov/pubmed/37285368 http://dx.doi.org/10.1371/journal.pdig.0000260 |
_version_ | 1785055117069778944 |
---|---|
author | Birri Makota, Rutendo Beauty; Musenge, Eustasius |
author_facet | Birri Makota, Rutendo Beauty; Musenge, Eustasius |
author_sort | Birri Makota, Rutendo Beauty |
collection | PubMed |
description | The burden of HIV and related diseases has been an area of great concern before and after the emergence of COVID-19 in Zimbabwe. Machine learning models have been used to accurately predict the risk of diseases, including HIV. Therefore, this paper aimed to determine common risk factors of HIV positivity in Zimbabwe in the decade from 2005 to 2015. The data were from three two-stage, five-yearly population surveys conducted between 2005 and 2015. The outcome variable was HIV status. The prediction model was fitted using 80% of the data for learning/training and 20% for testing/prediction. Resampling was done using repeated stratified 5-fold cross-validation. Feature selection was done using Lasso regression, and the best combination of selected features was determined using Sequential Forward Floating Selection. We compared six algorithms for both sexes based on the F1 score, which is the harmonic mean of precision and recall. The overall HIV prevalence for the combined dataset was 22.5% and 15.3% for females and males, respectively. The best-performing algorithm for identifying individuals with a higher likelihood of HIV infection was XGBoost, with an F1 score of 91.4% for males and 90.1% for females based on the combined surveys. The results from the prediction model identified six common features associated with HIV, with the total number of lifetime sexual partners and cohabitation duration being the most influential variables for females and males, respectively. In addition to other risk reduction techniques, machine learning may aid in identifying those who might require pre-exposure prophylaxis, particularly women who experience intimate partner violence. Furthermore, compared to traditional statistical approaches, machine learning uncovered patterns in predicting HIV infection with comparatively reduced uncertainty and is therefore crucial for effective decision-making. |
format | Online Article Text |
id | pubmed-10246851 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10246851 2023-06-08 Predicting HIV infection in the decade (2005–2015) pre-COVID-19 in Zimbabwe: A supervised classification-based machine learning approach Birri Makota, Rutendo Beauty; Musenge, Eustasius PLOS Digit Health Research Article Public Library of Science 2023-06-07 /pmc/articles/PMC10246851/ /pubmed/37285368 http://dx.doi.org/10.1371/journal.pdig.0000260 Text en © 2023 Makota, Musenge https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Birri Makota, Rutendo Beauty; Musenge, Eustasius Predicting HIV infection in the decade (2005–2015) pre-COVID-19 in Zimbabwe: A supervised classification-based machine learning approach |
title | Predicting HIV infection in the decade (2005–2015) pre-COVID-19 in Zimbabwe: A supervised classification-based machine learning approach |
title_sort | predicting hiv infection in the decade (2005–2015) pre-covid-19 in zimbabwe: a supervised classification-based machine learning approach |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10246851/ https://www.ncbi.nlm.nih.gov/pubmed/37285368 http://dx.doi.org/10.1371/journal.pdig.0000260 |
work_keys_str_mv | AT birrimakotarutendobeauty predictinghivinfectioninthedecade20052015precovid19inzimbabweasupervisedclassificationbasedmachinelearningapproach AT musengeeustasius predictinghivinfectioninthedecade20052015precovid19inzimbabweasupervisedclassificationbasedmachinelearningapproach |
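
Note on the methods named in the description: the abstract evaluates models by the F1 score (the harmonic mean of precision and recall) and resamples with stratified 5-fold cross-validation. The following is only a minimal, standard-library sketch of those two ideas, not the authors' code. The function names `f1_score` and `stratified_folds`, the confusion-matrix counts, and the toy labels are all hypothetical and chosen for illustration; none of the numbers come from the paper.

```python
import random
from collections import defaultdict


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds with similar class shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal
        # share of each class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(f"F1 = {f1_score(tp=80, fp=10, fn=20):.4f}")

    # Toy outcome vector with 20% positives (illustrative only).
    labels = [1] * 20 + [0] * 80
    for number, fold in enumerate(stratified_folds(labels, k=5, seed=42), 1):
        positives = sum(labels[i] for i in fold)
        print(f"fold {number}: size={len(fold)}, positives={positives}")
```

In a repeated procedure, one would call `stratified_folds` with a different `seed` for each repetition and average the per-fold F1 scores. Stratification matters here because the reported prevalence is well below 50%, so unstratified folds could end up with very few positive cases.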