Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa
AIM: HIV prevention measures in sub-Saharan Africa are still short of attaining the UNAIDS 90–90–90 fast-track targets set in 2014. Identifying predictors for HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors as well as predic...
Main authors: | Mutai, Charles K.; McSharry, Patrick E.; Ngaruye, Innocent; Musabanganji, Edouard |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8325403/ https://www.ncbi.nlm.nih.gov/pubmed/34332540 http://dx.doi.org/10.1186/s12874-021-01346-2 |
_version_ | 1783731554088910848 |
---|---|
author | Mutai, Charles K.; McSharry, Patrick E.; Ngaruye, Innocent; Musabanganji, Edouard |
author_facet | Mutai, Charles K.; McSharry, Patrick E.; Ngaruye, Innocent; Musabanganji, Edouard |
author_sort | Mutai, Charles K. |
collection | PubMed |
description | AIM: HIV prevention measures in sub-Saharan Africa are still short of attaining the UNAIDS 90–90–90 fast-track targets set in 2014. Identifying predictors for HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict persons at high risk of infection. METHOD: We applied machine learning approaches for building models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 30 and 40 variables respectively, from four sub-Saharan African countries. We trained and validated the algorithms on 80% of the data and tested on the remaining 20%, rotating around the left-out country. The algorithm with the best mean F1 score was retained and trained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. RESULTS: The XGBoost algorithm appeared to significantly improve identification of HIV positivity over the other five algorithms, with mean F1 scores of 90% and 92% for males and females respectively. Among the eight most predictive features in both sexes were: age, relationship with family head, the highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at the first experience of sex, and wealth quintile. Model performance using these variables increased significantly compared with including all the variables. We found that five male and 19 female individuals would need to be tested to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection. CONCLUSION: Our findings suggest a potential use of the XGBoost algorithm with socio-behavioural data for identifying HIV predictors and predicting individuals at high risk of infection for targeted screening.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01346-2. |
format | Online Article Text |
id | pubmed-8325403 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8325403 2021-08-02 Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa Mutai, Charles K.; McSharry, Patrick E.; Ngaruye, Innocent; Musabanganji, Edouard BMC Med Res Methodol Research AIM: HIV prevention measures in sub-Saharan Africa are still short of attaining the UNAIDS 90–90–90 fast-track targets set in 2014. Identifying predictors for HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict persons at high risk of infection. METHOD: We applied machine learning approaches for building models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 30 and 40 variables respectively, from four sub-Saharan African countries. We trained and validated the algorithms on 80% of the data and tested on the remaining 20%, rotating around the left-out country. The algorithm with the best mean F1 score was retained and trained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. RESULTS: The XGBoost algorithm appeared to significantly improve identification of HIV positivity over the other five algorithms, with mean F1 scores of 90% and 92% for males and females respectively. Among the eight most predictive features in both sexes were: age, relationship with family head, the highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at the first experience of sex, and wealth quintile. Model performance using these variables increased significantly compared with including all the variables. We found that five male and 19 female individuals would need to be tested to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection.
CONCLUSION: Our findings suggest a potential use of the XGBoost algorithm with socio-behavioural data for identifying HIV predictors and predicting individuals at high risk of infection for targeted screening. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01346-2. BioMed Central 2021-07-31 /pmc/articles/PMC8325403/ /pubmed/34332540 http://dx.doi.org/10.1186/s12874-021-01346-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Mutai, Charles K. McSharry, Patrick E. Ngaruye, Innocent Musabanganji, Edouard Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa |
title | Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa |
title_full | Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa |
title_fullStr | Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa |
title_full_unstemmed | Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa |
title_short | Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa |
title_sort | use of machine learning techniques to identify hiv predictors for screening in sub-saharan africa |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8325403/ https://www.ncbi.nlm.nih.gov/pubmed/34332540 http://dx.doi.org/10.1186/s12874-021-01346-2 |
work_keys_str_mv | AT mutaicharlesk useofmachinelearningtechniquestoidentifyhivpredictorsforscreeninginsubsaharanafrica AT mcsharrypatricke useofmachinelearningtechniquestoidentifyhivpredictorsforscreeninginsubsaharanafrica AT ngaruyeinnocent useofmachinelearningtechniquestoidentifyhivpredictorsforscreeninginsubsaharanafrica AT musabanganjiedouard useofmachinelearningtechniquestoidentifyhivpredictorsforscreeninginsubsaharanafrica |
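
The description field above mentions three quantities that are easy to misread: evaluation that rotates around a left-out country, the mean F1 score, and the number of individuals who need testing to find one HIV-positive person. The sketch below illustrates them in plain Python. It is not the authors' code: the function names are hypothetical, the labels are toy values rather than PHIA data, and reading "rotated around the left-out country" as a leave-one-country-out split is an assumption, since the record does not spell out the protocol.

```python
from typing import Iterator, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def number_needed_to_test(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """People flagged by the model per true positive found (1 / precision)."""
    flagged = sum(1 for p in y_pred if p == 1)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return float("inf") if tp == 0 else flagged / tp


def leave_one_country_out(
    countries: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_country, train_indices, test_indices) for each country.

    Assumption: each country is held out once in turn; the abstract's exact
    80%/20% protocol may differ.
    """
    for held_out in sorted(set(countries)):
        train = [i for i, c in enumerate(countries) if c != held_out]
        test = [i for i, c in enumerate(countries) if c == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy labels only (1 = HIV positive, 0 = negative); not study data.
    y_true = [1, 0, 1, 1, 0]
    y_pred = [1, 1, 1, 0, 0]
    print("F1:", f1_score(y_true, y_pred))
    print("Number needed to test:", number_needed_to_test(y_true, y_pred))
    for country, train_idx, test_idx in leave_one_country_out(["A", "B", "A", "C"]):
        print(country, train_idx, test_idx)
```

Under this definition, a number needed to test of 5 corresponds to a precision of 20% among the individuals the model flags. A mean F1 score would be obtained by averaging `f1_score` over the folds produced by `leave_one_country_out`.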