Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation

BACKGROUND: The continuing rise in HIV incidence among men who have sex with men (MSM), together with the low rate of HIV testing among MSM in China, demonstrates a need for innovative strategies to improve the implementation of HIV prevention. The use of machine learning algorithms is an increasing tend...

Bibliographic Details
Main authors: He, Jiajin, Li, Jinhua, Jiang, Siqing, Cheng, Wei, Jiang, Jun, Xu, Yun, Yang, Jiezhe, Zhou, Xin, Chai, Chengliang, Wu, Chao
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9452878/
https://www.ncbi.nlm.nih.gov/pubmed/36091522
http://dx.doi.org/10.3389/fpubh.2022.967681
_version_ 1784785014009888768
author He, Jiajin
Li, Jinhua
Jiang, Siqing
Cheng, Wei
Jiang, Jun
Xu, Yun
Yang, Jiezhe
Zhou, Xin
Chai, Chengliang
Wu, Chao
author_facet He, Jiajin
Li, Jinhua
Jiang, Siqing
Cheng, Wei
Jiang, Jun
Xu, Yun
Yang, Jiezhe
Zhou, Xin
Chai, Chengliang
Wu, Chao
author_sort He, Jiajin
collection PubMed
description BACKGROUND: The continuing rise in HIV incidence among men who have sex with men (MSM), together with the low rate of HIV testing among MSM in China, demonstrates a need for innovative strategies to improve the implementation of HIV prevention. Machine learning algorithms are increasingly used in disease diagnosis and prediction. We aimed to develop and validate machine learning models for predicting HIV infection among MSM that can identify individuals at increased risk of HIV acquisition for transmission-reduction interventions. METHODS: We extracted data from MSM sentinel surveillance in Zhejiang province from 2018 to 2020. Univariate logistic regression was used to select significant variables in the 2018–2019 data (P < 0.05). After data processing and feature selection, we divided the model development data into two groups by stratified random sampling: training data (70%) and testing data (30%). The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. Model performance was evaluated by accuracy, precision, recall, F-measure, and the area under the receiver operating characteristic curve (AUC). We then compared three commonly used machine learning algorithms, decision tree (DT), support vector machines (SVM), and random forest (RF), with logistic regression (LR). Finally, the four models were validated prospectively with 2020 data from Zhejiang province. RESULTS: A total of 6,346 MSM were included in the model development data, 372 of whom were diagnosed with HIV. In feature selection, 12 variables were selected as model predictors. Compared with LR, the DT, SVM, and RF algorithms improved classification performance on SMOTE-processed data, with AUCs of 0.778 (LR), 0.856 (DT), 0.887 (SVM), and 0.942 (RF). RF was the best-performing algorithm (accuracy = 0.871, precision = 0.960, recall = 0.775, F-measure = 0.858, and AUC = 0.942). The RF model still performed well on prospective validation (AUC = 0.846). CONCLUSION: Machine learning models performed substantially better than the conventional LR model, and RF should be considered for HIV infection prediction tools for Chinese MSM. Further studies are needed to optimize and promote these algorithms and to evaluate their impact on HIV prevention among MSM.
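The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definitions of accuracy, precision, recall, and F-measure (taken here as the F1 score, i.e. the harmonic mean of precision and recall); the abstract does not state which F-measure variant the authors used, and the `ConfusionCounts` container and function names are hypothetical, not from the paper. The sketch only checks that the reported RF precision and recall are consistent with the reported F-measure and computes the share of HIV-positive participants in the development data, which shows why oversampling such as SMOTE was considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Share of predicted positives that are truly positive."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of true positives that the model found."""
    return c.tp / (c.tp + c.fn)


def f_measure(p: float, r: float) -> float:
    """F1 score: harmonic mean of precision p and recall r (assumed variant)."""
    return 2 * p * r / (p + r)


# Values reported in the abstract for the random forest model.
rf_precision = 0.960
rf_recall = 0.775
rf_f = f_measure(rf_precision, rf_recall)

# Class balance of the model development data reported in the abstract.
hiv_positive = 372
total_msm = 6346
prevalence = hiv_positive / total_msm

print(f"F-measure from reported precision and recall: {rf_f:.3f}")
print(f"HIV-positive share of development data: {prevalence:.1%}")
```

Under the F1 assumption, the computed value rounds to the reported F-measure of 0.858, and roughly 5.9% of the development sample is HIV-positive.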
format Online
Article
Text
id pubmed-9452878
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9452878 2022-09-09 Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation He, Jiajin Li, Jinhua Jiang, Siqing Cheng, Wei Jiang, Jun Xu, Yun Yang, Jiezhe Zhou, Xin Chai, Chengliang Wu, Chao Front Public Health Public Health Frontiers Media S.A. 2022-08-25 /pmc/articles/PMC9452878/ /pubmed/36091522 http://dx.doi.org/10.3389/fpubh.2022.967681 Text en Copyright © 2022 He, Li, Jiang, Cheng, Jiang, Xu, Yang, Zhou, Chai and Wu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Public Health
He, Jiajin
Li, Jinhua
Jiang, Siqing
Cheng, Wei
Jiang, Jun
Xu, Yun
Yang, Jiezhe
Zhou, Xin
Chai, Chengliang
Wu, Chao
Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation
title Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation
title_full Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation
title_fullStr Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation
title_full_unstemmed Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation
title_short Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation
title_sort application of machine learning algorithms in predicting hiv infection among men who have sex with men: model development and validation
topic Public Health
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9452878/
https://www.ncbi.nlm.nih.gov/pubmed/36091522
http://dx.doi.org/10.3389/fpubh.2022.967681