
Identification of the high-risk area for schistosomiasis transmission in China based on information value and machine learning: a newly data-driven modeling attempt

BACKGROUND: Schistosomiasis control is advancing toward transmission interruption and even elimination, and evidence-led control is vital for eliminating the hidden dangers of schistosomiasis. This study attempts to identify high-risk areas of schistosomiasis in China by using information...


Bibliographic Details
Main Authors: Gong, Yan-Feng, Zhu, Ling-Qian, Li, Yin-Long, Zhang, Li-Juan, Xue, Jing-Bo, Xia, Shang, Lv, Shan, Xu, Jing, Li, Shi-Zhu
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237418/
https://www.ncbi.nlm.nih.gov/pubmed/34176515
http://dx.doi.org/10.1186/s40249-021-00874-9
author Gong, Yan-Feng
Zhu, Ling-Qian
Li, Yin-Long
Zhang, Li-Juan
Xue, Jing-Bo
Xia, Shang
Lv, Shan
Xu, Jing
Li, Shi-Zhu
author_sort Gong, Yan-Feng
collection PubMed
description BACKGROUND: Schistosomiasis control is advancing toward transmission interruption and even elimination, and evidence-led control is vital for eliminating the hidden dangers of schistosomiasis. This study attempts to identify high-risk areas of schistosomiasis in China by using information value and machine learning.
METHODS: The local case distribution from schistosomiasis surveillance data in China between 2005 and 2019 was assessed based on 19 variables covering climate, geography, and social economy. Seven models were built in three categories: information value (IV); three machine learning models [logistic regression (LR), random forest (RF), generalized boosted model (GBM)]; and three coupled models (IV + LR, IV + RF, IV + GBM). Accuracy, area under the curve (AUC), and F1-score were used to evaluate the prediction performance of the models. The optimal model was selected to predict the risk distribution for schistosomiasis.
RESULTS: Areas were more prone to schistosomiasis epidemics where there were paddy fields or grasslands, where the distance to a waterway was less than 2.5 km, where the annual average temperature was 11.5–19.0 °C, and where the annual average rainfall was 1000–1550 mm. IV + GBM achieved the best prediction performance (accuracy = 0.878, AUC = 0.902, F1 = 0.920) compared with the other six models. The results of IV + GBM showed that the risk areas are mainly distributed in the coastal regions of the middle and lower reaches of the Yangtze River, the Poyang Lake region, and the Dongting Lake region. High-risk areas are primarily distributed in eastern Changde, western Yueyang, northeastern Yiyang, and central Changsha in Hunan province; southern Jiujiang, northern Nanchang, northeastern Shangrao, and eastern Yichun in Jiangxi province; southern Jingzhou, southern Xiantao, and central Wuhan in Hubei province; southern Anqing, northwestern Guichi, and eastern Wuhu in Anhui province; and central Meishan, northern Leshan, and central Liangshan in Sichuan province.
CONCLUSIONS: The risk of schistosomiasis transmission in China persists, with high-risk areas relatively concentrated in the coastal regions of the middle and lower reaches of the Yangtze River. Coupled models of IV and machine learning provide effective analysis and prediction, forming a scientific basis for evidence-led surveillance and control. GRAPHIC ABSTRACT: [Image: see text] SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40249-021-00874-9.
format Online
Article
Text
id pubmed-8237418
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8237418 2021-06-29 Identification of the high-risk area for schistosomiasis transmission in China based on information value and machine learning: a newly data-driven modeling attempt Gong, Yan-Feng Zhu, Ling-Qian Li, Yin-Long Zhang, Li-Juan Xue, Jing-Bo Xia, Shang Lv, Shan Xu, Jing Li, Shi-Zhu Infect Dis Poverty Research Article BioMed Central 2021-06-27 /pmc/articles/PMC8237418/ /pubmed/34176515 http://dx.doi.org/10.1186/s40249-021-00874-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Identification of the high-risk area for schistosomiasis transmission in China based on information value and machine learning: a newly data-driven modeling attempt
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237418/
https://www.ncbi.nlm.nih.gov/pubmed/34176515
http://dx.doi.org/10.1186/s40249-021-00874-9