
Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter

BACKGROUND: Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than [Formula: see text] of all vector-borne diseases in North America. OBJECTIVE: In this paper, self-reported tweets on Twitter were analyzed in order to predict potenti...


Bibliographic Details
Main Authors:	Boligarla, Srikanth; Laison, Elda Kokoè Elolo; Li, Jiaxin; Mahadevan, Raja; Ng, Austen; Lin, Yangming; Thioub, Mamadou Yamar; Huang, Bruce; Ibrahim, Mohamed Hamza; Nasri, Bouchra
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10578027/ https://www.ncbi.nlm.nih.gov/pubmed/37845666 http://dx.doi.org/10.1186/s12911-023-02315-z

_version_	1785121437215883264
author	Boligarla, Srikanth; Laison, Elda Kokoè Elolo; Li, Jiaxin; Mahadevan, Raja; Ng, Austen; Lin, Yangming; Thioub, Mamadou Yamar; Huang, Bruce; Ibrahim, Mohamed Hamza; Nasri, Bouchra
author_sort	Boligarla, Srikanth
collection	PubMed
description	BACKGROUND: Lyme disease is one of the most commonly reported infectious diseases in the United States (US), accounting for more than [Formula: see text] of all vector-borne diseases in North America. OBJECTIVE: In this paper, self-reported tweets on Twitter were analyzed in order to predict potential Lyme disease cases and accurately assess incidence rates in the US. METHODS: The study was done in three stages: (1) Approximately 1.3 million tweets were collected and pre-processed to extract the most relevant Lyme disease tweets with geolocations. A subset of tweets was semi-automatically labelled as relevant or irrelevant to Lyme disease using a set of precise keywords, and the remaining portion was manually labelled, yielding a curated labelled dataset of 77,500 tweets. (2) This labelled dataset was used to train, validate, and test various combinations of NLP word embedding methods and prominent ML classification models, such as TF-IDF and logistic regression, Word2vec and XGBoost, and BERTweet, among others, to identify potential Lyme disease tweets. (3) Lastly, the presence of spatio-temporal patterns in the US over a 10-year period was studied. RESULTS: Preliminary results showed that BERTweet outperformed all tested NLP classifiers for identifying Lyme disease tweets, achieving the highest classification accuracy and F1-score of [Formula: see text]. There was also a consistent pattern indicating that the West and Northeast regions of the US had a higher tweet rate over time. CONCLUSIONS: We focused on the less-studied problem of using Twitter data as a surveillance tool for Lyme disease in the US. Several crucial findings have emerged from the study. First, there is a fairly strong correlation between classified tweet counts and Lyme disease counts, with both following similar trends. Second, in 2015 and early 2016, social media networks like Twitter were essential in raising popular awareness of Lyme disease. Third, counties with a high incidence rate were not necessarily associated with a high tweet rate, and vice versa. Fourth, BERTweet can be used as a reliable NLP classifier for detecting relevant Lyme disease tweets. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02315-z.
format	Online Article Text
id	pubmed-10578027
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10578027 2023-10-17 BMC Med Inform Decis Mak Research BioMed Central 2023-10-16 /pmc/articles/PMC10578027/ /pubmed/37845666 http://dx.doi.org/10.1186/s12911-023-02315-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Leveraging machine learning approaches for predicting potential Lyme disease cases and incidence rates in the United States using Twitter
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10578027/ https://www.ncbi.nlm.nih.gov/pubmed/37845666 http://dx.doi.org/10.1186/s12911-023-02315-z
