Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis

BACKGROUND: Lyme disease is among the most reported tick-borne diseases worldwide, making it a major ongoing public health concern. An effective Lyme disease case reporting system depends on timely diagnosis and reporting by health care professionals, and accurate laboratory testing and interpretati...

Bibliographic Details
Main authors: Laison, Elda Kokoe Elolo, Hamza Ibrahim, Mohamed, Boligarla, Srikanth, Li, Jiaxin, Mahadevan, Raja, Ng, Austen, Muthuramalingam, Venkataraman, Lee, Wee Yi, Yin, Yijun, Nasri, Bouchra R
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616745/
https://www.ncbi.nlm.nih.gov/pubmed/37843893
http://dx.doi.org/10.2196/47014
_version_ 1785129464304238592
author Laison, Elda Kokoe Elolo
Hamza Ibrahim, Mohamed
Boligarla, Srikanth
Li, Jiaxin
Mahadevan, Raja
Ng, Austen
Muthuramalingam, Venkataraman
Lee, Wee Yi
Yin, Yijun
Nasri, Bouchra R
author_sort Laison, Elda Kokoe Elolo
collection PubMed
description BACKGROUND: Lyme disease is among the most reported tick-borne diseases worldwide, making it a major ongoing public health concern. An effective Lyme disease case reporting system depends on timely diagnosis and reporting by health care professionals, and accurate laboratory testing and interpretation for clinical diagnosis validation. A lack of these can lead to delayed diagnosis and treatment, which can exacerbate the severity of Lyme disease symptoms. Therefore, there is a need to improve the monitoring of Lyme disease by using other data sources, such as web-based data. OBJECTIVE: We analyzed global Twitter data to understand its potential and limitations as a tool for Lyme disease surveillance. We propose a transformer-based classification system to identify potential Lyme disease cases using self-reported tweets. METHODS: Our initial sample included 20,000 tweets collected worldwide from a database of over 1.3 million Lyme disease tweets. After preprocessing and geolocating tweets, tweets in a subset of the initial sample were manually labeled as potential Lyme disease cases or non-Lyme disease cases using carefully selected keywords. Emojis were converted to sentiment words, which then replaced the emojis in the tweets. This labeled tweet set was used for the training, validation, and performance testing of DistilBERT (distilled version of BERT [Bidirectional Encoder Representations from Transformers]), ALBERT (A Lite BERT), and BERTweet (BERT for English Tweets) classifiers. RESULTS: The empirical results showed that BERTweet was the best classifier among all evaluated models (average F1-score of 89.3%, classification accuracy of 90.0%, and precision of 97.1%). However, for recall, term frequency-inverse document frequency and k-nearest neighbors performed better (93.2% and 82.6%, respectively). 
When emojis were used to enrich the tweet embeddings, BERTweet had an increased recall (8% increase), DistilBERT had an increased F1-score of 93.8% (4% increase) and classification accuracy of 94.1% (4% increase), and ALBERT had an increased F1-score of 93.1% (5% increase) and classification accuracy of 93.9% (5% increase). The general awareness of Lyme disease was high in the United States, the United Kingdom, Australia, and Canada, with self-reported potential cases of Lyme disease from these countries accounting for around 50% (9939/20,000) of the collected English-language tweets, whereas Lyme disease–related tweets were rare in countries from Africa and Asia. The most reported Lyme disease–related symptoms in the data were rash, fatigue, fever, and arthritis, while symptoms such as lymphadenopathy, palpitations, swollen lymph nodes, neck stiffness, and arrhythmia were uncommon, in accordance with Lyme disease symptom frequency. CONCLUSIONS: The study highlights the robustness of BERTweet and DistilBERT as classifiers for potential cases of Lyme disease from self-reported data. The results demonstrated that emojis are effective for enrichment, thereby improving the accuracy of tweet embeddings and the performance of classifiers. Specifically, emojis reflecting sadness, empathy, and encouragement can reduce false negatives.
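The emoji-to-sentiment-word step described in METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon `EMOJI_TO_SENTIMENT`, its entries, and the helper `replace_emojis` are hypothetical names chosen for illustration, since this record does not give the mapping used in the study. The sketch only shows the general idea of substituting each emoji with a sentiment word before the text is passed to a classifier.

```python
import re

# Hypothetical emoji-to-sentiment-word lexicon (an assumption for illustration;
# the study's actual mapping is not given in this record).
EMOJI_TO_SENTIMENT = {
    "\U0001F622": "sad",       # crying face
    "\U0001F64F": "hopeful",   # folded hands
    "\U0001F4AA": "strong",    # flexed biceps
    "\u2764\ufe0f": "love",    # red heart followed by a variation selector
}


def replace_emojis(text, lexicon=EMOJI_TO_SENTIMENT):
    """Replace every emoji found in the lexicon with its sentiment word."""
    # Try longer sequences first so multi-code-point emojis are matched whole.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Pad each inserted word with spaces, then collapse repeated whitespace.
    replaced = pattern.sub(lambda m: f" {lexicon[m.group(0)]} ", text)
    return " ".join(replaced.split())


if __name__ == "__main__":
    tweet = "Diagnosed with Lyme last week \U0001F622 staying \U0001F4AA"
    print(replace_emojis(tweet))
```

Text without any listed emoji passes through unchanged apart from whitespace normalization, so a step like this could run over a whole tweet set before tokenization.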
format Online
Article
Text
id pubmed-10616745
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-106167452023-11-01 Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis Laison, Elda Kokoe Elolo Hamza Ibrahim, Mohamed Boligarla, Srikanth Li, Jiaxin Mahadevan, Raja Ng, Austen Muthuramalingam, Venkataraman Lee, Wee Yi Yin, Yijun Nasri, Bouchra R J Med Internet Res Original Paper JMIR Publications 2023-10-16 /pmc/articles/PMC10616745/ /pubmed/37843893 http://dx.doi.org/10.2196/47014 Text en ©Elda Kokoe Elolo Laison, Mohamed Hamza Ibrahim, Srikanth Boligarla, Jiaxin Li, Raja Mahadevan, Austen Ng, Venkataraman Muthuramalingam, Wee Yi Lee, Yijun Yin, Bouchra R Nasri.
Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 16.10.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616745/
https://www.ncbi.nlm.nih.gov/pubmed/37843893
http://dx.doi.org/10.2196/47014