Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning

PURPOSE: It has been over a year since the first known case of coronavirus disease (COVID-19) emerged, yet the pandemic is far from over. To date, the coronavirus pandemic has infected over eighty million people and has killed more than 1.78 million worldwide. This study aims to explore “how useful...

Bibliographic Details
Main authors: Liu, Yang, Whitfield, Christopher, Zhang, Tianyang, Hauser, Amanda, Reynolds, Taeyonn, Anwar, Mohd
Format: Online Article Text
Language: English
Published: Springer International Publishing 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8226148/
https://www.ncbi.nlm.nih.gov/pubmed/34188896
http://dx.doi.org/10.1007/s13755-021-00158-4
_version_ 1783712225099251712
author Liu, Yang
Whitfield, Christopher
Zhang, Tianyang
Hauser, Amanda
Reynolds, Taeyonn
Anwar, Mohd
author_facet Liu, Yang
Whitfield, Christopher
Zhang, Tianyang
Hauser, Amanda
Reynolds, Taeyonn
Anwar, Mohd
author_sort Liu, Yang
collection PubMed
description PURPOSE: It has been over a year since the first known case of coronavirus disease (COVID-19) emerged, yet the pandemic is far from over. To date, the coronavirus pandemic has infected over eighty million people and has killed more than 1.78 million worldwide. This study aims to explore “how useful is Reddit social media platform to surveil COVID-19 pandemic?” and “how do people’s concerns/behaviors change over the course of COVID-19 pandemic in North Carolina?”. The purpose of this study was to compare people’s thoughts, behavior changes, discussion topics, and the number of confirmed cases and deaths by applying natural language processing (NLP) to COVID-19 related data. METHODS: In this study, we collected COVID-19 related data from 18 subreddits of North Carolina from March to August 2020. Next, we applied methods from natural language processing and machine learning to analyze collected Reddit posts using feature engineering, topic modeling, custom named-entity recognition (NER), and BERT-based (Bidirectional Encoder Representations from Transformers) sentence clustering. Using these methods, we were able to glean people’s responses and their concerns about COVID-19 pandemic in North Carolina. RESULTS: We observed a positive change in attitudes towards masks for residents in North Carolina. The high-frequency words in all subreddit corpora for each of the COVID-19 mitigation strategy categories are: Distancing (DIST)—“social distance/distancing”, “lockdown”, and “work from home”; Disinfection (DIT)—“(hand) sanitizer/soap”, “hygiene”, and “wipe”; Personal Protective Equipment (PPE)—“mask/facemask(s)/face shield”, “n95(s)/kn95”, and “cloth/gown”; Symptoms (SYM)—“death”, “flu/influenza”, and “cough/coughed”; Testing (TEST)—“cases”, “(antibody) test”, and “test results (positive/negative)”. CONCLUSION: The findings in our study show that the use of Reddit data to monitor COVID-19 pandemic in North Carolina (NC) was effective. The study shows the utility of NLP methods (e.g. cosine similarity, Latent Dirichlet Allocation (LDA) topic modeling, custom NER and BERT-based sentence clustering) in discovering the change of the public’s concerns/behaviors over the course of COVID-19 pandemic in NC using Reddit data. Moreover, the results show that social media data can be utilized to surveil the epidemic situation in a specific community.
format Online
Article
Text
id pubmed-8226148
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-82261482021-06-25 Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning Liu, Yang Whitfield, Christopher Zhang, Tianyang Hauser, Amanda Reynolds, Taeyonn Anwar, Mohd Health Inf Sci Syst Research PURPOSE: It has been over a year since the first known case of coronavirus disease (COVID-19) emerged, yet the pandemic is far from over. To date, the coronavirus pandemic has infected over eighty million people and has killed more than 1.78 million worldwide. This study aims to explore “how useful is Reddit social media platform to surveil COVID-19 pandemic?” and “how do people’s concerns/behaviors change over the course of COVID-19 pandemic in North Carolina?”. The purpose of this study was to compare people’s thoughts, behavior changes, discussion topics, and the number of confirmed cases and deaths by applying natural language processing (NLP) to COVID-19 related data. METHODS: In this study, we collected COVID-19 related data from 18 subreddits of North Carolina from March to August 2020. Next, we applied methods from natural language processing and machine learning to analyze collected Reddit posts using feature engineering, topic modeling, custom named-entity recognition (NER), and BERT-based (Bidirectional Encoder Representations from Transformers) sentence clustering. Using these methods, we were able to glean people’s responses and their concerns about COVID-19 pandemic in North Carolina. RESULTS: We observed a positive change in attitudes towards masks for residents in North Carolina. 
The high-frequency words in all subreddit corpora for each of the COVID-19 mitigation strategy categories are: Distancing (DIST)—“social distance/distancing”, “lockdown”, and “work from home”; Disinfection (DIT)—“(hand) sanitizer/soap”, “hygiene”, and "wipe"; Personal Protective Equipment (PPE)—“mask/facemask(s)/face shield”, “n95(s)/kn95”, and “cloth/gown”; Symptoms (SYM)—“death”, “flu/influenza”, and “cough/coughed”; Testing (TEST)—“cases”, “(antibody) test”, and “test results (positive/negative)”. CONCLUSION: The findings in our study show that the use of Reddit data to monitor COVID-19 pandemic in North Carolina (NC) was effective. The study shows the utility of NLP methods (e.g. cosine similarity, Latent Dirichlet Allocation (LDA) topic modeling, custom NER and BERT-based sentence clustering) in discovering the change of the public's concerns/behaviors over the course of COVID-19 pandemic in NC using Reddit data. Moreover, the results show that social media data can be utilized to surveil the epidemic situation in a specific community. Springer International Publishing 2021-06-25 /pmc/articles/PMC8226148/ /pubmed/34188896 http://dx.doi.org/10.1007/s13755-021-00158-4 Text en © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2021
spellingShingle Research
Liu, Yang
Whitfield, Christopher
Zhang, Tianyang
Hauser, Amanda
Reynolds, Taeyonn
Anwar, Mohd
Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning
title Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning
title_full Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning
title_fullStr Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning
title_full_unstemmed Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning
title_short Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning
title_sort monitoring covid-19 pandemic through the lens of social media using natural language processing and machine learning
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8226148/
https://www.ncbi.nlm.nih.gov/pubmed/34188896
http://dx.doi.org/10.1007/s13755-021-00158-4
work_keys_str_mv AT liuyang monitoringcovid19pandemicthroughthelensofsocialmediausingnaturallanguageprocessingandmachinelearning
AT whitfieldchristopher monitoringcovid19pandemicthroughthelensofsocialmediausingnaturallanguageprocessingandmachinelearning
AT zhangtianyang monitoringcovid19pandemicthroughthelensofsocialmediausingnaturallanguageprocessingandmachinelearning
AT hauseramanda monitoringcovid19pandemicthroughthelensofsocialmediausingnaturallanguageprocessingandmachinelearning
AT reynoldstaeyonn monitoringcovid19pandemicthroughthelensofsocialmediausingnaturallanguageprocessingandmachinelearning
AT anwarmohd monitoringcovid19pandemicthroughthelensofsocialmediausingnaturallanguageprocessingandmachinelearning