SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning

The increase of social media usage across the globe has fueled efforts in digital epidemiology for mining valuable information such as medication use, adverse drug effects and reports of viral infections that directly and indirectly affect population health. Such specific information can, however, b...

Bibliographic Details
Main authors: Magge, Arjun, Weissenbacher, Davy, O’Connor, Karen, Scotch, Matthew, Gonzalez-Hernandez, Graciela
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885933/
https://www.ncbi.nlm.nih.gov/pubmed/33594374
http://dx.doi.org/10.1101/2021.02.09.21251454
_version_ 1783651694603665408
author Magge, Arjun
Weissenbacher, Davy
O’Connor, Karen
Scotch, Matthew
Gonzalez-Hernandez, Graciela
author_facet Magge, Arjun
Weissenbacher, Davy
O’Connor, Karen
Scotch, Matthew
Gonzalez-Hernandez, Graciela
author_sort Magge, Arjun
collection PubMed
description The increase of social media usage across the globe has fueled efforts in digital epidemiology for mining valuable information such as medication use, adverse drug effects and reports of viral infections that directly and indirectly affect population health. Such specific information can, however, be scarce, hard to find, and mostly expressed in very colloquial language. In this work, we focus on a fundamental problem that enables social media mining for disease monitoring. We present and make available SEED, a natural language processing approach to detect symptom and disease mentions from social media data obtained from platforms such as Twitter and DailyStrength and to normalize them into UMLS terminology. Using multi-corpus training and deep learning models, the tool achieves an overall F1 score of 0.86 and 0.72 on DailyStrength and balanced Twitter datasets, significantly improving over previous approaches on the same datasets. We apply the tool on Twitter posts that report COVID19 symptoms, particularly to quantify whether the SEED system can extract symptoms absent in the training data. The study results also draw attention to the potential of multi-corpus training for performance improvements and the need for continuous training on newly obtained data for consistent performance amidst the ever-changing nature of the social media vocabulary.
format Online
Article
Text
id pubmed-7885933
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-78859332021-02-17 SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning Magge, Arjun Weissenbacher, Davy O’Connor, Karen Scotch, Matthew Gonzalez-Hernandez, Graciela medRxiv Article The increase of social media usage across the globe has fueled efforts in digital epidemiology for mining valuable information such as medication use, adverse drug effects and reports of viral infections that directly and indirectly affect population health. Such specific information can, however, be scarce, hard to find, and mostly expressed in very colloquial language. In this work, we focus on a fundamental problem that enables social media mining for disease monitoring. We present and make available SEED, a natural language processing approach to detect symptom and disease mentions from social media data obtained from platforms such as Twitter and DailyStrength and to normalize them into UMLS terminology. Using multi-corpus training and deep learning models, the tool achieves an overall F1 score of 0.86 and 0.72 on DailyStrength and balanced Twitter datasets, significantly improving over previous approaches on the same datasets. We apply the tool on Twitter posts that report COVID19 symptoms, particularly to quantify whether the SEED system can extract symptoms absent in the training data. The study results also draw attention to the potential of multi-corpus training for performance improvements and the need for continuous training on newly obtained data for consistent performance amidst the ever-changing nature of the social media vocabulary. 
Cold Spring Harbor Laboratory 2022-03-21 /pmc/articles/PMC7885933/ /pubmed/33594374 http://dx.doi.org/10.1101/2021.02.09.21251454 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
spellingShingle Article
Magge, Arjun
Weissenbacher, Davy
O’Connor, Karen
Scotch, Matthew
Gonzalez-Hernandez, Graciela
SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning
title SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning
title_full SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning
title_fullStr SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning
title_full_unstemmed SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning
title_short SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning
title_sort seed: symptom extraction from english social media posts using deep learning and transfer learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885933/
https://www.ncbi.nlm.nih.gov/pubmed/33594374
http://dx.doi.org/10.1101/2021.02.09.21251454
work_keys_str_mv AT maggearjun seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning
AT weissenbacherdavy seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning
AT oconnorkaren seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning
AT scotchmatthew seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning
AT gonzalezhernandezgraciela seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning