SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning
The increase of social media usage across the globe has fueled efforts in digital epidemiology for mining valuable information such as medication use, adverse drug effects and reports of viral infections that directly and indirectly affect population health. Such specific information can, however, b...
Main Authors: | Magge, Arjun; Weissenbacher, Davy; O’Connor, Karen; Scotch, Matthew; Gonzalez-Hernandez, Graciela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885933/ https://www.ncbi.nlm.nih.gov/pubmed/33594374 http://dx.doi.org/10.1101/2021.02.09.21251454 |
_version_ | 1783651694603665408 |
---|---|
author | Magge, Arjun Weissenbacher, Davy O’Connor, Karen Scotch, Matthew Gonzalez-Hernandez, Graciela |
author_facet | Magge, Arjun Weissenbacher, Davy O’Connor, Karen Scotch, Matthew Gonzalez-Hernandez, Graciela |
author_sort | Magge, Arjun |
collection | PubMed |
description | The increase of social media usage across the globe has fueled efforts in digital epidemiology for mining valuable information such as medication use, adverse drug effects and reports of viral infections that directly and indirectly affect population health. Such specific information can, however, be scarce, hard to find, and mostly expressed in very colloquial language. In this work, we focus on a fundamental problem that enables social media mining for disease monitoring. We present and make available SEED, a natural language processing approach to detect symptom and disease mentions from social media data obtained from platforms such as Twitter and DailyStrength and to normalize them into UMLS terminology. Using multi-corpus training and deep learning models, the tool achieves an overall F1 score of 0.86 and 0.72 on DailyStrength and balanced Twitter datasets, significantly improving over previous approaches on the same datasets. We apply the tool on Twitter posts that report COVID19 symptoms, particularly to quantify whether the SEED system can extract symptoms absent in the training data. The study results also draw attention to the potential of multi-corpus training for performance improvements and the need for continuous training on newly obtained data for consistent performance amidst the ever-changing nature of the social media vocabulary. |
format | Online Article Text |
id | pubmed-7885933 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-78859332021-02-17 SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning Magge, Arjun Weissenbacher, Davy O’Connor, Karen Scotch, Matthew Gonzalez-Hernandez, Graciela medRxiv Article The increase of social media usage across the globe has fueled efforts in digital epidemiology for mining valuable information such as medication use, adverse drug effects and reports of viral infections that directly and indirectly affect population health. Such specific information can, however, be scarce, hard to find, and mostly expressed in very colloquial language. In this work, we focus on a fundamental problem that enables social media mining for disease monitoring. We present and make available SEED, a natural language processing approach to detect symptom and disease mentions from social media data obtained from platforms such as Twitter and DailyStrength and to normalize them into UMLS terminology. Using multi-corpus training and deep learning models, the tool achieves an overall F1 score of 0.86 and 0.72 on DailyStrength and balanced Twitter datasets, significantly improving over previous approaches on the same datasets. We apply the tool on Twitter posts that report COVID19 symptoms, particularly to quantify whether the SEED system can extract symptoms absent in the training data. The study results also draw attention to the potential of multi-corpus training for performance improvements and the need for continuous training on newly obtained data for consistent performance amidst the ever-changing nature of the social media vocabulary. 
Cold Spring Harbor Laboratory 2022-03-21 /pmc/articles/PMC7885933/ /pubmed/33594374 http://dx.doi.org/10.1101/2021.02.09.21251454 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Magge, Arjun Weissenbacher, Davy O’Connor, Karen Scotch, Matthew Gonzalez-Hernandez, Graciela SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning |
title | SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning |
title_full | SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning |
title_fullStr | SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning |
title_full_unstemmed | SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning |
title_short | SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning |
title_sort | seed: symptom extraction from english social media posts using deep learning and transfer learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885933/ https://www.ncbi.nlm.nih.gov/pubmed/33594374 http://dx.doi.org/10.1101/2021.02.09.21251454 |
work_keys_str_mv | AT maggearjun seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning AT weissenbacherdavy seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning AT oconnorkaren seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning AT scotchmatthew seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning AT gonzalezhernandezgraciela seedsymptomextractionfromenglishsocialmediapostsusingdeeplearningandtransferlearning |