
Mining of Textual Health Information from Reddit: Analysis of Chronic Diseases With Extracted Entities and Their Relations

BACKGROUND: Social media platforms constitute a rich data source for natural language processing tasks such as named entity recognition, relation extraction, and sentiment analysis. In particular, social media platforms about health provide a different insight into patients’ experiences with disease...


Bibliographic Details
Main authors: Foufi, Vasiliki, Timakum, Tatsawan, Gaudet-Blavignac, Christophe, Lovis, Christian, Song, Min
Format: Online Article Text
Language: English
Published: JMIR Publications 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6595941/
https://www.ncbi.nlm.nih.gov/pubmed/31199327
http://dx.doi.org/10.2196/12876
author Foufi, Vasiliki
Timakum, Tatsawan
Gaudet-Blavignac, Christophe
Lovis, Christian
Song, Min
collection PubMed
description BACKGROUND: Social media platforms constitute a rich data source for natural language processing tasks such as named entity recognition, relation extraction, and sentiment analysis. In particular, social media platforms about health provide a different insight into patients’ experiences with diseases and treatment than those found in the scientific literature.
OBJECTIVE: This paper aimed to report a study of entities related to chronic diseases and their relations in user-generated text posts. The major focus of our research is the study of biomedical entities found in health social media platforms, their relations, and the way people suffering from chronic diseases express themselves.
METHODS: We collected a corpus of 17,624 text posts from disease-specific subreddits of the social news and discussion website Reddit. For entity and relation extraction from this corpus, we employed the PKDE4J tool developed by Song et al (2015). PKDE4J is a text mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework.
RESULTS: Using PKDE4J, we extracted 2 types of entities and relations: biomedical entities and relations and subject-predicate-object entity relations. In total, 82,138 entities and 30,341 relation pairs were extracted from the Reddit dataset. The most frequently mentioned entities were those related to oncological disease (2884 occurrences of cancer) and asthma (2180 occurrences). The relation pair anatomy-disease was the most frequent (5550 occurrences), with cancer and lymph being the most frequent entities in this pair. The manual validation of the extracted entities showed a very good performance of the system at the entity extraction task (3682 of 5151 extracted entities, 71.48%, were correctly labeled).
CONCLUSIONS: This study showed that people are eager to share their personal experience with chronic diseases on social media platforms despite possible privacy and security issues. The results reported in this paper are promising and demonstrate the need for more in-depth studies on the way patients with chronic diseases express themselves on social media platforms.
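The dictionary-based entity extraction and the validation figure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not PKDE4J and does not reproduce its API or rules: the toy dictionary, the function names (`extract_entities`, `extract_relation_pairs`, `labeling_accuracy`), and the simple same-sentence co-occurrence rule are all assumptions made for illustration. Only the counts 3682 and 5151 come from the record.

```python
import re
from itertools import combinations

# Hypothetical toy dictionary (term -> entity type); not PKDE4J's dictionaries.
DICTIONARY = {
    "cancer": "disease",
    "asthma": "disease",
    "lymph": "anatomy",
    "lung": "anatomy",
}


def extract_entities(sentence):
    """Return (term, type) for every dictionary term found, in order of appearance."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [(tok, DICTIONARY[tok]) for tok in tokens if tok in DICTIONARY]


def extract_relation_pairs(sentence):
    """Pair entities of different types that co-occur in one sentence.

    Assumed stand-in rule: plain co-occurrence, far simpler than the
    rule-based relation extraction the abstract describes.
    """
    pairs = []
    for (term1, type1), (term2, type2) in combinations(extract_entities(sentence), 2):
        if type1 != type2:
            # Sort by type name so the label is stable, e.g. "anatomy-disease".
            (type_a, term_a), (type_b, term_b) = sorted([(type1, term1), (type2, term2)])
            pairs.append((f"{type_a}-{type_b}", term_a, term_b))
    return pairs


def labeling_accuracy(correct, total):
    """Share of extracted entities judged correctly labeled, as a percentage."""
    return round(100 * correct / total, 2)


if __name__ == "__main__":
    print(extract_relation_pairs("The cancer spread to my lymph nodes."))
    print(labeling_accuracy(3682, 5151))
```

Calling `labeling_accuracy(3682, 5151)` recomputes the validation percentage from the two counts given in the RESULTS paragraph.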
format Online
Article
Text
id pubmed-6595941
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-6595941 2019-07-17 Mining of Textual Health Information from Reddit: Analysis of Chronic Diseases With Extracted Entities and Their Relations Foufi, Vasiliki Timakum, Tatsawan Gaudet-Blavignac, Christophe Lovis, Christian Song, Min J Med Internet Res Original Paper JMIR Publications 2019-06-13 /pmc/articles/PMC6595941/ /pubmed/31199327 http://dx.doi.org/10.2196/12876 Text en ©Vasiliki Foufi, Tatsawan Timakum, Christophe Gaudet-Blavignac, Christian Lovis, Min Song. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 13.06.2019. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title Mining of Textual Health Information from Reddit: Analysis of Chronic Diseases With Extracted Entities and Their Relations
topic Original Paper