
Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites

BACKGROUND: The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers. OBJECTIVE: The aim of this study was to better understand consumers’ usage of medical concepts by evaluating the coverage of concepts and semantic types...


Bibliographic Details
Main Authors: Park, Min Sook, He, Zhe, Chen, Zhiwei, Oh, Sanghee, Bian, Jiang
Format: Online Article Text
Language: English
Published: JMIR Publications 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5146325/
https://www.ncbi.nlm.nih.gov/pubmed/27884812
http://dx.doi.org/10.2196/medinform.5748
_version_ 1782473463632494592
author Park, Min Sook
He, Zhe
Chen, Zhiwei
Oh, Sanghee
Bian, Jiang
author_sort Park, Min Sook
collection PubMed
description BACKGROUND: The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers.
OBJECTIVE: The aim of this study was to better understand consumers’ usage of medical concepts by evaluating the coverage of concepts and semantic types of the Unified Medical Language System (UMLS) on diabetes-related postings in 2 types of social media: blogs and social question and answer (Q&A).
METHODS: We collected 2 types of social media data: (1) a total of 3711 blogs tagged with “diabetes” on Tumblr posted between February and October 2015; and (2) a total of 58,422 questions and associated answers posted between 2009 and 2014 in the diabetes category of Yahoo! Answers. We analyzed the datasets using a widely adopted biomedical text processing framework Apache cTAKES and its extension YTEX. First, we applied the named entity recognition (NER) method implemented in YTEX to identify UMLS concepts in the datasets. We then analyzed the coverage and the popularity of concepts in the UMLS source vocabularies across the 2 datasets (ie, blogs and social Q&A). Further, we conducted a concept-level comparative coverage analysis between SNOMED Clinical Terms (SNOMED CT) and Open-Access Collaborative Consumer Health Vocabulary (OAC CHV), the top 2 UMLS source vocabularies that have the most coverage on our datasets. We also analyzed the UMLS semantic types that were frequently observed in our datasets.
RESULTS: We identified 2415 UMLS concepts from blog postings, 6452 UMLS concepts from social Q&A questions, and 10,378 UMLS concepts from the answers. The medical concepts identified in the blogs can be covered by 56 source vocabularies in the UMLS, while those in questions and answers can be covered by 58 source vocabularies. SNOMED CT was the dominant vocabulary in terms of coverage across all the datasets, ranging from 84.9% to 95.9%. It was followed by OAC CHV (between 73.5% and 80.0%) and Metathesaurus Names (MTH) (between 55.7% and 73.5%). All of the social media datasets shared frequent semantic types such as “Amino Acid, Peptide, or Protein,” “Body Part, Organ, or Organ Component,” and “Disease or Syndrome.”
CONCLUSIONS: Although the 3 social media datasets vary greatly in size, they exhibited similar conceptual coverage among UMLS source vocabularies and the identified concepts showed similar semantic type distributions. As such, concepts that are both frequently used by consumers and also found in professional vocabularies such as SNOMED CT can be suggested to OAC CHV to improve its coverage.
format Online
Article
Text
id pubmed-5146325
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-5146325 2016-12-20 Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites Park, Min Sook He, Zhe Chen, Zhiwei Oh, Sanghee Bian, Jiang JMIR Med Inform Original Paper JMIR Publications 2016-11-24 /pmc/articles/PMC5146325/ /pubmed/27884812 http://dx.doi.org/10.2196/medinform.5748 Text en ©Min Sook Park, Zhe He, Zhiwei Chen, Sanghee Oh, Jiang Bian. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 24.11.2016. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5146325/
https://www.ncbi.nlm.nih.gov/pubmed/27884812
http://dx.doi.org/10.2196/medinform.5748