Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites
BACKGROUND: The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers. OBJECTIVE: The aim of this study was to better understand consumers’ usage of medical concepts by evaluating the coverage of concepts and semantic types...
Main Authors: | Park, Min Sook; He, Zhe; Chen, Zhiwei; Oh, Sanghee; Bian, Jiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5146325/ https://www.ncbi.nlm.nih.gov/pubmed/27884812 http://dx.doi.org/10.2196/medinform.5748 |
_version_ | 1782473463632494592 |
---|---|
author | Park, Min Sook He, Zhe Chen, Zhiwei Oh, Sanghee Bian, Jiang |
author_facet | Park, Min Sook He, Zhe Chen, Zhiwei Oh, Sanghee Bian, Jiang |
author_sort | Park, Min Sook |
collection | PubMed |
description | BACKGROUND: The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers. OBJECTIVE: The aim of this study was to better understand consumers’ usage of medical concepts by evaluating the coverage of concepts and semantic types of the Unified Medical Language System (UMLS) on diabetes-related postings in 2 types of social media: blogs and social question and answer (Q&A). METHODS: We collected 2 types of social media data: (1) a total of 3711 blogs tagged with “diabetes” on Tumblr posted between February and October 2015; and (2) a total of 58,422 questions and associated answers posted between 2009 and 2014 in the diabetes category of Yahoo! Answers. We analyzed the datasets using a widely adopted biomedical text processing framework Apache cTAKES and its extension YTEX. First, we applied the named entity recognition (NER) method implemented in YTEX to identify UMLS concepts in the datasets. We then analyzed the coverage and the popularity of concepts in the UMLS source vocabularies across the 2 datasets (ie, blogs and social Q&A). Further, we conducted a concept-level comparative coverage analysis between SNOMED Clinical Terms (SNOMED CT) and Open-Access Collaborative Consumer Health Vocabulary (OAC CHV)—the top 2 UMLS source vocabularies that have the most coverage on our datasets. We also analyzed the UMLS semantic types that were frequently observed in our datasets. RESULTS: We identified 2415 UMLS concepts from blog postings, 6452 UMLS concepts from social Q&A questions, and 10,378 UMLS concepts from the answers. The medical concepts identified in the blogs can be covered by 56 source vocabularies in the UMLS, while those in questions and answers can be covered by 58 source vocabularies. SNOMED CT was the dominant vocabulary in terms of coverage across all the datasets, ranging from 84.9% to 95.9%. 
It was followed by OAC CHV (between 73.5% and 80.0%) and Metathesaurus Names (MTH) (between 55.7% and 73.5%). All of the social media datasets shared frequent semantic types such as “Amino Acid, Peptide, or Protein,” “Body Part, Organ, or Organ Component,” and “Disease or Syndrome.” CONCLUSIONS: Although the 3 social media datasets vary greatly in size, they exhibited similar conceptual coverage among UMLS source vocabularies and the identified concepts showed similar semantic type distributions. As such, concepts that are both frequently used by consumers and also found in professional vocabularies such as SNOMED CT can be suggested to OAC CHV to improve its coverage. |
format | Online Article Text |
id | pubmed-5146325 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-51463252016-12-20 Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites Park, Min Sook He, Zhe Chen, Zhiwei Oh, Sanghee Bian, Jiang JMIR Med Inform Original Paper BACKGROUND: The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers. OBJECTIVE: The aim of this study was to better understand consumers’ usage of medical concepts by evaluating the coverage of concepts and semantic types of the Unified Medical Language System (UMLS) on diabetes-related postings in 2 types of social media: blogs and social question and answer (Q&A). METHODS: We collected 2 types of social media data: (1) a total of 3711 blogs tagged with “diabetes” on Tumblr posted between February and October 2015; and (2) a total of 58,422 questions and associated answers posted between 2009 and 2014 in the diabetes category of Yahoo! Answers. We analyzed the datasets using a widely adopted biomedical text processing framework Apache cTAKES and its extension YTEX. First, we applied the named entity recognition (NER) method implemented in YTEX to identify UMLS concepts in the datasets. We then analyzed the coverage and the popularity of concepts in the UMLS source vocabularies across the 2 datasets (ie, blogs and social Q&A). Further, we conducted a concept-level comparative coverage analysis between SNOMED Clinical Terms (SNOMED CT) and Open-Access Collaborative Consumer Health Vocabulary (OAC CHV)—the top 2 UMLS source vocabularies that have the most coverage on our datasets. We also analyzed the UMLS semantic types that were frequently observed in our datasets. RESULTS: We identified 2415 UMLS concepts from blog postings, 6452 UMLS concepts from social Q&A questions, and 10,378 UMLS concepts from the answers. 
The medical concepts identified in the blogs can be covered by 56 source vocabularies in the UMLS, while those in questions and answers can be covered by 58 source vocabularies. SNOMED CT was the dominant vocabulary in terms of coverage across all the datasets, ranging from 84.9% to 95.9%. It was followed by OAC CHV (between 73.5% and 80.0%) and Metathesaurus Names (MTH) (between 55.7% and 73.5%). All of the social media datasets shared frequent semantic types such as “Amino Acid, Peptide, or Protein,” “Body Part, Organ, or Organ Component,” and “Disease or Syndrome.” CONCLUSIONS: Although the 3 social media datasets vary greatly in size, they exhibited similar conceptual coverage among UMLS source vocabularies and the identified concepts showed similar semantic type distributions. As such, concepts that are both frequently used by consumers and also found in professional vocabularies such as SNOMED CT can be suggested to OAC CHV to improve its coverage. JMIR Publications 2016-11-24 /pmc/articles/PMC5146325/ /pubmed/27884812 http://dx.doi.org/10.2196/medinform.5748 Text en ©Min Sook Park, Zhe He, Zhiwei Chen, Sanghee Oh, Jiang Bian. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 24.11.2016. https://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Park, Min Sook He, Zhe Chen, Zhiwei Oh, Sanghee Bian, Jiang Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites |
title | Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites |
title_full | Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites |
title_fullStr | Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites |
title_full_unstemmed | Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites |
title_short | Consumers’ Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites |
title_sort | consumers’ use of umls concepts on social media: diabetes-related textual data analysis in blog and social q&a sites |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5146325/ https://www.ncbi.nlm.nih.gov/pubmed/27884812 http://dx.doi.org/10.2196/medinform.5748 |
work_keys_str_mv | AT parkminsook consumersuseofumlsconceptsonsocialmediadiabetesrelatedtextualdataanalysisinblogandsocialqasites AT hezhe consumersuseofumlsconceptsonsocialmediadiabetesrelatedtextualdataanalysisinblogandsocialqasites AT chenzhiwei consumersuseofumlsconceptsonsocialmediadiabetesrelatedtextualdataanalysisinblogandsocialqasites AT ohsanghee consumersuseofumlsconceptsonsocialmediadiabetesrelatedtextualdataanalysisinblogandsocialqasites AT bianjiang consumersuseofumlsconceptsonsocialmediadiabetesrelatedtextualdataanalysisinblogandsocialqasites |