
An automated method to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical resource

BACKGROUND: Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical terminology which can...


Bibliographic Details
Main Authors: Ibrahim, Mohammed; Gauch, Susan; Salman, Omar; Alqahtani, Mohammed
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8371999/
https://www.ncbi.nlm.nih.gov/pubmed/34458573
http://dx.doi.org/10.7717/peerj-cs.668
author Ibrahim, Mohammed
Gauch, Susan
Salman, Omar
Alqahtani, Mohammed
collection PubMed
description BACKGROUND: Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional because he or she does not understand the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical terminology, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen's medical terms to professional medical terms and vice versa.
OBJECTIVE: Many of the existing vocabularies are built manually or semi-automatically, which requires large investments of time and human effort and consequently slows the growth of these vocabularies. In this paper, we present an automatic method to enrich laymen's vocabularies that can also be applied to vocabularies in any domain.
METHODS: Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Representation (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. Our approach further improves the consumer health vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. Basic GloVe and our novel algorithms incorporating WordNet were evaluated using two laymen's datasets from the National Library of Medicine (NLM): the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary.
RESULTS: GloVe was able to find new laymen's terms with an F-score of 48.44%. Our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%. The improvement of enhanced GloVe was statistically significant on both ground truth datasets (P < 0.001).
CONCLUSIONS: This paper presents an automatic approach to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical source, WordNet. Our approach was evaluated using healthcare text downloaded from MedHelp.org, a healthcare social media platform, and two standard laymen's vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each layman term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms' ability to automatically extract synonyms for those terms that appeared in the ground truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
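The evaluation described in the abstract (take a seed term, retrieve candidate terms from the embedding space, score them against a ground truth concept) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three-dimensional vectors, the term lists, and the function names `nearest_terms`, `f_score`, and `relative_improvement` are all made up for illustration, and cosine similarity is assumed as the ranking measure. Only the figures 48.44 and 61 come from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_terms(seed, vectors, k):
    """Return the k terms whose vectors are most similar to the seed's vector."""
    scored = [
        (cosine(vectors[seed], vec), term)
        for term, vec in vectors.items()
        if term != seed
    ]
    scored.sort(reverse=True)
    return [term for _, term in scored[:k]]


def f_score(candidates, truth):
    """Balanced F-score of candidate terms against a ground truth term set."""
    candidates, truth = set(candidates), set(truth)
    hits = len(candidates & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(candidates)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline, improved):
    """Improvement of `improved` over `baseline`, as a fraction of the baseline."""
    return (improved - baseline) / baseline


# Hypothetical toy embeddings; real GloVe vectors would be trained on the corpus.
toy_vectors = {
    "pain": [0.90, 0.10, 0.00],
    "ache": [0.85, 0.15, 0.05],
    "soreness": [0.80, 0.20, 0.10],
    "fever": [0.10, 0.90, 0.20],
    "rash": [0.00, 0.30, 0.95],
}

# Hypothetical ground truth synonyms for the seed term "pain".
ground_truth = {"ache", "hurt"}

candidates = nearest_terms("pain", toy_vectors, k=2)
score = f_score(candidates, ground_truth)

# F-scores reported in the abstract, in percent.
gain = relative_improvement(48.44, 61.0)

print(candidates, score, round(gain, 3))
```

With the abstract's rounded F-scores, the relative improvement works out to roughly 0.26, in line with the reported 25%.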
format Online
Article
Text
id pubmed-8371999
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8371999 2021-08-26 An automated method to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical resource Ibrahim, Mohammed Gauch, Susan Salman, Omar Alqahtani, Mohammed PeerJ Comput Sci Bioinformatics PeerJ Inc. 2021-08-09 /pmc/articles/PMC8371999/ /pubmed/34458573 http://dx.doi.org/10.7717/peerj-cs.668 Text en © 2021 Ibrahim et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title An automated method to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical resource
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8371999/
https://www.ncbi.nlm.nih.gov/pubmed/34458573
http://dx.doi.org/10.7717/peerj-cs.668