
Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts

OBJECTIVE: The study sought to explore the use of deep learning techniques to measure the semantic relatedness between Unified Medical Language System (UMLS) concepts. MATERIALS AND METHODS: Concept sentence embeddings were generated for UMLS concepts by applying the word embedding models BioWordVec...


Bibliographic Details
Main authors:	Mao, Yuqing; Fung, Kin Wah
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Research and Applications
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566472/ https://www.ncbi.nlm.nih.gov/pubmed/33029614 http://dx.doi.org/10.1093/jamia/ocaa136

_version_	1783596139363172352
author	Mao, Yuqing; Fung, Kin Wah
collection	PubMed
description	OBJECTIVE: The study sought to explore the use of deep learning techniques to measure the semantic relatedness between Unified Medical Language System (UMLS) concepts. MATERIALS AND METHODS: Concept sentence embeddings were generated for UMLS concepts by applying the word embedding models BioWordVec and various flavors of BERT to concept sentences formed by concatenating UMLS terms. Graph embeddings were generated by the graph convolutional networks and 4 knowledge graph embedding models, using graphs built from UMLS hierarchical relations. Semantic relatedness was measured by the cosine between the concepts’ embedding vectors. Performance was compared with 2 traditional path-based (shortest path and Leacock-Chodorow) measurements and the publicly available concept embeddings, cui2vec, generated from large biomedical corpora. The concept sentence embeddings were also evaluated on a word sense disambiguation (WSD) task. Reference standards used included the semantic relatedness and semantic similarity datasets from the University of Minnesota, concept pairs generated from the Standardized MedDRA Queries and the MeSH (Medical Subject Headings) WSD corpus. RESULTS: Sentence embeddings generated by BioWordVec outperformed all other methods used individually in semantic relatedness measurements. Graph convolutional network graph embedding uniformly outperformed path-based measurements and was better than some word embeddings for the Standardized MedDRA Queries dataset. When used together, combined word and graph embedding achieved the best performance in all datasets. For WSD, the enhanced versions of BERT outperformed BioWordVec. CONCLUSIONS: Word and graph embedding techniques can be used to harness terms and relations in the UMLS to measure semantic relatedness between concepts. Concept sentence embedding outperforms path-based measurements and cui2vec, and can be further enhanced by combining with graph embedding.
format	Online Article Text
id	pubmed-7566472
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7566472 2020-10-20 Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts Mao, Yuqing; Fung, Kin Wah J Am Med Inform Assoc Research and Applications Oxford University Press 2020-10-08 /pmc/articles/PMC7566472/ /pubmed/33029614 http://dx.doi.org/10.1093/jamia/ocaa136 Text en Published by Oxford University Press on behalf of the American Medical Informatics Association 2020. This work is written by US Government employees and is in the public domain in the US. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566472/ https://www.ncbi.nlm.nih.gov/pubmed/33029614 http://dx.doi.org/10.1093/jamia/ocaa136
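
The description field above states that semantic relatedness was measured by the cosine between the concepts' embedding vectors, and that word and graph embeddings were also used together. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that a concept sentence embedding is the mean of its word vectors and that the two embedding types are combined by concatenating L2-normalized vectors; the record confirms neither choice. The toy vectors and the helper names `mean_pool`, `l2_normalize`, `cosine`, and `combine` are hypothetical.

```python
import math


def mean_pool(word_vectors):
    """Average equal-length word vectors into one sentence vector (assumed pooling)."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combine(word_vector, graph_vector):
    """Concatenate normalized word and graph vectors (assumed combination scheme)."""
    return l2_normalize(word_vector) + l2_normalize(graph_vector)


# Toy 2-dimensional word vectors, invented for illustration only.
toy_word_vectors = {
    "heart": [1.0, 0.0],
    "attack": [0.0, 1.0],
    "myocardial": [0.9, 0.1],
    "infarction": [0.1, 0.9],
}

# Concept sentences here are short term strings; real ones would concatenate UMLS terms.
sentence_a = mean_pool([toy_word_vectors[w] for w in "heart attack".split()])
sentence_b = mean_pool([toy_word_vectors[w] for w in "myocardial infarction".split()])

# Toy graph embeddings for the same two concepts, also invented.
graph_a = [0.2, 0.8, 0.1]
graph_b = [0.3, 0.7, 0.2]

word_only = cosine(sentence_a, sentence_b)
combined = cosine(combine(sentence_a, graph_a), combine(sentence_b, graph_b))
print(f"word-only relatedness: {word_only:.3f}")
print(f"combined relatedness:  {combined:.3f}")
```

Normalizing each part before concatenation keeps either embedding from dominating the cosine purely because of its scale. Other combination schemes, such as weighted averaging of the two similarity scores, would fit the same description.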
