A transformation-based method for auditing the IS-A hierarchy of biomedical terminologies in the Unified Medical Language System

OBJECTIVE: The Unified Medical Language System (UMLS) integrates various source terminologies to support interoperability between biomedical information systems. In this article, we introduce a novel transformation-based auditing method that leverages the UMLS knowledge to systematically identify mi...

Bibliographic Details
Main Authors: Zheng, Fengbo, Shi, Jay, Yang, Yuntao, Zheng, W Jim, Cui, Licong
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7566369/
https://www.ncbi.nlm.nih.gov/pubmed/32918476
http://dx.doi.org/10.1093/jamia/ocaa123
_version_ 1783596124176646144
author Zheng, Fengbo
Shi, Jay
Yang, Yuntao
Zheng, W Jim
Cui, Licong
collection PubMed
description OBJECTIVE: The Unified Medical Language System (UMLS) integrates various source terminologies to support interoperability between biomedical information systems. In this article, we introduce a novel transformation-based auditing method that leverages the UMLS knowledge to systematically identify missing hierarchical IS-A relations in the source terminologies. MATERIALS AND METHODS: Given a concept name in the UMLS, we first identify its base and secondary noun chunks. For each identified noun chunk, we generate replacement candidates that are more general than the noun chunk. Then, we replace the noun chunks with their replacement candidates to generate new potential concept names that may serve as supertypes of the original concept. If a newly generated name is an existing concept name in the same source terminology with the original concept, then a potentially missing IS-A relation between the original and the new concept is identified. RESULTS: Applying our transformation-based method to English-language concept names in the UMLS (2019AB release), a total of 39 359 potentially missing IS-A relations were detected in 13 source terminologies. Domain experts evaluated a random sample of 200 potentially missing IS-A relations identified in the SNOMED CT (U.S. edition) and 100 in Gene Ontology. A total of 173 of 200 and 63 of 100 potentially missing IS-A relations were confirmed by domain experts, indicating that our method achieved a precision of 86.5% and 63% for the SNOMED CT and Gene Ontology, respectively. CONCLUSIONS: Our results showed that our transformation-based method is effective in identifying missing IS-A relations in the UMLS source terminologies.
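The transformation step summarized in the description above can be illustrated with a minimal sketch. It is not the authors' implementation: the noun chunks are assumed to be given already, the `generalizations` mapping and every concept name are toy data invented for illustration (not taken from the article or the UMLS), and the check against already recorded IS-A pairs is an added assumption. All function names are hypothetical.

```python
from itertools import product


def replace_chunks(name, chunks, replacements):
    """Rebuild `name` with each chunk swapped for its replacement.

    Assumes `chunks` are listed left to right and do not overlap.
    """
    parts, pos = [], 0
    for old, new in zip(chunks, replacements):
        start = name.index(old, pos)
        parts.append(name[pos:start])
        parts.append(new)
        pos = start + len(old)
    parts.append(name[pos:])
    return "".join(parts)


def candidate_names(name, chunks, generalizations):
    """Yield new names in which at least one chunk is made more general."""
    # Each chunk may stay as it is or be replaced by a more general term.
    options = [[chunk] + list(generalizations.get(chunk, [])) for chunk in chunks]
    for combo in product(*options):
        if list(combo) != list(chunks):  # skip the unchanged original name
            yield replace_chunks(name, chunks, combo)


def potentially_missing_isa(name, chunks, generalizations, terminology, existing_isa):
    """Return (child, parent) pairs where the generated parent name exists in the
    same terminology but the IS-A pair is not recorded (assumed extra check)."""
    return [
        (name, candidate)
        for candidate in candidate_names(name, chunks, generalizations)
        if candidate in terminology and (name, candidate) not in existing_isa
    ]


# Toy data, hypothetical and for illustration only.
terminology = {
    "fracture of left femur",
    "fracture of femur",
    "injury of left femur",
    "injury of femur",
}
existing_isa = {("fracture of left femur", "fracture of femur")}
generalizations = {"fracture": ["injury"], "left femur": ["femur"]}

result = potentially_missing_isa(
    "fracture of left femur",
    ["fracture", "left femur"],
    generalizations,
    terminology,
    existing_isa,
)
for child, parent in result:
    print(f"{child} IS-A {parent}")
```

In this toy run, the generated name "fracture of femur" is skipped because that pair is already recorded, while the two "injury of ..." names are reported as potentially missing IS-A relations. In the article such candidates are then reviewed by domain experts rather than accepted automatically.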
format Online
Article
Text
id pubmed-7566369
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7566369 2020-10-20. A transformation-based method for auditing the IS-A hierarchy of biomedical terminologies in the Unified Medical Language System. Zheng, Fengbo; Shi, Jay; Yang, Yuntao; Zheng, W Jim; Cui, Licong. J Am Med Inform Assoc. Research and Applications.
Oxford University Press 2020-10-12 /pmc/articles/PMC7566369/ /pubmed/32918476 http://dx.doi.org/10.1093/jamia/ocaa123 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A transformation-based method for auditing the IS-A hierarchy of biomedical terminologies in the Unified Medical Language System
topic Research and Applications