Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach

BACKGROUND: The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype utilizing lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus....

Bibliographic Details
Main Authors: Zheng, Fengbo; Abeysinghe, Rashmie; Sioutos, Nicholas; Whiteman, Lori; Remennik, Lyubov; Cui, Licong
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7737275/
https://www.ncbi.nlm.nih.gov/pubmed/33319703
http://dx.doi.org/10.1186/s12911-020-01289-6
author Zheng, Fengbo
Abeysinghe, Rashmie
Sioutos, Nicholas
Whiteman, Lori
Remennik, Lyubov
Cui, Licong
collection PubMed
description BACKGROUND: The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype that uses lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus. However, our previous work included no domain expert evaluation. In this paper, we further enhance the hybrid approach by leveraging a novel lexical feature: roots of noun chunks within concept names. We also perform a formal evaluation of the enhanced approach. METHOD: We first compute all the non-lattice subgraphs in the NCI Thesaurus. We model each concept using its role definitions, together with the words and the roots of noun chunks within its concept name and its ancestors' names. We then perform subsumption testing for candidate concept pairs in the non-lattice subgraphs to automatically detect potentially missing IS-A relations. Domain experts evaluated the validity of these relations. RESULTS: We applied our approach to the 19.08d version of the NCI Thesaurus. A total of 55 potentially missing IS-A relations were identified by our approach and reviewed by domain experts. 29 out of 55 were confirmed as valid by domain experts and have been incorporated in newer versions of the NCI Thesaurus. 7 out of 55 further revealed incorrect existing IS-A relations in the NCI Thesaurus. CONCLUSIONS: The results showed that our hybrid approach, which leverages lexical features and role definitions, is effective in identifying potentially missing IS-A relations in the NCI Thesaurus.
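The subsumption-testing step described in METHOD can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a concept is a candidate child of another when its feature set strictly contains the other's feature set, and that the noun-chunk roots have already been extracted by some NLP parser. The class `Concept`, the function `detect_missing_is_a`, the role name `has_site`, and the three toy concepts are all hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Concept:
    name: str
    words: frozenset        # words from the concept name and its ancestors' names
    chunk_roots: frozenset  # roots of noun chunks (assumed precomputed elsewhere)
    roles: frozenset        # role definitions as (role, target) pairs

    def features(self):
        # Tag each feature with its kind so a word never collides with a root.
        return frozenset(
            [("word", w) for w in self.words]
            + [("root", r) for r in self.chunk_roots]
            + [("role", r) for r in self.roles]
        )


def detect_missing_is_a(concepts, existing_is_a):
    """Return (child, parent) name pairs suggested as missing IS-A relations.

    `concepts` stands for the concepts of one non-lattice subgraph and
    `existing_is_a` for the already known (child, parent) pairs, assumed
    to be transitively closed.
    """
    suggestions = []
    for child, parent in permutations(concepts, 2):
        if (child.name, parent.name) in existing_is_a:
            continue  # relation already present, nothing to suggest
        # Assumed test: the child carries every feature of the parent, plus more.
        if parent.features() < child.features():
            suggestions.append((child.name, parent.name))
    return suggestions


# Toy data, invented for illustration only.
site = frozenset({("has_site", "skin")})
general = Concept("Skin Neoplasm", frozenset({"skin", "neoplasm"}),
                  frozenset({"neoplasm"}), site)
malignant = Concept("Malignant Skin Neoplasm",
                    frozenset({"malignant", "skin", "neoplasm"}),
                    frozenset({"neoplasm"}), site)
benign = Concept("Benign Skin Neoplasm",
                 frozenset({"benign", "skin", "neoplasm"}),
                 frozenset({"neoplasm"}), site)

known = {("Benign Skin Neoplasm", "Skin Neoplasm")}
result = detect_missing_is_a([general, malignant, benign], known)
print(result)  # [('Malignant Skin Neoplasm', 'Skin Neoplasm')]
```

In this sketch every suggestion is only a candidate; as the abstract notes, domain experts reviewed the detected relations before any were accepted.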
format Online
Article
Text
id pubmed-7737275
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7737275 2020-12-17 Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach Zheng, Fengbo; Abeysinghe, Rashmie; Sioutos, Nicholas; Whiteman, Lori; Remennik, Lyubov; Cui, Licong BMC Med Inform Decis Mak Research
BioMed Central 2020-12-15 /pmc/articles/PMC7737275/ /pubmed/33319703 http://dx.doi.org/10.1186/s12911-020-01289-6 Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach
topic Research