Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach
BACKGROUND: The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype utilizing lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus....
Main Authors: | Zheng, Fengbo; Abeysinghe, Rashmie; Sioutos, Nicholas; Whiteman, Lori; Remennik, Lyubov; Cui, Licong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7737275/ https://www.ncbi.nlm.nih.gov/pubmed/33319703 http://dx.doi.org/10.1186/s12911-020-01289-6 |
_version_ | 1783622914336096256 |
---|---|
author | Zheng, Fengbo; Abeysinghe, Rashmie; Sioutos, Nicholas; Whiteman, Lori; Remennik, Lyubov; Cui, Licong |
author_facet | Zheng, Fengbo; Abeysinghe, Rashmie; Sioutos, Nicholas; Whiteman, Lori; Remennik, Lyubov; Cui, Licong |
author_sort | Zheng, Fengbo |
collection | PubMed |
description | BACKGROUND: The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype utilizing lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus. However, no domain expert evaluation was provided in our previous work. In this paper, we further enhance the hybrid approach by leveraging a novel lexical feature: roots of noun chunks within concept names. A formal evaluation of our enhanced approach is also performed. METHOD: We first compute all the non-lattice subgraphs in the NCI Thesaurus. We model each concept using its role definitions, together with the words and roots of noun chunks within its concept name and its ancestors' names. Then we perform subsumption testing for candidate concept pairs in the non-lattice subgraphs to automatically detect potentially missing IS-A relations. Domain experts evaluated the validity of these relations. RESULTS: We applied our approach to the 19.08d version of the NCI Thesaurus. A total of 55 potentially missing IS-A relations were identified by our approach and reviewed by domain experts. Of these, 29 were confirmed as valid by domain experts and have been incorporated in newer versions of the NCI Thesaurus, and 7 further revealed incorrect existing IS-A relations in the NCI Thesaurus. CONCLUSIONS: The results showed that our hybrid approach, which leverages lexical features and role definitions, is effective in identifying potentially missing IS-A relations in the NCI Thesaurus. |
format | Online Article Text |
id | pubmed-7737275 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7737275 2020-12-17 Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach Zheng, Fengbo; Abeysinghe, Rashmie; Sioutos, Nicholas; Whiteman, Lori; Remennik, Lyubov; Cui, Licong BMC Med Inform Decis Mak Research BACKGROUND: The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype utilizing lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus. However, no domain expert evaluation was provided in our previous work. In this paper, we further enhance the hybrid approach by leveraging a novel lexical feature: roots of noun chunks within concept names. A formal evaluation of our enhanced approach is also performed. METHOD: We first compute all the non-lattice subgraphs in the NCI Thesaurus. We model each concept using its role definitions, together with the words and roots of noun chunks within its concept name and its ancestors' names. Then we perform subsumption testing for candidate concept pairs in the non-lattice subgraphs to automatically detect potentially missing IS-A relations. Domain experts evaluated the validity of these relations. RESULTS: We applied our approach to the 19.08d version of the NCI Thesaurus. A total of 55 potentially missing IS-A relations were identified by our approach and reviewed by domain experts. Of these, 29 were confirmed as valid by domain experts and have been incorporated in newer versions of the NCI Thesaurus, and 7 further revealed incorrect existing IS-A relations in the NCI Thesaurus. CONCLUSIONS: The results showed that our hybrid approach, which leverages lexical features and role definitions, is effective in identifying potentially missing IS-A relations in the NCI Thesaurus. 
BioMed Central 2020-12-15 /pmc/articles/PMC7737275/ /pubmed/33319703 http://dx.doi.org/10.1186/s12911-020-01289-6 Text en © The Author(s) 2020 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Zheng, Fengbo Abeysinghe, Rashmie Sioutos, Nicholas Whiteman, Lori Remennik, Lyubov Cui, Licong Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach |
title | Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach |
title_full | Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach |
title_fullStr | Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach |
title_full_unstemmed | Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach |
title_short | Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach |
title_sort | detecting missing is-a relations in the nci thesaurus using an enhanced hybrid approach |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7737275/ https://www.ncbi.nlm.nih.gov/pubmed/33319703 http://dx.doi.org/10.1186/s12911-020-01289-6 |
work_keys_str_mv | AT zhengfengbo detectingmissingisarelationsinthencithesaurususinganenhancedhybridapproach AT abeysingherashmie detectingmissingisarelationsinthencithesaurususinganenhancedhybridapproach AT sioutosnicholas detectingmissingisarelationsinthencithesaurususinganenhancedhybridapproach AT whitemanlori detectingmissingisarelationsinthencithesaurususinganenhancedhybridapproach AT remenniklyubov detectingmissingisarelationsinthencithesaurususinganenhancedhybridapproach AT cuilicong detectingmissingisarelationsinthencithesaurususinganenhancedhybridapproach |
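
The METHOD summarized in the description field (model each concept by its lexical features and role definitions, then run subsumption testing on candidate pairs) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Concept` class, the `is_ancestor` and `suggest_missing_isa` helpers, the subset-based subsumption rule, and the toy concepts are all assumptions made for illustration. The sketch also skips the non-lattice subgraph computation and the noun-chunk extraction, and simply takes precomputed feature sets as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical concept model: a name plus two precomputed feature sets."""
    name: str
    words: frozenset  # lexical features, e.g. words from the concept name
    roles: frozenset  # role definitions as (role, value) pairs


def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via IS-A edges."""
    stack, seen = [descendant], set()
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def suggest_missing_isa(concepts, parents):
    """Suggest (child, parent) pairs under an assumed subset-based rule.

    A pair is suggested when every word and every role of the would-be parent
    also appears on the child, the two feature sets are not identical, and the
    IS-A hierarchy does not already connect the pair.
    """
    suggestions = []
    for child in concepts:
        for parent in concepts:
            if child.name == parent.name:
                continue
            same_features = (child.words == parent.words
                             and child.roles == parent.roles)
            more_specific = (parent.words <= child.words
                             and parent.roles <= child.roles
                             and not same_features)
            if more_specific and not is_ancestor(parents, parent.name, child.name):
                suggestions.append((child.name, parent.name))
    return suggestions


# Toy data invented for this example; not taken from the NCI Thesaurus.
concepts = [
    Concept("Neoplasm", frozenset({"neoplasm"}), frozenset()),
    Concept("Skin Neoplasm", frozenset({"skin", "neoplasm"}),
            frozenset({("site", "skin")})),
    Concept("Benign Skin Neoplasm", frozenset({"benign", "skin", "neoplasm"}),
            frozenset({("site", "skin"), ("behavior", "benign")})),
]
# Existing IS-A edges, child -> list of direct parents.
parents = {
    "Skin Neoplasm": ["Neoplasm"],
    "Benign Skin Neoplasm": ["Neoplasm"],
}

suggestions = suggest_missing_isa(concepts, parents)
print(suggestions)
```

On this toy hierarchy the only suggestion is that "Benign Skin Neoplasm" may be missing an IS-A link to "Skin Neoplasm"; the pairs involving "Neoplasm" are skipped because those links already exist. In the workflow described above, such suggestions would still go to domain experts for review.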