Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT
Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material an...
Main Authors: | Cui, Licong; Zhu, Wei; Tao, Shiqiang; Case, James T; Bodenreider, Olivier; Zhang, Guo-Qiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | Research and Applications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080685/ https://www.ncbi.nlm.nih.gov/pubmed/28339775 http://dx.doi.org/10.1093/jamia/ocw175 |
_version_ | 1783345522977800192 |
---|---|
author | Cui, Licong Zhu, Wei Tao, Shiqiang Case, James T Bodenreider, Olivier Zhang, Guo-Qiang |
author_facet | Cui, Licong Zhu, Wei Tao, Shiqiang Case, James T Bodenreider, Olivier Zhang, Guo-Qiang |
author_sort | Cui, Licong |
collection | PubMed |
description | Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors. |
format | Online Article Text |
id | pubmed-6080685 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-60806852018-08-10 Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT Cui, Licong Zhu, Wei Tao, Shiqiang Case, James T Bodenreider, Olivier Zhang, Guo-Qiang J Am Med Inform Assoc Research and Applications Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors. Oxford University Press 2017-07 2017-02-19 /pmc/articles/PMC6080685/ /pubmed/28339775 http://dx.doi.org/10.1093/jamia/ocw175 Text en © The Author, 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. 
http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research and Applications Cui, Licong Zhu, Wei Tao, Shiqiang Case, James T Bodenreider, Olivier Zhang, Guo-Qiang Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT |
title | Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT |
title_full | Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT |
title_fullStr | Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT |
title_full_unstemmed | Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT |
title_short | Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT |
title_sort | mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in snomed ct |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080685/ https://www.ncbi.nlm.nih.gov/pubmed/28339775 http://dx.doi.org/10.1093/jamia/ocw175 |
work_keys_str_mv | AT cuilicong miningnonlatticesubgraphsfordetectingmissinghierarchicalrelationsandconceptsinsnomedct AT zhuwei miningnonlatticesubgraphsfordetectingmissinghierarchicalrelationsandconceptsinsnomedct AT taoshiqiang miningnonlatticesubgraphsfordetectingmissinghierarchicalrelationsandconceptsinsnomedct AT casejamest miningnonlatticesubgraphsfordetectingmissinghierarchicalrelationsandconceptsinsnomedct AT bodenreiderolivier miningnonlatticesubgraphsfordetectingmissinghierarchicalrelationsandconceptsinsnomedct AT zhangguoqiang miningnonlatticesubgraphsfordetectingmissinghierarchicalrelationsandconceptsinsnomedct |
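
The description field above refers to non-lattice subgraphs without defining them. As a rough illustration only, the Python sketch below assumes the common definition that two concepts form a non-lattice pair when they share more than one maximal common descendant in the is-a hierarchy. It checks every pair by brute force on a tiny graph; it is not the scalable MapReduce algorithm the record mentions, and it does not cover the four lexical patterns. The hierarchy `TOY_IS_A` and all function names are hypothetical and are not taken from the article or from SNOMED CT.

```python
from itertools import combinations


def descendants(children, node):
    """Return all strict descendants of `node` in an is-a DAG."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def maximal_common_descendants(children, a, b):
    """Return the maximal lower bounds shared by concepts `a` and `b`."""
    # Lower bounds are reflexive: a concept counts as its own descendant,
    # so an ancestor/descendant pair has exactly one maximal lower bound.
    lower_a = descendants(children, a) | {a}
    lower_b = descendants(children, b) | {b}
    common = lower_a & lower_b
    # Drop every common descendant that sits below another common descendant.
    covered = set()
    for concept in common:
        covered |= descendants(children, concept)
    return common - covered


def non_lattice_pairs(children):
    """Map each non-lattice pair to its set of maximal common descendants."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    result = {}
    for a, b in combinations(sorted(nodes), 2):
        maximal = maximal_common_descendants(children, a, b)
        if len(maximal) > 1:
            result[(a, b)] = maximal
    return result


# Hypothetical toy hierarchy (parent -> children), not real SNOMED CT content.
TOY_IS_A = {
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["E"],
}

if __name__ == "__main__":
    for pair, maximal in non_lattice_pairs(TOY_IS_A).items():
        print(pair, sorted(maximal))
```

In this toy graph, A and B share the descendants C, D, and E, and both C and D are maximal, so (A, B) is flagged. Under the assumed definition, such a pair would be a candidate for review, for example for a missing intermediate concept or a missing is-a relation between existing concepts.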