Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT

Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material an...

Bibliographic Details
Main Authors: Cui, Licong; Zhu, Wei; Tao, Shiqiang; Case, James T; Bodenreider, Olivier; Zhang, Guo-Qiang
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080685/
https://www.ncbi.nlm.nih.gov/pubmed/28339775
http://dx.doi.org/10.1093/jamia/ocw175
_version_ 1783345522977800192
author Cui, Licong
Zhu, Wei
Tao, Shiqiang
Case, James T
Bodenreider, Olivier
Zhang, Guo-Qiang
author_facet Cui, Licong
Zhu, Wei
Tao, Shiqiang
Case, James T
Bodenreider, Olivier
Zhang, Guo-Qiang
author_sort Cui, Licong
collection PubMed
description Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors.
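The structural part of the method described above can be illustrated with a small sketch. It assumes one common definition of a non-lattice pair: two concepts that share more than one maximal common descendant in the is-a hierarchy. It uses a brute-force scan over a toy hierarchy, not the MapReduce algorithm the abstract mentions, and the concept names and function names are hypothetical.

```python
from itertools import combinations


def descendants(children, node):
    """Return all strict descendants of `node` in a DAG (parent -> set of children)."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(children.get(cur, ()))
    return seen


def maximal_common_descendants(children, a, b):
    """Common descendants of a and b that are not below another common descendant."""
    common = descendants(children, a) & descendants(children, b)
    return {
        c for c in common
        if not any(c in descendants(children, d) for d in common if d != c)
    }


def non_lattice_pairs(children):
    """Map each non-lattice pair (assumed definition) to its maximal common descendants."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    result = {}
    for a, b in combinations(sorted(nodes), 2):
        mcd = maximal_common_descendants(children, a, b)
        if len(mcd) > 1:  # no unique greatest lower bound
            result[(a, b)] = mcd
    return result


# Hypothetical toy hierarchy: A and B both subsume C and D; C and D both subsume E.
toy = {"A": {"C", "D"}, "B": {"C", "D"}, "C": {"E"}, "D": {"E"}}
print(non_lattice_pairs(toy))
```

In this toy hierarchy only the pair (A, B) is flagged, because C and D are both maximal common descendants. A reviewer would then inspect the flagged subgraph, for example to check whether a concept or an is-a relation between the upper and lower concepts is missing.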
format Online
Article
Text
id pubmed-6080685
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-60806852018-08-10 Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT Cui, Licong Zhu, Wei Tao, Shiqiang Case, James T Bodenreider, Olivier Zhang, Guo-Qiang J Am Med Inform Assoc Research and Applications Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods: All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracted using a scalable MapReduce algorithm. Four lexical patterns (the lexical part) are identified among the extracted non-lattice subgraphs. Non-lattice subgraphs exhibiting such lexical patterns are often indicative of missing hierarchical relations or concepts. Each lexical pattern is associated with a potential specific type of error. Results: Applying the structural-lexical method to SNOMED CT (September 2015 US edition), we found 6801 non-lattice subgraphs that matched these lexical patterns, of which 2046 were amenable to visual inspection. We evaluated a random sample of 100 small subgraphs, of which 59 were reviewed in detail by domain experts. All the subgraphs reviewed contained errors confirmed by the experts. The most frequent type of error was missing is-a relations due to incomplete or inconsistent modeling of the concepts. Conclusions: Our hybrid structural-lexical method is innovative and proved effective not only in detecting errors in SNOMED CT, but also in suggesting remediation for these errors. Oxford University Press 2017-07 2017-02-19 /pmc/articles/PMC6080685/ /pubmed/28339775 http://dx.doi.org/10.1093/jamia/ocw175 Text en © The Author, 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. 
http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Research and Applications
Cui, Licong
Zhu, Wei
Tao, Shiqiang
Case, James T
Bodenreider, Olivier
Zhang, Guo-Qiang
Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT
title Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT
title_full Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT
title_fullStr Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT
title_full_unstemmed Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT
title_short Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT
title_sort mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in snomed ct
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080685/
https://www.ncbi.nlm.nih.gov/pubmed/28339775
http://dx.doi.org/10.1093/jamia/ocw175