Cargando…
FEDRR: fast, exhaustive detection of redundant hierarchical relations for quality improvement of large biomedical ontologies
BACKGROUND: Redundant hierarchical relations refer to such patterns as two paths from one concept to another, one with length one (direct) and the other with length greater than one (indirect). Each redundant relation represents a possibly unintended defect that needs to be corrected in the ontology...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5057496/ https://www.ncbi.nlm.nih.gov/pubmed/27777627 http://dx.doi.org/10.1186/s13040-016-0110-8 |
Sumario: | BACKGROUND: Redundant hierarchical relations refer to such patterns as two paths from one concept to another, one with length one (direct) and the other with length greater than one (indirect). Each redundant relation represents a possibly unintended defect that needs to be corrected in the ontology quality assurance process. Detecting and eliminating redundant relations would help improve the results of all methods relying on the relevant ontological systems as knowledge source, such as the computation of semantic distance between concepts and for ontology matching and alignment. RESULTS: This paper introduces a novel and scalable approach, called FEDRR – Fast, Exhaustive Detection of Redundant Relations – for quality assurance work during ontological evolution. FEDRR combines the algorithm ideas of Dynamic Programming with Topological Sort, for exhaustive mining of all redundant hierarchical relations in ontological hierarchies, in O(c·|V|+|E|) time, where |V| is the number of concepts, |E| is the number of the relations, and c is a constant in practice. Using FEDRR, we performed exhaustive search of all redundant is-a relations in two of the largest ontological systems in biomedicine: SNOMED CT and Gene Ontology (GO). 372 and 1609 redundant is-a relations were found in the 2015-09-01 version of SNOMED CT and 2015-05-01 version of GO, respectively. We have also performed FEDRR on over 190 source vocabularies in the UMLS - a large integrated repository of biomedical ontologies, and identified six sources containing redundant is-a relations. Randomly generated ontologies have also been used to further validate the efficiency of FEDRR. CONCLUSIONS: FEDRR provides a generally applicable, effective tool for systematic detecting redundant relations in large ontological systems for quality improvement. |
---|