
Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system

BACKGROUND: We develop medical-specialty specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and...


Bibliographic Details
Main authors: Doing-Harris, Kristina, Livnat, Yarden, Meystre, Stephane
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396714/
https://www.ncbi.nlm.nih.gov/pubmed/25874077
http://dx.doi.org/10.1186/s13326-015-0011-7
author Doing-Harris, Kristina
Livnat, Yarden
Meystre, Stephane
collection PubMed
description BACKGROUND: We develop medical-specialty specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and relationship extraction techniques in a low-overhead, modifiable system. Our SEmi-Automated ontology Maintenance (SEAM) system features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency by inverse document frequency and context vectors. Clinical documents contain the terms we want in an ontology. They also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM can recommend terms from those appearing in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus. We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements.
RESULTS: Across the three use cases, we held the number of recommended terms relatively constant by changing SEAM’s parameters. Experts seem to find more than 300 recommended terms to be overwhelming. The approval rate of recommended terms increased as the number and specificity of clinical documents in the corpus increased. It was 60% when there were 199 clinical documents that were not specific to the ontology domain and 90% when there were 2879 documents very specific to the target domain. We found that fewer than 100 recommended synonym groups were also preferred. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although approval was good. It varied between 67% and 31%.
CONCLUSION: SEAM produced a concise list of recommended clinical terms, synonyms and hierarchical relationships regardless of medical domain.
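The term-recommendation step described in the abstract (score terms with term frequency by inverse document frequency, keep terms that appear in both clinical and biomedical texts, then use that set to filter extracted synonym groups) can be illustrated with a minimal sketch. This is not the SEAM implementation: the function names, the max-over-documents scoring, the `threshold` parameter, and the "keep a group if it contains at least one recommended term" rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Map each term to its highest tf-idf score across tokenized documents.

    Assumption: tf is relative frequency and idf is log(N / df); the record
    does not state which tf-idf variant SEAM uses.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    scores = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / doc_freq[term])
            scores[term] = max(scores.get(term, 0.0), score)
    return scores


def recommend_terms(clinical_docs, biomedical_docs, threshold=0.0):
    """Recommend terms found in both corpora whose clinical score exceeds threshold.

    `threshold` is a hypothetical stand-in for the parameters the abstract
    says were tuned to keep the list size manageable.
    """
    clinical = tf_idf(clinical_docs)
    biomedical = tf_idf(biomedical_docs)
    shared = set(clinical) & set(biomedical)
    return {term for term in shared if clinical[term] > threshold}


def filter_synonym_groups(groups, recommended):
    """Keep synonym groups containing at least one recommended term (assumed rule)."""
    return [group for group in groups if group & recommended]


# Toy corpora, already tokenized; real input would come from an NLP pipeline.
clinical_docs = [["delirium", "noted", "today"], ["patient", "confused", "today"]]
biomedical_docs = [["delirium", "is", "acute", "confusion"], ["dementia", "is", "chronic"]]
candidate_groups = [{"delirium", "acute confusion"}, {"dementia", "senility"}]

recommended = recommend_terms(clinical_docs, biomedical_docs)
kept_groups = filter_synonym_groups(candidate_groups, recommended)
```

Raising `threshold` shrinks the recommended list, which is one plausible way to stay under the roughly 300 terms that the abstract reports experts find manageable.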
format Online
Article
Text
id pubmed-4396714
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4396714 2015-04-15 Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system Doing-Harris, Kristina Livnat, Yarden Meystre, Stephane J Biomed Semantics Software BioMed Central 2015-04-02 /pmc/articles/PMC4396714/ /pubmed/25874077 http://dx.doi.org/10.1186/s13326-015-0011-7 Text en © Doing-Harris et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396714/
https://www.ncbi.nlm.nih.gov/pubmed/25874077
http://dx.doi.org/10.1186/s13326-015-0011-7