Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system

Bibliographic Details
Main authors:	Doing-Harris, Kristina; Livnat, Yarden; Meystre, Stephane
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396714/ https://www.ncbi.nlm.nih.gov/pubmed/25874077 http://dx.doi.org/10.1186/s13326-015-0011-7

collection	PubMed
description	BACKGROUND: We develop medical-specialty-specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and relationship extraction techniques in a low-overhead, modifiable system. Our SEmi-Automated ontology Maintenance (SEAM) system features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency-inverse document frequency (TF-IDF) vectors and context vectors. Clinical documents contain the terms we want in an ontology. They also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM can recommend terms from those appearing in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus. We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements. RESULTS: Across the three use cases, we held the number of recommended terms relatively constant by adjusting SEAM's parameters. Experts seem to find more than 300 recommended terms overwhelming. The approval rate of recommended terms increased as the number and domain specificity of the clinical documents in the corpus increased: it was 60% with 199 clinical documents that were not specific to the ontology domain, and 90% with 2879 documents highly specific to the target domain. We found that fewer than 100 recommended synonym groups were also preferred. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although approval was good, varying between 67% and 31%. CONCLUSION: SEAM produced a concise list of recommended clinical terms, synonyms and hierarchical relationships regardless of medical domain.
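A minimal Python sketch of the kind of pipeline the description outlines, under stated assumptions: a term is recommended if it occurs in both the clinical and the biomedical corpus; synonym candidates are pairs of recommended terms whose TF-IDF term-by-document vectors over the biomedical corpus reach a hypothetical cosine-similarity threshold of 0.8; and hierarchy candidates come from a single Hearst-style "X such as Y" pattern. The tokenizer, function names, threshold and toy documents are illustrative assumptions, not SEAM's actual implementation, and context vectors are omitted.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical Hearst-style lexico-syntactic pattern: "<hypernym> such as <hyponym>".
SUCH_AS = re.compile(r"([a-z]+) such as ([a-z]+)")


def tokenize(text):
    # Simplifying assumption: lowercase alphabetic tokens stand in for a full NLP pipeline.
    return re.findall(r"[a-z]+", text.lower())


def recommend_terms(clinical_docs, biomedical_docs):
    # Recommend only terms that occur in both document types.
    clinical = {t for doc in clinical_docs for t in tokenize(doc)}
    biomedical = {t for doc in biomedical_docs for t in tokenize(doc)}
    return clinical & biomedical


def tfidf_vectors(docs):
    # Map each term to its TF-IDF weights across the documents (one weight per document).
    counts = [Counter(tokenize(doc)) for doc in docs]
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    return {
        term: [c[term] * math.log(n_docs / df) for c in counts]
        for term, df in doc_freq.items()
    }


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def synonym_candidates(biomedical_docs, recommended, threshold=0.8):
    # Pairs of recommended terms whose TF-IDF vectors point in similar directions.
    vectors = tfidf_vectors(biomedical_docs)
    terms = sorted(t for t in recommended if t in vectors)
    return [
        (a, b)
        for a, b in combinations(terms, 2)
        if cosine(vectors[a], vectors[b]) >= threshold
    ]


def hierarchy_candidates(biomedical_docs, recommended):
    # Keep (hypernym, hyponym) matches only when both terms were recommended.
    pairs = set()
    for doc in biomedical_docs:
        for hyper, hypo in SUCH_AS.findall(doc.lower()):
            if hyper in recommended and hypo in recommended:
                pairs.add((hyper, hypo))
    return pairs


# Toy corpora, invented for illustration only.
clinical_docs = [
    "acute confusion, rule out delirium",
    "delirium with new confusion",
    "history of anxiety disorders",
]
biomedical_docs = [
    "delirium and confusion are common in older patients",
    "disorders such as delirium require screening",
    "confusion and delirium often co-occur after surgery",
    "echocardiogram findings were normal",
]

recommended = recommend_terms(clinical_docs, biomedical_docs)
print(sorted(recommended))  # ['confusion', 'delirium', 'disorders']
print(synonym_candidates(biomedical_docs, recommended))  # [('confusion', 'delirium')]
print(sorted(hierarchy_candidates(biomedical_docs, recommended)))  # [('disorders', 'delirium')]
```

Routing both relation types through the recommended-term set follows the division of labor the description gives: the clinical corpus decides which terms matter, while the biomedical corpus supplies the linguistic evidence for synonymy and hierarchy.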
id	pubmed-4396714
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4396714 2015-04-15. Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system. Doing-Harris, Kristina; Livnat, Yarden; Meystre, Stephane. J Biomed Semantics. Software. BioMed Central, 2015-04-02. /pmc/articles/PMC4396714/ /pubmed/25874077 http://dx.doi.org/10.1186/s13326-015-0011-7. Text, en. © Doing-Harris et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.