Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system
BACKGROUND: We develop medical-specialty-specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and...
Main Authors: | Doing-Harris, Kristina; Livnat, Yarden; Meystre, Stephane |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396714/ https://www.ncbi.nlm.nih.gov/pubmed/25874077 http://dx.doi.org/10.1186/s13326-015-0011-7 |
_version_ | 1782366619717074944 |
---|---|
author | Doing-Harris, Kristina Livnat, Yarden Meystre, Stephane |
author_facet | Doing-Harris, Kristina Livnat, Yarden Meystre, Stephane |
author_sort | Doing-Harris, Kristina |
collection | PubMed |
description | BACKGROUND: We develop medical-specialty-specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and relationship extraction techniques in a low-overhead, modifiable system. Our SEmi-Automated ontology Maintenance (SEAM) system features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency by inverse document frequency (TF-IDF) and context vectors. Clinical documents contain the terms we want in an ontology. They also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM can recommend terms from those appearing in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus. We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements. RESULTS: Across the three use cases, we held the number of recommended terms relatively constant by changing SEAM’s parameters. Experts seem to find more than 300 recommended terms to be overwhelming. The approval rate of recommended terms increased as the number and specificity of clinical documents in the corpus increased. It was 60% when there were 199 clinical documents that were not specific to the ontology domain and 90% when there were 2879 documents very specific to the target domain. We found that fewer than 100 recommended synonym groups were also preferred. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although approval was good. It varied between 67% and 31%. CONCLUSION: SEAM produced a concise list of recommended clinical terms, synonyms and hierarchical relationships regardless of medical domain. |
format | Online Article Text |
id | pubmed-4396714 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4396714 2015-04-15 Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system Doing-Harris, Kristina Livnat, Yarden Meystre, Stephane J Biomed Semantics Software BACKGROUND: We develop medical-specialty-specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and relationship extraction techniques in a low-overhead, modifiable system. Our SEmi-Automated ontology Maintenance (SEAM) system features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency by inverse document frequency (TF-IDF) and context vectors. Clinical documents contain the terms we want in an ontology. They also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM can recommend terms from those appearing in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus. We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements. RESULTS: Across the three use cases, we held the number of recommended terms relatively constant by changing SEAM’s parameters. Experts seem to find more than 300 recommended terms to be overwhelming. The approval rate of recommended terms increased as the number and specificity of clinical documents in the corpus increased. It was 60% when there were 199 clinical documents that were not specific to the ontology domain and 90% when there were 2879 documents very specific to the target domain. We found that fewer than 100 recommended synonym groups were also preferred. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although approval was good. It varied between 67% and 31%. CONCLUSION: SEAM produced a concise list of recommended clinical terms, synonyms and hierarchical relationships regardless of medical domain. BioMed Central 2015-04-02 /pmc/articles/PMC4396714/ /pubmed/25874077 http://dx.doi.org/10.1186/s13326-015-0011-7 Text en © Doing-Harris et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Doing-Harris, Kristina Livnat, Yarden Meystre, Stephane Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system |
title | Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system |
title_sort | automated concept and relationship extraction for the semi-automated ontology management (seam) system |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396714/ https://www.ncbi.nlm.nih.gov/pubmed/25874077 http://dx.doi.org/10.1186/s13326-015-0011-7 |
work_keys_str_mv | AT doingharriskristina automatedconceptandrelationshipextractionforthesemiautomatedontologymanagementseamsystem AT livnatyarden automatedconceptandrelationshipextractionforthesemiautomatedontologymanagementseamsystem AT meystrestephane automatedconceptandrelationshipextractionforthesemiautomatedontologymanagementseamsystem |