Formalizing biomedical concepts from textual definitions

Bibliographic Details
Main authors:	Petrova, Alina; Ma, Yue; Tsatsaronis, George; Kissa, Maria; Distel, Felix; Baader, Franz; Schroeder, Michael
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4422531/ https://www.ncbi.nlm.nih.gov/pubmed/25949785 http://dx.doi.org/10.1186/s13326-015-0015-3

author	Petrova, Alina; Ma, Yue; Tsatsaronis, George; Kissa, Maria; Distel, Felix; Baader, Franz; Schroeder, Michael
collection	PubMed
description	BACKGROUND: Ontologies play a major role in the life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined, which ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions. RESULTS: We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we provide an analysis of the following aspects: (1) How do definitions mined from the Web and the literature differ from those mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., restrictions on relations’ domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare to each other for the task of formal definition generation? (4) What is the influence of the learning data size on the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size does not impact performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations’ domain and range pairs do not overlap, this information is the most valuable for formalizing textual definitions.
CONCLUSIONS: The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
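The description says the method combines machine-learned relation extraction with semantic types that limit each relation's domain and range, and emits definitions shaped like SNOMED CT's. The Python sketch below illustrates only that last assembly step, under loudly stated assumptions: the relation names, semantic-type labels, the `DOMAIN_RANGE` table, the 0.5 score threshold and the `Prediction` records (standing in for a classifier's output) are all hypothetical, and the result is a simplified conjunction (C ≡ P ⊓ ∃r.D ⊓ …) rendered in an OWL Manchester-like syntax that ignores SNOMED CT role groups. It is not the authors' implementation, which is in the linked learningDL repository.

```python
from dataclasses import dataclass
from typing import Iterable

# HYPOTHETICAL domain/range table: relation -> (allowed source semantic types,
# allowed target semantic types). All names are invented for illustration.
DOMAIN_RANGE = {
    "finding_site": ({"Disease"}, {"BodyStructure"}),
    "causative_agent": ({"Disease"}, {"Organism", "Substance"}),
    "associated_morphology": ({"Disease"}, {"Morphology"}),
}


@dataclass(frozen=True)
class Prediction:
    """One hypothetical classifier output: a relation to a target concept."""
    relation: str
    target: str
    target_type: str
    score: float


def admissible(pred: Prediction, source_type: str) -> bool:
    """True if source and target semantic types fit the relation's domain and range."""
    if pred.relation not in DOMAIN_RANGE:
        return False
    domain, range_ = DOMAIN_RANGE[pred.relation]
    return source_type in domain and pred.target_type in range_


def formal_definition(concept: str, parent: str, source_type: str,
                      predictions: Iterable[Prediction],
                      threshold: float = 0.5) -> str:
    """Assemble 'concept ≡ parent ⊓ ∃r1.C1 ⊓ ...' (Manchester-like text)
    from predictions that are both confident and type-consistent."""
    kept = sorted(
        (p for p in predictions
         if p.score >= threshold and admissible(p, source_type)),
        key=lambda p: (p.relation, p.target),  # deterministic conjunct order
    )
    conjuncts = [parent] + [f"({p.relation} some {p.target})" for p in kept]
    return f"{concept} EquivalentTo: " + " and ".join(conjuncts)


preds = [
    Prediction("finding_site", "Lung", "BodyStructure", 0.92),
    Prediction("causative_agent", "Bacterium", "Organism", 0.81),
    Prediction("finding_site", "Bacterium", "Organism", 0.40),           # low score, wrong range
    Prediction("associated_morphology", "Lung", "BodyStructure", 0.75),  # wrong range
]
print(formal_definition("BacterialPneumonia", "Pneumonia", "Disease", preds))
# prints: BacterialPneumonia EquivalentTo: Pneumonia and (causative_agent some Bacterium) and (finding_site some Lung)
```

Filtering on domain and range before assembly keeps a confident but type-inconsistent prediction (here `associated_morphology` pointing at a body structure) out of the definition. If two relations shared the same domain/range pair, this filter could no longer tell them apart, which is one way to read the description's caveat about overlapping pairs.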
format	Online Article Text
id	pubmed-4422531
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4422531 2015-05-07 Formalizing biomedical concepts from textual definitions Petrova, Alina; Ma, Yue; Tsatsaronis, George; Kissa, Maria; Distel, Felix; Baader, Franz; Schroeder, Michael J Biomed Semantics Research Article BioMed Central 2015-04-02 /pmc/articles/PMC4422531/ /pubmed/25949785 http://dx.doi.org/10.1186/s13326-015-0015-3 Text en © Petrova et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Formalizing biomedical concepts from textual definitions
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4422531/ https://www.ncbi.nlm.nih.gov/pubmed/25949785 http://dx.doi.org/10.1186/s13326-015-0015-3
