Calculating semantic relatedness for biomedical use in a knowledge-poor environment

BACKGROUND: Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications relevant to the biomedical domain, specifically due to the huge amount of new findings being published e...

Bibliographic Details
Main Authors:	Rybinski, Maciej, Aldana-Montes, José Francisco
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255738/ https://www.ncbi.nlm.nih.gov/pubmed/25471751 http://dx.doi.org/10.1186/1471-2105-15-S14-S2

_version_	1782347480166301696
author	Rybinski, Maciej Aldana-Montes, José Francisco
author_facet	Rybinski, Maciej Aldana-Montes, José Francisco
author_sort	Rybinski, Maciej
collection	PubMed
description	BACKGROUND: Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications relevant to the biomedical domain, specifically due to the huge amount of new findings being published each year. Most methods benefit from making use of highly specific resources, thus reducing their usability in many real-world scenarios that differ from the original assumptions. In this paper we present a simple resource-efficient method for calculating semantic relatedness in a knowledge-poor environment. The method obtains results comparable to state-of-the-art methods, while being more generic and flexible. The solution presented here was designed to use only a relatively generic and small document corpus and its statistics, without referring to a previously defined knowledge base; thus it does not assume a 'closed' problem. RESULTS: We propose a method in which computation for two input texts is based on the idea of comparing the vocabulary associated with the best-fit documents related to those texts. As keyterm extraction is a costly process, it is done in a preprocessing step on a 'per-document' basis in order to limit the on-line processing. The actual computations are executed in a compact vector space, limited by the most informative extraction results. The method has been evaluated on five direct benchmarks by calculating correlation coefficients w.r.t. average human answers. It has also been used on Gene-Disease and Disease-Disease data pairs to highlight its potential use as a data analysis tool. Apart from comparisons with reported results, some interesting features of the method have been studied, i.e. the relationship between result quality, efficiency and applicable trimming threshold for size reduction. Experimental evaluation shows that the presented method obtains results that are comparable with current state-of-the-art methods, even surpassing them on a majority of the benchmarks. Additionally, a possible usage scenario for the method is showcased with a real-world data experiment. CONCLUSIONS: Our method improves flexibility of the existing methods without a notable loss of quality. It is a legitimate alternative to the costly construction of specialized knowledge-rich resources.
format	Online Article Text
id	pubmed-4255738
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4255738 2014-12-05 Calculating semantic relatedness for biomedical use in a knowledge-poor environment Rybinski, Maciej Aldana-Montes, José Francisco BMC Bioinformatics Research BioMed Central 2014-11-27 /pmc/articles/PMC4255738/ /pubmed/25471751 http://dx.doi.org/10.1186/1471-2105-15-S14-S2 Text en Copyright © 2014 Rybinski and Aldana-Montes; licensee BioMed Central. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Rybinski, Maciej Aldana-Montes, José Francisco Calculating semantic relatedness for biomedical use in a knowledge-poor environment
title	Calculating semantic relatedness for biomedical use in a knowledge-poor environment
title_sort	calculating semantic relatedness for biomedical use in a knowledge-poor environment
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255738/ https://www.ncbi.nlm.nih.gov/pubmed/25471751 http://dx.doi.org/10.1186/1471-2105-15-S14-S2
work_keys_str_mv	AT rybinskimaciej calculatingsemanticrelatednessforbiomedicaluseinaknowledgepoorenvironment AT aldanamontesjosefrancisco calculatingsemanticrelatednessforbiomedicaluseinaknowledgepoorenvironment
