A Maximum-Entropy approach for accurate document annotation in the biomedical domain

The increasing number of scientific literature on the Web and the absence of efficient tools used for classifying and searching the documents are the two most important factors that influence the speed of the search and the quality of the results. Previous studies have shown that the usage of ontolo...

Bibliographic Details
Main authors: Tsatsaronis, George, Macari, Natalia, Torge, Sunna, Dietze, Heiko, Schroeder, Michael
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3337257/
https://www.ncbi.nlm.nih.gov/pubmed/22541593
http://dx.doi.org/10.1186/2041-1480-3-S1-S2
_version_ 1782231050859053056
author Tsatsaronis, George
Macari, Natalia
Torge, Sunna
Dietze, Heiko
Schroeder, Michael
author_facet Tsatsaronis, George
Macari, Natalia
Torge, Sunna
Dietze, Heiko
Schroeder, Michael
author_sort Tsatsaronis, George
collection PubMed
description The increasing number of scientific literature on the Web and the absence of efficient tools used for classifying and searching the documents are the two most important factors that influence the speed of the search and the quality of the results. Previous studies have shown that the usage of ontologies makes it possible to process document and query information at the semantic level, which greatly improves the search for the relevant information and makes one step further towards the Semantic Web. A fundamental step in these approaches is the annotation of documents with ontology concepts, which can also be seen as a classification task. In this paper we address this issue for the biomedical domain and present a new automated and robust method, based on a Maximum Entropy approach, for annotating biomedical literature documents with terms from the Medical Subject Headings (MeSH). The experimental evaluation shows that the suggested Maximum Entropy approach for annotating biomedical documents with MeSH terms is highly accurate, robust to the ambiguity of terms, and can provide very good performance even when a very small number of training documents is used. More precisely, we show that the proposed algorithm obtained an average F-measure of 92.4% (precision 99.41%, recall 86.77%) for the full range of the explored terms (4,078 MeSH terms), and that the algorithm’s performance is resilient to terms’ ambiguity, achieving an average F-measure of 92.42% (precision 99.32%, recall 86.87%) in the explored MeSH terms which were found to be ambiguous according to the Unified Medical Language System (UMLS) thesaurus. Finally, we compared the results of the suggested methodology with a Naive Bayes and a Decision Trees classification approach, and we show that the Maximum Entropy based approach performed with higher F-Measure in both ambiguous and monosemous MeSH terms.
format Online
Article
Text
id pubmed-3337257
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-33372572012-04-26 A Maximum-Entropy approach for accurate document annotation in the biomedical domain Tsatsaronis, George Macari, Natalia Torge, Sunna Dietze, Heiko Schroeder, Michael J Biomed Semantics Proceedings The increasing number of scientific literature on the Web and the absence of efficient tools used for classifying and searching the documents are the two most important factors that influence the speed of the search and the quality of the results. Previous studies have shown that the usage of ontologies makes it possible to process document and query information at the semantic level, which greatly improves the search for the relevant information and makes one step further towards the Semantic Web. A fundamental step in these approaches is the annotation of documents with ontology concepts, which can also be seen as a classification task. In this paper we address this issue for the biomedical domain and present a new automated and robust method, based on a Maximum Entropy approach, for annotating biomedical literature documents with terms from the Medical Subject Headings (MeSH). The experimental evaluation shows that the suggested Maximum Entropy approach for annotating biomedical documents with MeSH terms is highly accurate, robust to the ambiguity of terms, and can provide very good performance even when a very small number of training documents is used. More precisely, we show that the proposed algorithm obtained an average F-measure of 92.4% (precision 99.41%, recall 86.77%) for the full range of the explored terms (4,078 MeSH terms), and that the algorithm’s performance is resilient to terms’ ambiguity, achieving an average F-measure of 92.42% (precision 99.32%, recall 86.87%) in the explored MeSH terms which were found to be ambiguous according to the Unified Medical Language System (UMLS) thesaurus. 
Finally, we compared the results of the suggested methodology with a Naive Bayes and a Decision Trees classification approach, and we show that the Maximum Entropy based approach performed with higher F-Measure in both ambiguous and monosemous MeSH terms. BioMed Central 2012-04-24 /pmc/articles/PMC3337257/ /pubmed/22541593 http://dx.doi.org/10.1186/2041-1480-3-S1-S2 Text en Copyright ©2012 Tsatsaronis et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Tsatsaronis, George
Macari, Natalia
Torge, Sunna
Dietze, Heiko
Schroeder, Michael
A Maximum-Entropy approach for accurate document annotation in the biomedical domain
title A Maximum-Entropy approach for accurate document annotation in the biomedical domain
title_full A Maximum-Entropy approach for accurate document annotation in the biomedical domain
title_fullStr A Maximum-Entropy approach for accurate document annotation in the biomedical domain
title_full_unstemmed A Maximum-Entropy approach for accurate document annotation in the biomedical domain
title_short A Maximum-Entropy approach for accurate document annotation in the biomedical domain
title_sort maximum-entropy approach for accurate document annotation in the biomedical domain
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3337257/
https://www.ncbi.nlm.nih.gov/pubmed/22541593
http://dx.doi.org/10.1186/2041-1480-3-S1-S2