
Constructing a semantic predication gold standard from the biomedical literature

BACKGROUND: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biome...


Bibliographic Details
Main Authors: Kilicoglu, Halil; Rosemblat, Graciela; Fiszman, Marcelo; Rindflesch, Thomas C
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3281188/
https://www.ncbi.nlm.nih.gov/pubmed/22185221
http://dx.doi.org/10.1186/1471-2105-12-486
_version_ 1782223931578515456
author Kilicoglu, Halil
Rosemblat, Graciela
Fiszman, Marcelo
Rindflesch, Thomas C
author_facet Kilicoglu, Halil
Rosemblat, Graciela
Fiszman, Marcelo
Rindflesch, Thomas C
author_sort Kilicoglu, Halil
collection PubMed
description BACKGROUND: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. RESULTS: We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. CONCLUSIONS: While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. 
Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.
format Online
Article
Text
id pubmed-3281188
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-32811882012-02-17 Constructing a semantic predication gold standard from the biomedical literature Kilicoglu, Halil Rosemblat, Graciela Fiszman, Marcelo Rindflesch, Thomas C BMC Bioinformatics Research Article BACKGROUND: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. RESULTS: We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. 
CONCLUSIONS: While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally. BioMed Central 2011-12-20 /pmc/articles/PMC3281188/ /pubmed/22185221 http://dx.doi.org/10.1186/1471-2105-12-486 Text en Copyright ©2011 Kilicoglu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Kilicoglu, Halil
Rosemblat, Graciela
Fiszman, Marcelo
Rindflesch, Thomas C
Constructing a semantic predication gold standard from the biomedical literature
title Constructing a semantic predication gold standard from the biomedical literature
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3281188/
https://www.ncbi.nlm.nih.gov/pubmed/22185221
http://dx.doi.org/10.1186/1471-2105-12-486