
Sortal anaphora resolution to enhance relation extraction from biomedical literature



Bibliographic Details
Main authors: Kilicoglu, Halil, Rosemblat, Graciela, Fiszman, Marcelo, Rindflesch, Thomas C.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4832532/
https://www.ncbi.nlm.nih.gov/pubmed/27080229
http://dx.doi.org/10.1186/s12859-016-1009-6
collection PubMed
description BACKGROUND: Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. RESULTS: We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive, with 50% of the changes involving uninformative relations being replaced by more specific and informative ones, while 35% of the changes had no effect, and only 15% were negative. We estimate that anaphora resolution results in changes in about 1.5% of approximately 82 million semantic relations extracted from the entire PubMed database. CONCLUSIONS: Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvements, such as coordination processing and intra-sentential antecedent selection.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1009-6) contains supplementary material, which is available to authorized users.
id pubmed-4832532
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4832532 2016-04-16
BMC Bioinformatics, Research Article
BioMed Central 2016-04-14
© Kilicoglu et al. 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article