Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text

Coreference resolution is one of the fundamental and challenging tasks in natural language processing. Resolving coreference successfully can have a significant positive effect on downstream natural language processing tasks, such as information extraction and question answering. The importance of c...

Bibliographic Details
Main authors: Kilicoglu, Halil; Demner-Fushman, Dina
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4774913/
https://www.ncbi.nlm.nih.gov/pubmed/26934708
http://dx.doi.org/10.1371/journal.pone.0148538
author Kilicoglu, Halil
Demner-Fushman, Dina
collection PubMed
description Coreference resolution is one of the fundamental and challenging tasks in natural language processing. Resolving coreference successfully can have a significant positive effect on downstream natural language processing tasks, such as information extraction and question answering. The importance of coreference resolution for biomedical text analysis applications has increasingly been acknowledged. One of the difficulties in coreference resolution stems from the fact that distinct types of coreference (e.g., anaphora, appositive) are expressed with a variety of lexical and syntactic means (e.g., personal pronouns, definite noun phrases), and that resolution of each combination often requires a different approach. In the biomedical domain, it is common for coreference annotation and resolution efforts to focus on specific subcategories of coreference deemed important for the downstream task. In the current work, we aim to address some of these concerns regarding coreference resolution in biomedical text. We propose a general, modular framework underpinned by a smorgasbord architecture (Bio-SCoRes), which incorporates a variety of coreference types, their mentions and allows fine-grained specification of resolution strategies to resolve coreference of distinct coreference type-mention pairs. For development and evaluation, we used a corpus of structured drug labels annotated with fine-grained coreference information. In addition, we evaluated our approach on two other corpora (i2b2/VA discharge summaries and protein coreference dataset) to investigate its generality and ease of adaptation to other biomedical text types. Our results demonstrate the usefulness of our novel smorgasbord architecture. The specific pipelines based on the architecture perform successfully in linking coreferential mention pairs, while we find that recognition of full mention clusters is more challenging. 
The corpus of structured drug labels (SPL) as well as the components of Bio-SCoRes and some of the pipelines based on it are publicly available at https://github.com/kilicogluh/Bio-SCoRes. We believe that Bio-SCoRes can serve as a strong and extensible baseline system for coreference resolution of biomedical text.
format Online
Article
Text
id pubmed-4774913
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4774913 2016-03-10
Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text
Kilicoglu, Halil
Demner-Fushman, Dina
PLoS One
Research Article
Public Library of Science 2016-03-02
/pmc/articles/PMC4774913/
/pubmed/26934708
http://dx.doi.org/10.1371/journal.pone.0148538
Text
en
https://creativecommons.org/publicdomain/zero/1.0/
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4774913/
https://www.ncbi.nlm.nih.gov/pubmed/26934708
http://dx.doi.org/10.1371/journal.pone.0148538