Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text
Main authors: | Kilicoglu, Halil; Demner-Fushman, Dina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2016 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4774913/ https://www.ncbi.nlm.nih.gov/pubmed/26934708 http://dx.doi.org/10.1371/journal.pone.0148538 |
author | Kilicoglu, Halil; Demner-Fushman, Dina |
---|---|
collection | PubMed |
description | Coreference resolution is one of the fundamental and challenging tasks in natural language processing. Resolving coreference successfully can have a significant positive effect on downstream natural language processing tasks, such as information extraction and question answering. The importance of coreference resolution for biomedical text analysis applications has increasingly been acknowledged. One of the difficulties in coreference resolution stems from the fact that distinct types of coreference (e.g., anaphora, appositive) are expressed with a variety of lexical and syntactic means (e.g., personal pronouns, definite noun phrases), and that resolution of each combination often requires a different approach. In the biomedical domain, it is common for coreference annotation and resolution efforts to focus on specific subcategories of coreference deemed important for the downstream task. In the current work, we aim to address some of these concerns regarding coreference resolution in biomedical text. We propose a general, modular framework underpinned by a smorgasbord architecture (Bio-SCoRes), which incorporates a variety of coreference types, their mentions and allows fine-grained specification of resolution strategies to resolve coreference of distinct coreference type-mention pairs. For development and evaluation, we used a corpus of structured drug labels annotated with fine-grained coreference information. In addition, we evaluated our approach on two other corpora (i2b2/VA discharge summaries and protein coreference dataset) to investigate its generality and ease of adaptation to other biomedical text types. Our results demonstrate the usefulness of our novel smorgasbord architecture. The specific pipelines based on the architecture perform successfully in linking coreferential mention pairs, while we find that recognition of full mention clusters is more challenging. The corpus of structured drug labels (SPL) as well as the components of Bio-SCoRes and some of the pipelines based on it are publicly available at https://github.com/kilicogluh/Bio-SCoRes. We believe that Bio-SCoRes can serve as a strong and extensible baseline system for coreference resolution of biomedical text. |
id | pubmed-4774913 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | PLoS One |
publication date | 2016-03-02 |
license | This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
title | Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text |
topic | Research Article |
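The description above says Bio-SCoRes allows a separate resolution strategy to be specified for each coreference type-mention pair. The following is a minimal Python sketch of that dispatch idea, assuming a simple lookup table keyed by the (coreference type, mention type) pair. Everything in it is hypothetical: `CorefType`, `MentionType`, `Mention`, `Resolver`, `nearest_number_agreeing`, and `adjacent_preceding` are illustrative names, not the Bio-SCoRes API, and the two strategies are toy heuristics rather than the methods used in the paper.

```python
# Hypothetical sketch of per-(coreference type, mention type) strategy dispatch.
# Not the Bio-SCoRes API; names and heuristics are illustrative only.
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class CorefType(Enum):
    ANAPHORA = "anaphora"
    APPOSITIVE = "appositive"


class MentionType(Enum):
    PERSONAL_PRONOUN = "personal_pronoun"
    DEFINITE_NP = "definite_np"
    NOUN_PHRASE = "noun_phrase"


@dataclass(frozen=True)
class Mention:
    text: str
    start: int            # character offset of the first character
    end: int              # character offset just past the last character
    plural: bool = False  # grammatical number, used by the toy pronoun strategy


# A strategy picks an antecedent for a mention from the candidates, or returns None.
Strategy = Callable[[Mention, List[Mention]], Optional[Mention]]


def nearest_number_agreeing(mention: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Toy pronoun strategy: closest preceding candidate with the same number."""
    preceding = [c for c in candidates
                 if c.end <= mention.start and c.plural == mention.plural]
    return max(preceding, key=lambda c: c.end, default=None)


def adjacent_preceding(mention: Mention, candidates: List[Mention],
                       max_gap: int = 2) -> Optional[Mention]:
    """Toy appositive strategy: candidate ending at most max_gap chars before the mention."""
    close = [c for c in candidates if 0 <= mention.start - c.end <= max_gap]
    return max(close, key=lambda c: c.end, default=None)


class Resolver:
    """Looks up the strategy registered for a (coreference type, mention type) pair."""

    def __init__(self) -> None:
        self._strategies: Dict[Tuple[CorefType, MentionType], Strategy] = {}

    def register(self, coref_type: CorefType, mention_type: MentionType,
                 strategy: Strategy) -> None:
        self._strategies[(coref_type, mention_type)] = strategy

    def resolve(self, coref_type: CorefType, mention_type: MentionType,
                mention: Mention, candidates: List[Mention]) -> Optional[Mention]:
        strategy = self._strategies.get((coref_type, mention_type))
        if strategy is None:
            return None  # pair not configured: the mention is left unresolved
        return strategy(mention, candidates)


# Usage on a made-up, drug-label-style sentence.
text = "Store the tablets at room temperature. They contain warfarin, an anticoagulant."


def mention(span: str, plural: bool = False) -> Mention:
    start = text.index(span)
    return Mention(span, start, start + len(span), plural)


tablets = mention("the tablets", plural=True)
they = mention("They", plural=True)
warfarin = mention("warfarin")
anticoagulant = mention("an anticoagulant")
candidates = [tablets, warfarin, anticoagulant]

resolver = Resolver()
resolver.register(CorefType.ANAPHORA, MentionType.PERSONAL_PRONOUN, nearest_number_agreeing)
resolver.register(CorefType.APPOSITIVE, MentionType.NOUN_PHRASE, adjacent_preceding)

print(resolver.resolve(CorefType.ANAPHORA, MentionType.PERSONAL_PRONOUN, they, candidates).text)
# prints "the tablets"
print(resolver.resolve(CorefType.APPOSITIVE, MentionType.NOUN_PHRASE, anticoagulant, candidates).text)
# prints "warfarin"
```

In this sketch, leaving a pair unregistered simply leaves those mentions unresolved, so a pipeline built this way could cover only the coreference subcategories a given task cares about and add strategies for other pairs later.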