Improving protein coreference resolution by simple semantic classification
Main authors: | Nguyen, Ngan; Kim, Jin-Dong; Miwa, Makoto; Matsuzaki, Takuya; Tsujii, Junichi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2012 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582588/ https://www.ncbi.nlm.nih.gov/pubmed/23157272 http://dx.doi.org/10.1186/1471-2105-13-304 |
author | Nguyen, Ngan; Kim, Jin-Dong; Miwa, Makoto; Matsuzaki, Takuya; Tsujii, Junichi |
collection | PubMed |
description | BACKGROUND: Current research has shown that major difficulties in event extraction for the biomedical domain are traceable to coreference. Coreference resolution is therefore believed to be useful for improving event extraction. To address coreference resolution in molecular biology literature, the Protein Coreference (COREF) task was organized as a supporting task in the BioNLP Shared Task (hereafter, BioNLP-ST) 2011. However, the shared task results indicated that transferring coreference resolution methods developed for other domains to the biological domain was not straightforward, owing to domain differences in coreference phenomena. RESULTS: We analyzed the contribution of domain-specific information, including information that indicates the protein type, in a rule-based protein coreference resolution system. In particular, the domain-specific information is encoded into semantic classification modules whose output is used in different components of the coreference resolution system. We compared our system with the top four systems in the BioNLP-ST 2011; surprisingly, we found that the minimal configuration outperformed the best system in the BioNLP-ST 2011. Analysis of the experimental results revealed that semantic classification using protein information increased the F-score by 2.3% on the test data and 4.0% on the development data. CONCLUSIONS: The use of domain-specific information in semantic classification is important for effective coreference resolution. Since it is difficult to transfer domain-specific information across domains, we need to continue to seek methods for utilizing such information in coreference resolution. |
id | pubmed-3582588 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Research Article) |
date | 2012-11-17 |
rights | Copyright ©2012 Nguyen et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Research Article |