Improving protein coreference resolution by simple semantic classification

Bibliographic Details
Main authors: Nguyen, Ngan, Kim, Jin-Dong, Miwa, Makoto, Matsuzaki, Takuya, Tsujii, Junichi
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582588/
https://www.ncbi.nlm.nih.gov/pubmed/23157272
http://dx.doi.org/10.1186/1471-2105-13-304
collection PubMed
description BACKGROUND: Current research has shown that major difficulties in biomedical event extraction are traceable to coreference. Coreference resolution is therefore believed to be useful for improving event extraction. To address coreference resolution in molecular biology literature, the Protein Coreference (COREF) task was organized as a supporting task in the BioNLP Shared Task (BioNLP-ST, hereafter) 2011. However, the shared task results indicated that transferring coreference resolution methods developed for other domains to the biological domain was not straightforward, owing to domain differences in coreference phenomena.
RESULTS: We analyzed the contribution of domain-specific information, including information that indicates protein type, in a rule-based protein coreference resolution system. In particular, the domain-specific information is encoded into semantic classification modules whose output is used in different components of the coreference resolution system. We compared our system with the top four systems in BioNLP-ST 2011 and, surprisingly, found that the minimal configuration outperformed the best system in BioNLP-ST 2011. Analysis of the experimental results revealed that semantic classification using protein information increased the F-score by 2.3% on the test data and by 4.0% on the development data.
CONCLUSIONS: The use of domain-specific information in semantic classification is important for effective coreference resolution. Because domain-specific information is difficult to transfer across domains, we need to continue seeking methods to utilize such information in coreference resolution.
id pubmed-3582588
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3582588 2013-02-27 BMC Bioinformatics, Research Article
BioMed Central 2012-11-17. Copyright ©2012 Nguyen et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article