BioCreAtIvE Task1A: entity identification with a stochastic tagger
BACKGROUND: Our approach to Task 1A was inspired by Tanabe and Wilbur's ABGene system [1,2]. Like Tanabe and Wilbur, we approached the problem as one of part-of-speech tagging, adding a GENE tag to the standard tag set. Where their system uses the Brill tagger, we used TnT, the Trigrams 'n...
Main authors: | Kinoshita, Shuhei; Cohen, K Bretonnel; Ogren, Philip V; Hunter, Lawrence |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2005 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869018/ https://www.ncbi.nlm.nih.gov/pubmed/15960838 http://dx.doi.org/10.1186/1471-2105-6-S1-S4 |
_version_ | 1782133428951449600 |
---|---|
author | Kinoshita, Shuhei; Cohen, K Bretonnel; Ogren, Philip V; Hunter, Lawrence |
author_sort | Kinoshita, Shuhei |
collection | PubMed |
description | BACKGROUND: Our approach to Task 1A was inspired by Tanabe and Wilbur's ABGene system [1,2]. Like Tanabe and Wilbur, we approached the problem as one of part-of-speech tagging, adding a GENE tag to the standard tag set. Where their system uses the Brill tagger, we used TnT, the Trigrams 'n' Tags HMM-based part-of-speech tagger [3]. Based on careful error analysis, we implemented a set of post-processing rules to correct both false positives and false negatives. We participated in both the open and the closed divisions; for the open division, we made use of data from NCBI. RESULTS: Our base system without post-processing achieved a precision and recall of 68.0% and 77.2%, respectively, giving an F-measure of 72.3%. The full system with post-processing achieved a precision and recall of 80.3% and 80.5% giving an F-measure of 80.4%. We achieved a slight improvement (F-measure = 80.9%) by employing a dictionary-based post-processing step for the open division. We placed third in both the open and the closed division. CONCLUSION: Our results show that a part-of-speech tagger can be augmented with post-processing rules resulting in an entity identification system that competes well with other approaches. |
format | Text |
id | pubmed-1869018 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-18690182007-05-18 BioCreAtIvE Task1A: entity identification with a stochastic tagger Kinoshita, Shuhei Cohen, K Bretonnel Ogren, Philip V Hunter, Lawrence BMC Bioinformatics Report BACKGROUND: Our approach to Task 1A was inspired by Tanabe and Wilbur's ABGene system [1,2]. Like Tanabe and Wilbur, we approached the problem as one of part-of-speech tagging, adding a GENE tag to the standard tag set. Where their system uses the Brill tagger, we used TnT, the Trigrams 'n' Tags HMM-based part-of-speech tagger [3]. Based on careful error analysis, we implemented a set of post-processing rules to correct both false positives and false negatives. We participated in both the open and the closed divisions; for the open division, we made use of data from NCBI. RESULTS: Our base system without post-processing achieved a precision and recall of 68.0% and 77.2%, respectively, giving an F-measure of 72.3%. The full system with post-processing achieved a precision and recall of 80.3% and 80.5% giving an F-measure of 80.4%. We achieved a slight improvement (F-measure = 80.9%) by employing a dictionary-based post-processing step for the open division. We placed third in both the open and the closed division. CONCLUSION: Our results show that a part-of-speech tagger can be augmented with post-processing rules resulting in an entity identification system that competes well with other approaches. BioMed Central 2005-05-24 /pmc/articles/PMC1869018/ /pubmed/15960838 http://dx.doi.org/10.1186/1471-2105-6-S1-S4 Text en Copyright © 2005 Kinoshita et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | BioCreAtIvE Task1A: entity identification with a stochastic tagger |
topic | Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869018/ https://www.ncbi.nlm.nih.gov/pubmed/15960838 http://dx.doi.org/10.1186/1471-2105-6-S1-S4 |
work_keys_str_mv | AT kinoshitashuhei biocreativetask1aentityidentificationwithastochastictagger AT cohenkbretonnel biocreativetask1aentityidentificationwithastochastictagger AT ogrenphilipv biocreativetask1aentityidentificationwithastochastictagger AT hunterlawrence biocreativetask1aentityidentificationwithastochastictagger |
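
The F-measure values quoted in the description field can be checked against the reported precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall), which is the usual choice but is not spelled out in the record; the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the record reports the balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the description field.
base = f_measure(68.0, 77.2)   # base system, no post-processing
full = f_measure(80.3, 80.5)   # full system, with post-processing

print(f"base system F-measure: {base:.1f}")
print(f"full system F-measure: {full:.1f}")
```

Under that assumption both values round to the figures given in the abstract (72.3% and 80.4%). The open-division result (80.9%) cannot be checked this way because the record gives no precision or recall for it.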