Exploring the boundaries: gene and protein identification in biomedical text
BACKGROUND: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. METHODS: We present a maximum-entropy based system incorporating a diverse set of features fo...
Main authors: | Finkel, Jenny; Dingare, Shipra; Manning, Christopher D; Nissim, Malvina; Alex, Beatrice; Grover, Claire |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869019/ https://www.ncbi.nlm.nih.gov/pubmed/15960839 http://dx.doi.org/10.1186/1471-2105-6-S1-S5 |
_version_ | 1782133429334179840 |
---|---|
author | Finkel, Jenny Dingare, Shipra Manning, Christopher D Nissim, Malvina Alex, Beatrice Grover, Claire |
author_facet | Finkel, Jenny Dingare, Shipra Manning, Christopher D Nissim, Malvina Alex, Beatrice Grover, Claire |
author_sort | Finkel, Jenny |
collection | PubMed |
description | BACKGROUND: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. METHODS: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. RESULTS: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. CONCLUSION: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches. |
format | Text |
id | pubmed-1869019 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1869019 2007-05-18 Exploring the boundaries: gene and protein identification in biomedical text Finkel, Jenny Dingare, Shipra Manning, Christopher D Nissim, Malvina Alex, Beatrice Grover, Claire BMC Bioinformatics Report BACKGROUND: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. METHODS: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. RESULTS: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. CONCLUSION: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches. BioMed Central 2005-05-24 /pmc/articles/PMC1869019/ /pubmed/15960839 http://dx.doi.org/10.1186/1471-2105-6-S1-S5 Text en Copyright © 2005 Finkel et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Report Finkel, Jenny Dingare, Shipra Manning, Christopher D Nissim, Malvina Alex, Beatrice Grover, Claire Exploring the boundaries: gene and protein identification in biomedical text |
title | Exploring the boundaries: gene and protein identification in biomedical text |
title_full | Exploring the boundaries: gene and protein identification in biomedical text |
title_fullStr | Exploring the boundaries: gene and protein identification in biomedical text |
title_full_unstemmed | Exploring the boundaries: gene and protein identification in biomedical text |
title_short | Exploring the boundaries: gene and protein identification in biomedical text |
title_sort | exploring the boundaries: gene and protein identification in biomedical text |
topic | Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869019/ https://www.ncbi.nlm.nih.gov/pubmed/15960839 http://dx.doi.org/10.1186/1471-2105-6-S1-S5 |
work_keys_str_mv | AT finkeljenny exploringtheboundariesgeneandproteinidentificationinbiomedicaltext AT dingareshipra exploringtheboundariesgeneandproteinidentificationinbiomedicaltext AT manningchristopherd exploringtheboundariesgeneandproteinidentificationinbiomedicaltext AT nissimmalvina exploringtheboundariesgeneandproteinidentificationinbiomedicaltext AT alexbeatrice exploringtheboundariesgeneandproteinidentificationinbiomedicaltext AT groverclaire exploringtheboundariesgeneandproteinidentificationinbiomedicaltext |