
Exploring the boundaries: gene and protein identification in biomedical text

BACKGROUND: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. METHODS: We present a maximum-entropy based system incorporating a diverse set of features fo...


Bibliographic Details
Main authors: Finkel, Jenny; Dingare, Shipra; Manning, Christopher D; Nissim, Malvina; Alex, Beatrice; Grover, Claire
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869019/
https://www.ncbi.nlm.nih.gov/pubmed/15960839
http://dx.doi.org/10.1186/1471-2105-6-S1-S5
author Finkel, Jenny
Dingare, Shipra
Manning, Christopher D
Nissim, Malvina
Alex, Beatrice
Grover, Claire
collection PubMed
description BACKGROUND: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. METHODS: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. RESULTS: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. CONCLUSION: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
format Text
id pubmed-1869019
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1869019 2007-05-18 Exploring the boundaries: gene and protein identification in biomedical text. Finkel, Jenny; Dingare, Shipra; Manning, Christopher D; Nissim, Malvina; Alex, Beatrice; Grover, Claire. BMC Bioinformatics. Report. BioMed Central 2005-05-24 /pmc/articles/PMC1869019/ /pubmed/15960839 http://dx.doi.org/10.1186/1471-2105-6-S1-S5 Text en Copyright © 2005 Finkel et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Exploring the boundaries: gene and protein identification in biomedical text
topic Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869019/
https://www.ncbi.nlm.nih.gov/pubmed/15960839
http://dx.doi.org/10.1186/1471-2105-6-S1-S5