
ProMiner: rule-based protein and gene entity recognition

Bibliographic Details
Main Authors: Hanisch, Daniel; Fundel, Katrin; Mevissen, Heinz-Theodor; Zimmer, Ralf; Fluck, Juliane
Format: Text
Language: English
Published: BioMed Central 2005
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869006/
https://www.ncbi.nlm.nih.gov/pubmed/15960826
http://dx.doi.org/10.1186/1471-2105-6-S1-S14
collection PubMed
description BACKGROUND: Identification of gene and protein names in biomedical text is a challenging task as the corresponding nomenclature has evolved over time. This has led to multiple synonyms for individual genes and proteins, as well as names that may be ambiguous with other gene names or with general English words. The Gene List Task of the BioCreAtIvE challenge evaluation enables comparison of systems addressing the problem of protein and gene name identification on common benchmark data. METHODS: The ProMiner system uses a pre-processed synonym dictionary to identify potential name occurrences in the biomedical text and associate protein and gene database identifiers with the detected matches. It follows a rule-based approach and its search algorithm is geared towards recognition of multi-word names [1]. To account for the large number of ambiguous synonyms in the considered organisms, the system has been extended to use specific variants of the detection procedure for highly ambiguous and case-sensitive synonyms. Based on all detected synonyms for one abstract, the most plausible database identifiers are associated with the text. Organism specificity is addressed by a simple procedure based on additionally detected organism names in an abstract. RESULTS: The extended ProMiner system has been applied to the test cases of the BioCreAtIvE competition with highly encouraging results. In blind predictions, the system achieved an F-measure of approximately 0.8 for the organisms mouse and fly and about 0.9 for the organism yeast.
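The abstract describes the matching procedure only at a high level. The following minimal Python sketch illustrates the general idea of dictionary-based name recognition under stated assumptions; it is not ProMiner's implementation. The tokenizer, the greedy longest-match scan, the `case_sensitive` flag, the `ORGANISM_NAMES` table, the rule that accepts a match only when organism filtering leaves a single identifier, and toy identifiers such as `mouse:hsp70` are all hypothetical simplifications.

```python
import re
from dataclasses import dataclass

# Hypothetical dictionary entry: one synonym and the database IDs it may denote.
@dataclass(frozen=True)
class Synonym:
    text: str                     # surface form, e.g. "heat shock protein 70"
    ids: frozenset                # candidate database identifiers (toy values here)
    case_sensitive: bool = False  # e.g. short symbols that collide with English words

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

def tokenize(text):
    """Split text into alphanumeric tokens (a simplification of real preprocessing)."""
    return TOKEN_RE.findall(text)

def build_index(synonyms):
    """Index synonyms by token tuple; case-insensitive entries are stored lower-cased."""
    exact, folded = {}, {}
    for syn in synonyms:
        toks = tuple(tokenize(syn.text))
        if syn.case_sensitive:
            exact.setdefault(toks, set()).update(syn.ids)
        else:
            folded.setdefault(tuple(t.lower() for t in toks), set()).update(syn.ids)
    max_len = max((len(k) for k in [*exact, *folded]), default=0)
    return (exact, folded), max_len

def find_matches(text, index, max_len):
    """Greedy longest-match scan, so multi-word names win over their sub-phrases."""
    exact, folded = index
    tokens = tokenize(text)
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(tokens[i:i + n])
            ids = exact.get(window) or folded.get(tuple(t.lower() for t in window))
            if ids:
                matches.append((" ".join(window), frozenset(ids)))
                i += n
                break
        else:  # no synonym starts at this token
            i += 1
    return matches

# Hypothetical organism-name table mapping surface words to organism tags.
ORGANISM_NAMES = {"mouse": "mouse", "mice": "mouse", "murine": "mouse",
                  "drosophila": "fly", "yeast": "yeast", "cerevisiae": "yeast"}

def detect_organisms(text):
    """Return the organism tags whose names occur in the text."""
    return {ORGANISM_NAMES[t.lower()] for t in tokenize(text) if t.lower() in ORGANISM_NAMES}

def resolve(matches, organism_of, organisms_in_text):
    """Keep IDs from detected organisms (if any); accept a match only if one ID remains."""
    chosen = set()
    for _, ids in matches:
        if organisms_in_text:
            ids = {i for i in ids if organism_of[i] in organisms_in_text}
        if len(ids) == 1:
            chosen |= ids
    return chosen

# Toy example with made-up identifiers.
synonyms = [
    Synonym("heat shock protein 70", frozenset({"mouse:hsp70", "fly:hsp70"})),
    Synonym("Hsp70", frozenset({"mouse:hsp70", "fly:hsp70"})),
    Synonym("CAT", frozenset({"mouse:cat"}), case_sensitive=True),
]
organism_of = {"mouse:hsp70": "mouse", "fly:hsp70": "fly", "mouse:cat": "mouse"}

text = "In mouse cells, heat shock protein 70 and CAT were induced; the cat was not."
index, max_len = build_index(synonyms)
matches = find_matches(text, index, max_len)
print(matches)
print(resolve(matches, organism_of, detect_organisms(text)))
```

In this toy run, the case-sensitive entry `CAT` matches the upper-case symbol but not the English word "cat", and the ambiguous multi-word synonym resolves to a single identifier only because "mouse" was detected in the same text. A realistic system would also need the dictionary preprocessing and plausibility scoring that the abstract mentions but does not detail.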
id pubmed-1869006
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-1869006 2007-05-18 ProMiner: rule-based protein and gene entity recognition. BMC Bioinformatics, Report.
BioMed Central 2005-05-24. Copyright © 2005 Hanisch et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Report