Automatically annotating documents with normalized gene lists

BACKGROUND: Document gene normalization is the problem of creating a list of unique identifiers for genes that are mentioned within a document. Automating this process has many potential applications in both information extraction and database curation systems. Here we present two separate solutions...

Bibliographic Details
Main authors: Crim, Jeremiah; McDonald, Ryan; Pereira, Fernando
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1771968/
https://www.ncbi.nlm.nih.gov/pubmed/15960825
http://dx.doi.org/10.1186/1471-2105-6-S1-S13
_version_ 1782131720245477376
author Crim, Jeremiah
McDonald, Ryan
Pereira, Fernando
author_facet Crim, Jeremiah
McDonald, Ryan
Pereira, Fernando
author_sort Crim, Jeremiah
collection PubMed
description BACKGROUND: Document gene normalization is the problem of creating a list of unique identifiers for genes that are mentioned within a document. Automating this process has many potential applications in both information extraction and database curation systems. Here we present two separate solutions to this problem. The first is primarily based on standard pattern matching and information extraction techniques. The second and more novel solution uses a statistical classifier to recognize valid gene matches from a list of known gene synonyms. RESULTS: We compare the results of the two systems, analyze their merits and argue that the classification based system is preferable for many reasons including performance, simplicity and robustness. Our best systems attain a balanced precision and recall in the range of 74%–92%, depending on the organism.
format Text
id pubmed-1771968
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1771968 2007-01-18 Automatically annotating documents with normalized gene lists Crim, Jeremiah McDonald, Ryan Pereira, Fernando BMC Bioinformatics Report BACKGROUND: Document gene normalization is the problem of creating a list of unique identifiers for genes that are mentioned within a document. Automating this process has many potential applications in both information extraction and database curation systems. Here we present two separate solutions to this problem. The first is primarily based on standard pattern matching and information extraction techniques. The second and more novel solution uses a statistical classifier to recognize valid gene matches from a list of known gene synonyms. RESULTS: We compare the results of the two systems, analyze their merits and argue that the classification based system is preferable for many reasons including performance, simplicity and robustness. Our best systems attain a balanced precision and recall in the range of 74%–92%, depending on the organism. BioMed Central 2005-05-24 /pmc/articles/PMC1771968/ /pubmed/15960825 http://dx.doi.org/10.1186/1471-2105-6-S1-S13 Text en Copyright © 2006 Crim et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Report
Crim, Jeremiah
McDonald, Ryan
Pereira, Fernando
Automatically annotating documents with normalized gene lists
title Automatically annotating documents with normalized gene lists
title_full Automatically annotating documents with normalized gene lists
title_fullStr Automatically annotating documents with normalized gene lists
title_full_unstemmed Automatically annotating documents with normalized gene lists
title_short Automatically annotating documents with normalized gene lists
title_sort automatically annotating documents with normalized gene lists
topic Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1771968/
https://www.ncbi.nlm.nih.gov/pubmed/15960825
http://dx.doi.org/10.1186/1471-2105-6-S1-S13
work_keys_str_mv AT crimjeremiah automaticallyannotatingdocumentswithnormalizedgenelists
AT mcdonaldryan automaticallyannotatingdocumentswithnormalizedgenelists
AT pereirafernando automaticallyannotatingdocumentswithnormalizedgenelists