Automatically annotating documents with normalized gene lists
Main authors: | Crim, Jeremiah; McDonald, Ryan; Pereira, Fernando |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2005 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1771968/ https://www.ncbi.nlm.nih.gov/pubmed/15960825 http://dx.doi.org/10.1186/1471-2105-6-S1-S13 |
author | Crim, Jeremiah; McDonald, Ryan; Pereira, Fernando |
---|---|
collection | PubMed |
description | BACKGROUND: Document gene normalization is the problem of creating a list of unique identifiers for genes that are mentioned within a document. Automating this process has many potential applications in both information extraction and database curation systems. Here we present two separate solutions to this problem. The first is primarily based on standard pattern matching and information extraction techniques. The second and more novel solution uses a statistical classifier to recognize valid gene matches from a list of known gene synonyms. RESULTS: We compare the results of the two systems, analyze their merits and argue that the classification-based system is preferable for many reasons, including performance, simplicity and robustness. Our best systems attain a balanced precision and recall in the range of 74%–92%, depending on the organism. |
format | Text |
id | pubmed-1771968 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
record | pubmed-1771968, 2007-01-18 |
journal | BMC Bioinformatics |
published | 2005-05-24 |
rights | Copyright © 2006 Crim et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Automatically annotating documents with normalized gene lists |
topic | Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1771968/ https://www.ncbi.nlm.nih.gov/pubmed/15960825 http://dx.doi.org/10.1186/1471-2105-6-S1-S13 |
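The abstract describes a second system that uses a statistical classifier to accept or reject matches against a list of known gene synonyms, and reports results as precision and recall. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. The lexicon, the placeholder `GENE:` identifiers, the features, the hand-set logistic-regression weights and the 0.5 threshold are all hypothetical. It also shows one common way to score a document's gene list with set-based precision and recall.

```python
import math
import re

# Hypothetical synonym lexicon: lowercased synonym -> gene identifiers.
# The IDs are placeholders, not real database accessions.
LEXICON = {
    "p53": {"GENE:7157"},
    "tp53": {"GENE:7157"},
    "brca1": {"GENE:672"},
    "white": {"GENE:W"},  # gene symbol that is also an everyday word
}

# Hypothetical list of synonyms that double as ordinary English words.
COMMON_WORDS = {"white", "can", "not"}

# Hand-set logistic-regression weights (illustrative only; a real system
# would learn these from annotated documents).
WEIGHTS = {"bias": -1.0, "has_digit": 2.0, "has_upper": 1.5,
           "log_count": 0.5, "common_word": -2.0}


def find_candidates(text):
    """Return {synonym: (first surface form, count)} for lexicon hits."""
    hits = {}
    for token in re.findall(r"[A-Za-z0-9-]+", text):
        key = token.lower()
        if key in LEXICON:
            surface, count = hits.get(key, (token, 0))
            hits[key] = (surface, count + 1)
    return hits


def features(synonym, surface, count):
    """Binary and real-valued features describing one candidate match."""
    return {
        "bias": 1.0,
        "has_digit": float(any(c.isdigit() for c in surface)),
        "has_upper": float(any(c.isupper() for c in surface)),
        "log_count": math.log(1 + count),
        "common_word": float(synonym in COMMON_WORDS),
    }


def score(feats):
    """Logistic function of the weighted feature sum: P(match is valid)."""
    z = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def normalize(text, threshold=0.5):
    """Map a document to a sorted list of unique gene identifiers."""
    ids = set()
    for synonym, (surface, count) in find_candidates(text).items():
        if score(features(synonym, surface, count)) >= threshold:
            ids |= LEXICON[synonym]
    return sorted(ids)


def precision_recall(predicted, gold):
    """Set-based precision and recall of a document's gene list."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


doc = "Mutations in BRCA1 and TP53 were studied; p53 levels rose in white cells."
genes = normalize(doc)
print(genes)  # ['GENE:672', 'GENE:7157']
print(precision_recall(genes, {"GENE:672", "GENE:7157", "GENE:999"}))
```

In this toy run the ambiguous synonym "white" is rejected by the common-word feature, while the digit-bearing symbols pass, and "TP53" and "p53" collapse to one identifier. A real system would learn the weights from annotated training documents rather than fixing them by hand.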