
A graph-search framework for associating gene identifiers with documents

BACKGROUND: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible g...


Bibliographic Details
Main Authors:	Cohen, William W; Minkov, Einat
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1617121/ https://www.ncbi.nlm.nih.gov/pubmed/17032441 http://dx.doi.org/10.1186/1471-2105-7-440

_version_	1782130508871761920
author	Cohen, William W; Minkov, Einat
author_facet	Cohen, William W; Minkov, Einat
author_sort	Cohen, William W
collection	PubMed
description	BACKGROUND: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and experimentally compare methods for solving this geneId-ranking problem. In addition to baseline approaches that combine named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method that combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method. RESULTS: We show that NER systems with similar F-measure performance can have significantly different performance when used with a soft dictionary for geneId-ranking. The graph-based approach can outperform any of its component NER systems, even without learning, and learning can further improve the performance of the graph-based ranking approach. CONCLUSION: The utility of an NER system for geneId-finding may not be accurately predicted by its entity-level F1 performance, the most common performance measure. GeneId-ranking systems are best implemented by combining several NER systems. With appropriate combination methods, usefully accurate geneId-ranking systems can be constructed from easily available resources, without resorting to problem-specific, engineered components.
format	Text
id	pubmed-1617121
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1617121 2006-10-20 A graph-search framework for associating gene identifiers with documents. Cohen, William W; Minkov, Einat. BMC Bioinformatics. Methodology Article. BioMed Central 2006-10-10 /pmc/articles/PMC1617121/ /pubmed/17032441 http://dx.doi.org/10.1186/1471-2105-7-440 Text en Copyright © 2006 Cohen and Minkov; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A graph-search framework for associating gene identifiers with documents
title_sort	graph-search framework for associating gene identifiers with documents
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1617121/ https://www.ncbi.nlm.nih.gov/pubmed/17032441 http://dx.doi.org/10.1186/1471-2105-7-440
