Distinguishing the species of biomedical named entities for term identification
Main authors: | , |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586755/ https://www.ncbi.nlm.nih.gov/pubmed/19025692 http://dx.doi.org/10.1186/1471-2105-9-S11-S6 |
Summary: | BACKGROUND: Term identification is the task of grounding ambiguous mentions of biomedical named entities in text to unique database identifiers. Previous work on term identification has focused on species-specific documents. However, full-length articles often describe entities across a number of species, in which case resolving the ambiguity of model organisms in entities is critical to achieving accurate term identification. RESULTS: We developed and compared a number of rule-based and machine-learning-based approaches to resolving species ambiguity in mentions of biomedical named entities, and demonstrated that a hybrid method achieved the best overall accuracy at 71.7%, as tested on the gold-standard ITI-TXM corpora. By utilising the species information predicted by the hybrid tagger, our rule-based term identification system was improved significantly by up to 11.6%. CONCLUSION: This paper shows that, in the context of identifying terms involving multiple model organisms, integration of an accurate species disambiguation system can significantly improve the performance of term identification systems. |
---|