
Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts

Bibliographic Details
Main authors: Cohen, AM; Hersh, WR; Dubay, C; Spackman, K
Format: Text
Language: English
Published: BioMed Central 2005
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090552/
https://www.ncbi.nlm.nih.gov/pubmed/15847682
http://dx.doi.org/10.1186/1471-2105-6-103
Description
Summary: BACKGROUND: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction. RESULTS: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs. CONCLUSION: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.
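
The summary mentions two ideas that a small example may make more concrete: a network built from symbol co-occurrences, and scoring extracted synonym pairs against a gold standard. The Python sketch below is illustrative only and is not the authors' implementation. It assumes each abstract has already been reduced to a list of symbols, that an edge weight is simply the number of abstracts in which two symbols appear together, and that the F-score is the balanced harmonic mean of precision and recall. The function names `cooccurrence_edges` and `precision_recall_f` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(abstracts):
    """Count, for each unordered symbol pair, the abstracts containing both.

    Assumption: `abstracts` is an iterable of symbol lists, one per abstract.
    """
    edges = Counter()
    for symbols in abstracts:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(symbols)), 2):
            edges[(a, b)] += 1
    return edges


def precision_recall_f(predicted, gold):
    """Score predicted synonym pairs against gold-standard pairs.

    Pairs are treated as unordered; F is the balanced harmonic mean
    (an assumption, since the record does not define the F-score).
    """
    predicted = {frozenset(p) for p in predicted}
    gold = {frozenset(g) for g in gold}
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): three abstracts, each reduced to its symbols.
toy_abstracts = [
    ["TP53", "p53", "MDM2"],
    ["TP53", "MDM2"],
    ["p53", "MDM2", "BAX"],
]
edges = cooccurrence_edges(toy_abstracts)
```

In this toy network, `TP53` and `p53` co-occur only once but share the neighbour `MDM2`. Shared neighbourhood of this kind is the sort of structural signal a network-based method could exploit, although the record itself does not say which signals the authors use.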