
Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts

BACKGROUND: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilit...


Bibliographic Details
Main Authors: Cohen, AM, Hersh, WR, Dubay, C, Spackman, K
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090552/
https://www.ncbi.nlm.nih.gov/pubmed/15847682
http://dx.doi.org/10.1186/1471-2105-6-103
author Cohen, AM
Hersh, WR
Dubay, C
Spackman, K
collection PubMed
description BACKGROUND: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction. RESULTS: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs. CONCLUSION: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.
format Text
id pubmed-1090552
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1090552 2005-05-07 Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts Cohen, AM Hersh, WR Dubay, C Spackman, K BMC Bioinformatics Methodology Article BioMed Central 2005-04-22 /pmc/articles/PMC1090552/ /pubmed/15847682 http://dx.doi.org/10.1186/1471-2105-6-103 Text en Copyright © 2005 Cohen et al; licensee BioMed Central Ltd.
title Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts
topic Methodology Article