Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts
BACKGROUND: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilit...
Main authors: | Cohen, AM, Hersh, WR, Dubay, C, Spackman, K |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090552/ https://www.ncbi.nlm.nih.gov/pubmed/15847682 http://dx.doi.org/10.1186/1471-2105-6-103 |
_version_ | 1782123883091984384 |
---|---|
author | Cohen, AM Hersh, WR Dubay, C Spackman, K |
author_facet | Cohen, AM Hersh, WR Dubay, C Spackman, K |
author_sort | Cohen, AM |
collection | PubMed |
description | BACKGROUND: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction. RESULTS: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs. CONCLUSION: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge. |
format | Text |
id | pubmed-1090552 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1090552 2005-05-07 Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts Cohen, AM Hersh, WR Dubay, C Spackman, K BMC Bioinformatics Methodology Article BACKGROUND: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction. RESULTS: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs. CONCLUSION: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge. BioMed Central 2005-04-22 /pmc/articles/PMC1090552/ /pubmed/15847682 http://dx.doi.org/10.1186/1471-2105-6-103 Text en Copyright © 2005 Cohen et al; licensee BioMed Central Ltd. |
spellingShingle | Methodology Article Cohen, AM Hersh, WR Dubay, C Spackman, K Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts |
title | Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts |
title_full | Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts |
title_fullStr | Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts |
title_full_unstemmed | Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts |
title_short | Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts |
title_sort | using co-occurrence network structure to extract synonymous gene and protein names from medline abstracts |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090552/ https://www.ncbi.nlm.nih.gov/pubmed/15847682 http://dx.doi.org/10.1186/1471-2105-6-103 |
work_keys_str_mv | AT cohenam usingcooccurrencenetworkstructuretoextractsynonymousgeneandproteinnamesfrommedlineabstracts AT hershwr usingcooccurrencenetworkstructuretoextractsynonymousgeneandproteinnamesfrommedlineabstracts AT dubayc usingcooccurrencenetworkstructuretoextractsynonymousgeneandproteinnamesfrommedlineabstracts AT spackmank usingcooccurrencenetworkstructuretoextractsynonymousgeneandproteinnamesfrommedlineabstracts |
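
The description field above mentions a network built from symbol co-occurrences and reports precision, recall, and F-score. As a rough illustration only, the sketch below counts how often pairs of symbols appear in the same abstract and computes a balanced F-score as the harmonic mean of precision and recall. It is not the authors' method: the function names, the toy symbol lists, and the choice of the balanced F-score are all assumptions made for this example, and the record does not describe how the network structure is actually analyzed.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts):
    """Count how often each unordered pair of symbols shares an abstract."""
    counts = Counter()
    for symbols in abstracts:
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(symbols)), 2):
            counts[(a, b)] += 1
    return counts


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy input: each inner list holds the symbols seen in one abstract.
toy_abstracts = [
    ["TP53", "p53", "MDM2"],
    ["p53", "MDM2"],
    ["TP53", "MDM2", "BRCA1"],
]

# Edge weights of the co-occurrence network (pair -> number of shared abstracts).
edges = cooccurrence_counts(toy_abstracts)
```

With the rounded figures from the abstract, `f_score(23.18, 21.36)` evaluates to roughly 22.23, close to the reported maximum of 22.21%. The abstract does not say how its figure was rounded or averaged, so the small difference is left unexplained here.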