PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine
BACKGROUND: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these...
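Main authors: | Donaldson, Ian; Martin, Joel; de Bruijn, Berry; Wolting, Cheryl; Lay, Vicki; Tuekam, Brigitte; Zhang, Shudong; Baskin, Berivan; Bader, Gary D; Michalickova, Katerina; Pawson, Tony; Hogue, Christopher WV |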
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2003 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC153503/ https://www.ncbi.nlm.nih.gov/pubmed/12689350 http://dx.doi.org/10.1186/1471-2105-4-11 |
_version_ | 1782120713755295744 |
---|---|
author | Donaldson, Ian; Martin, Joel; de Bruijn, Berry; Wolting, Cheryl; Lay, Vicki; Tuekam, Brigitte; Zhang, Shudong; Baskin, Berivan; Bader, Gary D; Michalickova, Katerina; Pawson, Tony; Hogue, Christopher WV |
author_facet | Donaldson, Ian; Martin, Joel; de Bruijn, Berry; Wolting, Cheryl; Lay, Vicki; Tuekam, Brigitte; Zhang, Shudong; Baskin, Berivan; Bader, Gary D; Michalickova, Katerina; Pawson, Tony; Hogue, Christopher WV |
author_sort | Donaldson, Ian |
collection | PubMed |
description | BACKGROUND: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND. RESULTS: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information was 92%, 90% and 92% respectively. We estimated that the system would be able to recall up to 60% of all non-high throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70% thus saving 176 days. CONCLUSIONS: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at . Current capabilities allow searching for human, mouse and yeast protein-interaction information. |
format | Text |
id | pubmed-153503 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2003 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-153503 2003-04-19 PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine Donaldson, Ian; Martin, Joel; de Bruijn, Berry; Wolting, Cheryl; Lay, Vicki; Tuekam, Brigitte; Zhang, Shudong; Baskin, Berivan; Bader, Gary D; Michalickova, Katerina; Pawson, Tony; Hogue, Christopher WV BMC Bioinformatics Research Article BACKGROUND: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND. RESULTS: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information was 92%, 90% and 92% respectively. We estimated that the system would be able to recall up to 60% of all non-high throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70% thus saving 176 days. CONCLUSIONS: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at . Current capabilities allow searching for human, mouse and yeast protein-interaction information. BioMed Central 2003-03-27 /pmc/articles/PMC153503/ /pubmed/12689350 http://dx.doi.org/10.1186/1471-2105-4-11 Text en Copyright © 2003 Donaldson et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
spellingShingle | Research Article Donaldson, Ian; Martin, Joel; de Bruijn, Berry; Wolting, Cheryl; Lay, Vicki; Tuekam, Brigitte; Zhang, Shudong; Baskin, Berivan; Bader, Gary D; Michalickova, Katerina; Pawson, Tony; Hogue, Christopher WV PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine |
title | PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine |
title_full | PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine |
title_fullStr | PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine |
title_full_unstemmed | PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine |
title_short | PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine |
title_sort | prebind and textomy – mining the biomedical literature for protein-protein interactions using a support vector machine |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC153503/ https://www.ncbi.nlm.nih.gov/pubmed/12689350 http://dx.doi.org/10.1186/1471-2105-4-11 |
work_keys_str_mv | AT donaldsonian prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT martinjoel prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT debruijnberry prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT woltingcheryl prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT layvicki prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT tuekambrigitte prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT zhangshudong prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT baskinberivan prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT badergaryd prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT michalickovakaterina prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT pawsontony prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine AT hoguechristopherwv prebindandtextomyminingthebiomedicalliteratureforproteinproteininteractionsusingasupportvectormachine |
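
Note on the metrics named in the description field: the RESULTS sentence reports precision, accuracy and recall for a binary classifier (abstracts that describe interaction information versus those that do not). As a minimal sketch of how these three figures are conventionally derived from a confusion matrix, the Python below uses the standard textbook definitions. The class name `ConfusionCounts`, the function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the article and do not reproduce its data or its evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical container, not from the article)."""
    tp: int  # interaction abstracts correctly flagged
    fp: int  # non-interaction abstracts wrongly flagged
    fn: int  # interaction abstracts missed
    tn: int  # non-interaction abstracts correctly rejected


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged abstracts that really describe an interaction."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of true interaction abstracts that were flagged."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all abstracts classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Made-up counts, purely illustrative (not the article's results).
example = ConfusionCounts(tp=9, fp=1, fn=3, tn=7)

if __name__ == "__main__":
    print(f"precision={precision(example):.2f}")  # 9 / 10 -> 0.90
    print(f"recall={recall(example):.2f}")        # 9 / 12 -> 0.75
    print(f"accuracy={accuracy(example):.2f}")    # 16 / 20 -> 0.80
```

Precision and recall can differ from accuracy because accuracy also counts correctly rejected negatives, which is why a record may list all three.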