PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine

Bibliographic Details
Main authors: Donaldson, Ian; Martin, Joel; de Bruijn, Berry; Wolting, Cheryl; Lay, Vicki; Tuekam, Brigitte; Zhang, Shudong; Baskin, Berivan; Bader, Gary D; Michalickova, Katerina; Pawson, Tony; Hogue, Christopher WV
Format: Text
Language: English
Published: BioMed Central 2003
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC153503/
https://www.ncbi.nlm.nih.gov/pubmed/12689350
http://dx.doi.org/10.1186/1471-2105-4-11

Abstract:
BACKGROUND: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles, where they are inaccessible to computational methods. The Biomolecular Interaction Network Database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable size of the task of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.
RESULTS: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information to be 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast protein interaction database. Finally, this system was applied to a real-world curation problem, and its use was found to reduce the task duration by 70%, thus saving 176 days.
CONCLUSIONS: Machine learning methods are useful as tools to direct interaction and pathway database backfilling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at . Current capabilities allow searching for human, mouse and yeast protein-interaction information.
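The results report precision, accuracy and recall for the abstract classifier, plus a time saving stated both as a percentage and in days. The Python sketch below shows how those quantities relate, assuming the standard binary confusion-matrix definitions (the abstract does not spell them out) and assuming the 70% reduction is measured against the original task duration. The `ConfusionCounts` type and all function names are hypothetical, introduced only for illustration. The counts used below are invented so that the ratios happen to reproduce the reported 92%, 90% and 92%; they are not the paper's actual data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical counts for a binary 'describes an interaction' classifier."""
    tp: int  # interaction abstracts correctly flagged
    fp: int  # non-interaction abstracts wrongly flagged
    tn: int  # non-interaction abstracts correctly rejected
    fn: int  # interaction abstracts the classifier missed


def precision(c: ConfusionCounts) -> float:
    # Fraction of flagged abstracts that really describe an interaction.
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Fraction of true interaction abstracts that were flagged.
    return c.tp / (c.tp + c.fn)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all abstracts classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def implied_original_duration(days_saved: float, fraction_reduced: float) -> float:
    # Assumption: days_saved equals fraction_reduced of the original duration,
    # so the original duration is days_saved / fraction_reduced.
    return days_saved / fraction_reduced


if __name__ == "__main__":
    # Invented counts for illustration only; NOT the paper's data.
    counts = ConfusionCounts(tp=92, fp=8, tn=52, fn=8)
    print(f"precision={precision(counts):.2f}")
    print(f"recall={recall(counts):.2f}")
    print(f"accuracy={accuracy(counts):.2f}")
    print(f"original duration ~ {implied_original_duration(176, 0.70):.1f} days")
```

Under the stated assumption, saving 176 days through a 70% reduction implies an original curation task of roughly 251 days.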
Journal: BMC Bioinformatics (Research Article)
Publication date: 2003-03-27
Rights: Copyright © 2003 Donaldson et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.