
Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

BACKGROUND: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identificat...


Bibliographic Details
Main authors:	Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana; Radivojac, Predrag; Rechtsteiner, Andreas; Verspoor, Karin; Wang, Zhiping; Rocha, Luis M
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559982/ https://www.ncbi.nlm.nih.gov/pubmed/18834489 http://dx.doi.org/10.1186/gb-2008-9-s2-s11

_version_	1782159691327995904
author	Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana; Radivojac, Predrag; Rechtsteiner, Andreas; Verspoor, Karin; Wang, Zhiping; Rocha, Luis M
author_sort	Abi-Haidar, Alaa
collection	PubMed
description	BACKGROUND: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. RESULTS: Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. CONCLUSION: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed.
format	Text
id	pubmed-2559982
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2559982 2008-10-04 Genome Biol Research BioMed Central 2008 2008-09-01 /pmc/articles/PMC2559982/ /pubmed/18834489 http://dx.doi.org/10.1186/gb-2008-9-s2-s11 Text en Copyright © 2008 Abi-Haidar et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks
topic	Research
