Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting

BACKGROUND: Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so these must be gen...

Bibliographic Details
Main Authors:	Kim, Jisu; Huang, De-Shuang; Han, Kyungsook
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648735/ https://www.ncbi.nlm.nih.gov/pubmed/19208160 http://dx.doi.org/10.1186/1471-2105-10-S1-S57

author	Kim, Jisu; Huang, De-Shuang; Han, Kyungsook
collection	PubMed
description	BACKGROUND: Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so they must be generated. In protein-protein interactions, as in other molecular interactions, treating all non-positive interactions as negative produces far more negative interactions than positive ones. Random selection from non-positive interactions is unsuitable, since the selected data may not reflect the original distribution of the data. RESULTS: We developed a bootstrapping algorithm for generating a negative data set of arbitrary size from protein-protein interaction data. We also developed an efficient boosting algorithm for finding interacting motif pairs in human and virus proteins. The boosting algorithm showed the best performance (84.4% sensitivity and 75.9% specificity) with balanced positive and negative data sets. The boosting algorithm was also used to find potential motif pairs in complexes of human and virus proteins, for which structural data was not used to train the algorithm. Interacting motif pairs common to multiple folds of structural data for the complexes were proven to be statistically significant. The data set for interactions between human and virus proteins was extracted from BOND and is available at . The complexes of human and virus proteins were extracted from PDB and their identifiers are available at . CONCLUSION: When the positive and negative training data sets are unbalanced, the results of the prediction model tend to be biased. Bootstrapping is effective for generating a negative data set whose size and distribution are easily controlled. Our boosting algorithm, trained with the balanced data sets generated via the bootstrapping method, could efficiently predict interacting motif pairs from protein interaction and sequence data.
format	Text
id	pubmed-2648735
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2648735 2009-03-03 Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting Kim, Jisu; Huang, De-Shuang; Han, Kyungsook BMC Bioinformatics Research BioMed Central 2009-01-30 /pmc/articles/PMC2648735/ /pubmed/19208160 http://dx.doi.org/10.1186/1471-2105-10-S1-S57 Text en Copyright © 2009 Kim and Han; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648735/ https://www.ncbi.nlm.nih.gov/pubmed/19208160 http://dx.doi.org/10.1186/1471-2105-10-S1-S57
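
The description in this record mentions drawing a negative data set of arbitrary size from non-positive interactions, and reports performance as sensitivity and specificity. The record does not give the authors' algorithm, so the following is only a minimal sketch of a generic bootstrap draw (sampling with replacement) and of the standard sensitivity and specificity formulas. It is not the method from the paper. All function names and the toy protein identifiers are hypothetical.

```python
import random
from itertools import product


def candidate_negatives(human, virus, positives):
    """Return all human-virus pairs not recorded as positive (the non-positive pool)."""
    positive_set = set(positives)
    return [pair for pair in product(human, virus) if pair not in positive_set]


def bootstrap_sample(pool, size, seed=0):
    """Draw `size` items from `pool` with replacement (a generic bootstrap draw)."""
    rng = random.Random(seed)
    return [rng.choice(pool) for _ in range(size)]


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy identifiers, not taken from the BOND or PDB data sets.
human = ["H1", "H2", "H3"]
virus = ["V1", "V2"]
positives = [("H1", "V1"), ("H2", "V2")]

pool = candidate_negatives(human, virus, positives)
# Request as many negatives as positives so the two sets are balanced.
negatives = bootstrap_sample(pool, size=len(positives))
```

Because the draw is with replacement, the requested size can exceed the size of the pool, which is one simple way to obtain a negative set of arbitrary size. A plain uniform draw like this does not by itself preserve any particular distribution of the data, which the description notes is a concern.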

Similar Items