
Predicting protein-protein interactions in unbalanced data using the primary structure of proteins

BACKGROUND: Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted by their primary st...


Bibliographic Details
Main authors:	Yu, Chi-Yuan; Chou, Lih-Ching; Chang, Darby Tien-Hao
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Research article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2868006/ https://www.ncbi.nlm.nih.gov/pubmed/20361868 http://dx.doi.org/10.1186/1471-2105-11-167

collection	PubMed
description	BACKGROUND: Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted by their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. However, this ratio is highly unbalanced in nature, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are desired when handling such challenging tasks. RESULTS: This study presents a method for PPI prediction based only on sequence information and makes three contributions. First, we propose a probability-based mechanism for transforming protein sequences into feature vectors. Second, the proposed predictor is built on an efficient classification algorithm, since efficiency is essential for handling highly unbalanced datasets. Third, the proposed PPI predictor is assessed on several unbalanced datasets with different positive-to-negative ratios (from 1:1 to 1:15). This analysis provides solid evidence that the degree of dataset imbalance is important to PPI predictors. CONCLUSIONS: Dealing with data imbalance is a key issue in PPI prediction since there are far fewer interacting protein pairs than non-interacting ones. This article provides a comprehensive study on this issue and develops a practical tool that achieves both good prediction performance and efficiency using only protein sequence information.
id	pubmed-2868006
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-2868006 2010-05-12 BMC Bioinformatics Research article BioMed Central 2010-04-02 /pmc/articles/PMC2868006/ /pubmed/20361868 http://dx.doi.org/10.1186/1471-2105-11-167 Text en Copyright ©2010 Yu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.