
Prediction of protein-protein interaction sites using an ensemble method

BACKGROUND: Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far f...


Bibliographic Details
Main authors: Deng, Lei, Guan, Jihong, Dong, Qiwen, Zhou, Shuigeng
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808167/
https://www.ncbi.nlm.nih.gov/pubmed/20015386
http://dx.doi.org/10.1186/1471-2105-10-426
_version_ 1782176457049505792
author Deng, Lei
Guan, Jihong
Dong, Qiwen
Zhou, Shuigeng
author_facet Deng, Lei
Guan, Jihong
Dong, Qiwen
Zhou, Shuigeng
author_sort Deng, Lei
collection PubMed
description BACKGROUND: Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far from being solved. RESULTS: In this paper, an ensemble method is proposed that combines a bootstrap resampling technique, SVM-based fusion classifiers, and a weighted voting strategy to overcome the imbalance problem and effectively utilize a wide variety of features. We evaluate the ensemble classifier using a dataset extracted from 99 polypeptide chains with 10-fold cross validation, and obtain an AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than those of the existing methods. To improve the usefulness of the proposed method, two special ensemble classifiers are designed to handle the cases of missing homologues and missing structural information, respectively, and the performance is still encouraging. The robustness of the ensemble method is also evaluated by effectively classifying interaction sites from surface residues as well as from all residues in proteins. Moreover, we demonstrate the applicability of the proposed method to identify interaction sites from the non-structural proteins (NS) of the influenza A virus, which may be utilized as potential drug target sites. CONCLUSION: Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with the resampling technique can alleviate the imbalance problem, and the combination of Sub-EnClassifiers with a wide variety of feature groups can significantly improve prediction performance.
format Text
id pubmed-2808167
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2808167 2010-01-20 Prediction of protein-protein interaction sites using an ensemble method Deng, Lei Guan, Jihong Dong, Qiwen Zhou, Shuigeng BMC Bioinformatics Methodology Article BACKGROUND: Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far from being solved. RESULTS: In this paper, an ensemble method is proposed that combines a bootstrap resampling technique, SVM-based fusion classifiers, and a weighted voting strategy to overcome the imbalance problem and effectively utilize a wide variety of features. We evaluate the ensemble classifier using a dataset extracted from 99 polypeptide chains with 10-fold cross validation, and obtain an AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than those of the existing methods. To improve the usefulness of the proposed method, two special ensemble classifiers are designed to handle the cases of missing homologues and missing structural information, respectively, and the performance is still encouraging. The robustness of the ensemble method is also evaluated by effectively classifying interaction sites from surface residues as well as from all residues in proteins. Moreover, we demonstrate the applicability of the proposed method to identify interaction sites from the non-structural proteins (NS) of the influenza A virus, which may be utilized as potential drug target sites. CONCLUSION: Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with the resampling technique can alleviate the imbalance problem, and the combination of Sub-EnClassifiers with a wide variety of feature groups can significantly improve prediction performance.
BioMed Central 2009-12-16 /pmc/articles/PMC2808167/ /pubmed/20015386 http://dx.doi.org/10.1186/1471-2105-10-426 Text en Copyright ©2009 Deng et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Deng, Lei
Guan, Jihong
Dong, Qiwen
Zhou, Shuigeng
Prediction of protein-protein interaction sites using an ensemble method
title Prediction of protein-protein interaction sites using an ensemble method
title_full Prediction of protein-protein interaction sites using an ensemble method
title_fullStr Prediction of protein-protein interaction sites using an ensemble method
title_full_unstemmed Prediction of protein-protein interaction sites using an ensemble method
title_short Prediction of protein-protein interaction sites using an ensemble method
title_sort prediction of protein-protein interaction sites using an ensemble method
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808167/
https://www.ncbi.nlm.nih.gov/pubmed/20015386
http://dx.doi.org/10.1186/1471-2105-10-426
work_keys_str_mv AT denglei predictionofproteinproteininteractionsitesusinganensemblemethod
AT guanjihong predictionofproteinproteininteractionsitesusinganensemblemethod
AT dongqiwen predictionofproteinproteininteractionsitesusinganensemblemethod
AT zhoushuigeng predictionofproteinproteininteractionsitesusinganensemblemethod