iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets
Main authors: | Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2016 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6274413/ https://www.ncbi.nlm.nih.gov/pubmed/26797600 http://dx.doi.org/10.3390/molecules21010095 |
field | value |
---|---|
author | Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen |
collection | PubMed |
description | Knowledge of protein-protein interactions and their binding sites is indispensable for in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods that can identify protein-protein binding sites (PPBSs) in a timely fashion from sequence information alone, because such information can be used in both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests against experimentally confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, using wavelets to represent protein/peptide sequences might be the key to grasping the essence of the problem, fully consistent with the findings that many important biological functions of proteins can be elucidated through their low-frequency internal motions. For the convenience of experimental scientists, we have provided a step-by-step guide on how to use the predictor’s web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to obtain the desired results without having to work through the complicated mathematical equations involved. |
format | Online Article Text |
id | pubmed-6274413 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Molecules |
published | MDPI, 2016-01-19 |
rights | © 2016 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6274413/ https://www.ncbi.nlm.nih.gov/pubmed/26797600 http://dx.doi.org/10.3390/molecules21010095 |