A New Supervised Over-Sampling Algorithm with Application to Protein-Nucleotide Binding Residue Prediction
Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data hav...
Main authors: | Hu, Jun; He, Xue; Yu, Dong-Jun; Yang, Xi-Bei; Yang, Jing-Yu; Shen, Hong-Bin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168127/ https://www.ncbi.nlm.nih.gov/pubmed/25229688 http://dx.doi.org/10.1371/journal.pone.0107676 |
author | Hu, Jun; He, Xue; Yu, Dong-Jun; Yang, Xi-Bei; Yang, Jing-Yu; Shen, Hong-Bin |
collection | PubMed |
description | Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, where binding residues are far fewer in number than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of a machine-learning-based predictor for class imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. The experimental results from protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web-server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/. |
format | Online Article Text |
id | pubmed-4168127 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4168127 2014-09-22 PLoS One Research Article
Public Library of Science 2014-09-17 /pmc/articles/PMC4168127/ /pubmed/25229688 http://dx.doi.org/10.1371/journal.pone.0107676 Text en © 2014 Hu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | A New Supervised Over-Sampling Algorithm with Application to Protein-Nucleotide Binding Residue Prediction |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168127/ https://www.ncbi.nlm.nih.gov/pubmed/25229688 http://dx.doi.org/10.1371/journal.pone.0107676 |
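The description states that the proposed algorithm synthesizes additional minority class (binding residue) samples, but it does not say how. For orientation only, the sketch below shows SMOTE-style interpolation, a widely used unsupervised over-sampling scheme; it is not the supervised TargetSOS algorithm, whose supervision step this record does not describe. The function `smote`, its parameters, and the two-dimensional toy vectors are hypothetical stand-ins for real residue feature vectors.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical SMOTE-style over-sampler (NOT the TargetSOS method).

    minority: list of feature vectors (lists of floats) from the minority
    class, e.g. binding residues. Each synthetic sample lies on the segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)  # seeded for reproducible output
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by Euclidean distance,
        # excluding `base` itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        # interpolate coordinate-wise between base and the chosen neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy usage: four hypothetical minority-class feature vectors
binding = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.95], [0.3, 0.7]]
new = smote(binding, n_synthetic=6, k=2)
```

Interpolating toward a nearby minority neighbour places new points inside the minority region instead of duplicating existing samples. A supervised variant could additionally screen such candidates before adding them to the training set; how TargetSOS does this is not stated in this record.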