A New Supervised Over-Sampling Algorithm with Application to Protein-Nucleotide Binding Residue Prediction

Bibliographic Details
Main authors: Hu, Jun; He, Xue; Yu, Dong-Jun; Yang, Xi-Bei; Yang, Jing-Yu; Shen, Hong-Bin
Format: Online article (text)
Language: English
Published: Public Library of Science, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168127/
https://www.ncbi.nlm.nih.gov/pubmed/25229688
http://dx.doi.org/10.1371/journal.pone.0107676

Abstract
Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, where binding residues are far fewer in number than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of a machine-learning-based predictor for class imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. The experimental results from protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web-server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
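
This record does not describe how the proposed supervised over-sampling algorithm works. As a rough, hypothetical illustration of the general idea of synthesizing minority class samples, the sketch below uses SMOTE-style interpolation, a different and well-known over-sampling technique; it is not the TargetSOS algorithm. The function names `imbalance_ratio` and `smote_like_oversample`, the toy two-dimensional feature vectors, and the default of k = 3 neighbours are assumptions made for illustration only.

```python
import math
import random


def imbalance_ratio(labels):
    """Majority-to-minority ratio for binary labels (1 = binding, 0 = non-binding)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return negatives / positives


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Synthesize n_new minority samples by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours. Hypothetical helper;
    this is a generic illustration, NOT the supervised over-sampling
    algorithm used by TargetSOS.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda x: math.dist(base, x),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy example: 5 binding vs. 95 non-binding residues (hypothetical data).
labels = [1] * 5 + [0] * 95
print(imbalance_ratio(labels))  # 19.0
minority = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2], [0.25, 0.3]]
new_samples = smote_like_oversample(minority, n_new=90)
print(len(new_samples))  # 90
```

In the toy example, adding 90 synthetic binding samples to the 5 real ones balances the classes at 95 to 95. When over-sampling is combined with cross-validation, synthetic samples are generally generated from the training folds only, so that no information from a test fold leaks into training.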

Record ID: pubmed-4168127
Collection: PubMed
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One
Article type: Research Article
Publication date: 2014-09-17
Copyright: © 2014 Hu et al.
License: This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.