
A New Supervised Over-Sampling Algorithm with Application to Protein-Nucleotide Binding Residue Prediction

Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data hav...


Bibliographic Details
Main Authors:	Hu, Jun; He, Xue; Yu, Dong-Jun; Yang, Xi-Bei; Yang, Jing-Yu; Shen, Hong-Bin
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168127/ https://www.ncbi.nlm.nih.gov/pubmed/25229688 http://dx.doi.org/10.1371/journal.pone.0107676

author	Hu, Jun; He, Xue; Yu, Dong-Jun; Yang, Xi-Bei; Yang, Jing-Yu; Shen, Hong-Bin
author_sort	Hu, Jun
collection	PubMed
description	Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, in which binding residues are far fewer in number than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of machine-learning-based predictors on such problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. The experimental results from protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web-server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
format	Online Article Text
id	pubmed-4168127
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4168127 2014-09-22 PLoS One Research Article Public Library of Science 2014-09-17 /pmc/articles/PMC4168127/ /pubmed/25229688 http://dx.doi.org/10.1371/journal.pone.0107676 Text en © 2014 Hu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	A New Supervised Over-Sampling Algorithm with Application to Protein-Nucleotide Binding Residue Prediction
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168127/ https://www.ncbi.nlm.nih.gov/pubmed/25229688 http://dx.doi.org/10.1371/journal.pone.0107676
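
The description above says the proposed algorithm synthesizes additional minority class samples, but this record does not describe how it does so. As a rough illustration of the general idea only, the sketch below uses a generic SMOTE-style interpolation between minority samples. It is not the supervised over-sampling algorithm of the paper and not part of TargetSOS; the function name `oversample_minority`, its parameters, and the toy feature vectors are all hypothetical.

```python
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Generic SMOTE-style over-sampling (an assumption for illustration,
    NOT the paper's supervised algorithm).

    minority: list of feature vectors (lists of floats), at least two.
    n_new:    number of synthetic samples to create.
    k:        how many nearest minority neighbours to pick from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new sample on the segment between base and neighbour.
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy "binding residue" feature vectors (made-up values).
binding = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9], [0.95, 0.85]]
new_samples = oversample_minority(binding, n_new=6)
```

Each synthetic vector lies between two existing minority samples, so the minority class grows without copying any sample verbatim.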
