Semi-supervised prediction of protein interaction sites from unlabeled sample information

BACKGROUND: The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug designs. However, most sites on protein sequences cannot be defined as interface or non-interface sites because only a small part of protein interactions had be...

Bibliographic Details
Main Authors:	Wang, Ye, Mei, Changqing, Zhou, Yuming, Wang, Yan, Zheng, Chunhou, Zhen, Xiao, Xiong, Yan, Chen, Peng, Zhang, Jun, Wang, Bing
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929468/ https://www.ncbi.nlm.nih.gov/pubmed/31874616 http://dx.doi.org/10.1186/s12859-019-3274-7

_version_	1783482707488014336
author	Wang, Ye Mei, Changqing Zhou, Yuming Wang, Yan Zheng, Chunhou Zhen, Xiao Xiong, Yan Chen, Peng Zhang, Jun Wang, Bing
author_facet	Wang, Ye Mei, Changqing Zhou, Yuming Wang, Yan Zheng, Chunhou Zhen, Xiao Xiong, Yan Chen, Peng Zhang, Jun Wang, Bing
author_sort	Wang, Ye
collection	PubMed
description	BACKGROUND: The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug design. However, most sites on protein sequences cannot be defined as interface or non-interface sites because only a small fraction of protein interactions have been identified, which limits the prediction accuracy and generalization ability of predictors of protein interaction sites. It is therefore necessary to improve the prediction of protein interaction sites by using large amounts of unlabeled data together with small amounts of labeled data and background knowledge. RESULTS: In this work, three semi-supervised support vector machine-based methods are proposed to improve the prediction of protein interaction sites by incorporating information from unlabeled protein sites. Five features related to the evolutionary conservation of amino acids are extracted from the HSSP database and the ConSurf Server, i.e., residue spatial sequence spectrum, residue sequence information entropy and relative entropy, residue sequence conserved weight and residual base evolution rate, to represent the residues within the protein sequence. Three predictors are then built to identify the interface residues on the protein surface using three types of semi-supervised support vector machine algorithms. CONCLUSION: The experimental results demonstrate that the semi-supervised approaches can effectively improve the prediction of protein interaction sites when unlabeled information is incorporated into the predictors; the best of them achieves an accuracy of 70.7%, a sensitivity of 62.67% and a specificity of 78.72%. Compared with existing studies, the semi-supervised models show improved prediction performance.
format	Online Article Text
id	pubmed-6929468
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6929468 2019-12-30 Semi-supervised prediction of protein interaction sites from unlabeled sample information Wang, Ye Mei, Changqing Zhou, Yuming Wang, Yan Zheng, Chunhou Zhen, Xiao Xiong, Yan Chen, Peng Zhang, Jun Wang, Bing BMC Bioinformatics Research BioMed Central 2019-12-24 /pmc/articles/PMC6929468/ /pubmed/31874616 http://dx.doi.org/10.1186/s12859-019-3274-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Wang, Ye Mei, Changqing Zhou, Yuming Wang, Yan Zheng, Chunhou Zhen, Xiao Xiong, Yan Chen, Peng Zhang, Jun Wang, Bing Semi-supervised prediction of protein interaction sites from unlabeled sample information
title	Semi-supervised prediction of protein interaction sites from unlabeled sample information
title_full	Semi-supervised prediction of protein interaction sites from unlabeled sample information
title_fullStr	Semi-supervised prediction of protein interaction sites from unlabeled sample information
title_full_unstemmed	Semi-supervised prediction of protein interaction sites from unlabeled sample information
title_short	Semi-supervised prediction of protein interaction sites from unlabeled sample information
title_sort	semi-supervised prediction of protein interaction sites from unlabeled sample information
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929468/ https://www.ncbi.nlm.nih.gov/pubmed/31874616 http://dx.doi.org/10.1186/s12859-019-3274-7
