
Semi-supervised prediction of protein interaction sites from unlabeled sample information

BACKGROUND: The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug designs. However, most sites on protein sequences cannot be defined as interface or non-interface sites because only a small part of protein interactions had be...


Bibliographic Details
Main authors: Wang, Ye, Mei, Changqing, Zhou, Yuming, Wang, Yan, Zheng, Chunhou, Zhen, Xiao, Xiong, Yan, Chen, Peng, Zhang, Jun, Wang, Bing
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929468/
https://www.ncbi.nlm.nih.gov/pubmed/31874616
http://dx.doi.org/10.1186/s12859-019-3274-7
author Wang, Ye
Mei, Changqing
Zhou, Yuming
Wang, Yan
Zheng, Chunhou
Zhen, Xiao
Xiong, Yan
Chen, Peng
Zhang, Jun
Wang, Bing
author_sort Wang, Ye
collection PubMed
description BACKGROUND: Recognizing protein interaction sites is important for understanding many biological processes and signaling pathways and for drug design. However, most sites on protein sequences cannot be labeled as interface or non-interface sites because only a small fraction of protein interactions have been identified, which limits the prediction accuracy and generalization ability of predictors of protein interaction sites. It is therefore necessary to improve the prediction of protein interaction sites by using large amounts of unlabeled data together with small amounts of labeled data and background knowledge. RESULTS: In this work, three semi-supervised support vector machine-based methods are proposed to improve the prediction of protein interaction sites by incorporating information from unlabeled protein sites. Five features related to the evolutionary conservation of amino acids are extracted from the HSSP database and the ConSurf Server, i.e., residue spatial sequence spectrum, residue sequence information entropy and relative entropy, residue sequence conserved weight and residue base evolution rate, to represent the residues within the protein sequence. Three predictors are then built to identify interface residues on the protein surface using three types of semi-supervised support vector machine algorithms. CONCLUSION: The experimental results demonstrate that the semi-supervised approaches can effectively improve the prediction of protein interaction sites when unlabeled information is incorporated into the predictors. The best of the three predictors achieves an accuracy of 70.7%, a sensitivity of 62.67% and a specificity of 78.72%. Compared with existing studies, the semi-supervised models show improved prediction performance.
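The abstract names sequence information entropy and relative entropy among its conservation features and reports accuracy, sensitivity and specificity. The sketch below illustrates the standard definitions of those quantities only; it is not the authors' implementation. The function names, the use of a single alignment column as input, the background distribution, and the 1/0 label encoding (1 = interface, 0 = non-interface) are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residue distribution in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def column_relative_entropy(column, background):
    """Kullback-Leibler divergence (bits) of the column distribution from a
    background residue distribution (hypothetical input: a dict mapping each
    residue letter to a nonzero probability)."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum(
        (n / total) * math.log2((n / total) / background[aa])
        for aa, n in counts.items()
    )


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels encoded as 1 (interface)
    and 0 (non-interface). Assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # fraction of true interface sites recovered
    specificity = tn / (tn + fp)  # fraction of non-interface sites rejected
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy column: half alanine, half leucine.
    print(column_entropy("AALL"))
    print(column_relative_entropy("AALL", {"A": 0.5, "L": 0.5}))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

A fully conserved column has zero entropy, so lower values indicate stronger conservation. Sensitivity and specificity are reported separately because interface residues are usually a minority class, so accuracy alone can hide poor recovery of interface sites.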
format Online
Article
Text
id pubmed-6929468
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6929468 2019-12-30 Semi-supervised prediction of protein interaction sites from unlabeled sample information Wang, Ye Mei, Changqing Zhou, Yuming Wang, Yan Zheng, Chunhou Zhen, Xiao Xiong, Yan Chen, Peng Zhang, Jun Wang, Bing BMC Bioinformatics Research BioMed Central 2019-12-24 /pmc/articles/PMC6929468/ /pubmed/31874616 http://dx.doi.org/10.1186/s12859-019-3274-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Semi-supervised prediction of protein interaction sites from unlabeled sample information
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929468/
https://www.ncbi.nlm.nih.gov/pubmed/31874616
http://dx.doi.org/10.1186/s12859-019-3274-7