Semi-supervised prediction of protein interaction sites from unlabeled sample information
BACKGROUND: The recognition of protein interaction sites is of great significance for understanding many biological processes and signaling pathways and for drug design. However, most sites on protein sequences cannot be defined as interface or non-interface sites because only a small part of protein interactions had be...
Main authors: | Wang, Ye; Mei, Changqing; Zhou, Yuming; Wang, Yan; Zheng, Chunhou; Zhen, Xiao; Xiong, Yan; Chen, Peng; Zhang, Jun; Wang, Bing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929468/ https://www.ncbi.nlm.nih.gov/pubmed/31874616 http://dx.doi.org/10.1186/s12859-019-3274-7 |
_version_ | 1783482707488014336 |
---|---|
author | Wang, Ye; Mei, Changqing; Zhou, Yuming; Wang, Yan; Zheng, Chunhou; Zhen, Xiao; Xiong, Yan; Chen, Peng; Zhang, Jun; Wang, Bing |
author_facet | Wang, Ye; Mei, Changqing; Zhou, Yuming; Wang, Yan; Zheng, Chunhou; Zhen, Xiao; Xiong, Yan; Chen, Peng; Zhang, Jun; Wang, Bing |
author_sort | Wang, Ye |
collection | PubMed |
description | BACKGROUND: The recognition of protein interaction sites is of great significance for understanding many biological processes and signaling pathways and for drug design. However, most sites on protein sequences cannot be defined as interface or non-interface sites because only a small fraction of protein interactions has been identified, which limits the prediction accuracy and generalization ability of predictors of protein interaction sites. It is therefore necessary to improve the prediction of protein interaction sites by using large amounts of unlabeled data together with small amounts of labeled data and background knowledge. RESULTS: In this work, three semi-supervised support vector machine-based methods are proposed to improve the prediction of protein interaction sites by incorporating information from unlabeled protein sites. Five features related to the evolutionary conservation of amino acids are extracted from the HSSP database and the ConSurf Server, i.e., residue spatial sequence spectrum, residue sequence information entropy and relative entropy, residue sequence conserved weight and residual base evolution rate, to represent the residues within the protein sequence. Three predictors are then built to identify interface residues on the protein surface using three types of semi-supervised support vector machine algorithms. CONCLUSION: The experimental results demonstrate that the semi-supervised approaches can effectively improve the prediction of protein interaction sites when unlabeled information is incorporated into the predictors, and the best of them achieves an accuracy of 70.7%, a sensitivity of 62.67% and a specificity of 78.72%. Compared with existing studies, the semi-supervised models show improved prediction performance. |
format | Online Article Text |
id | pubmed-6929468 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6929468 2019-12-30 BMC Bioinformatics Research BioMed Central 2019-12-24 /pmc/articles/PMC6929468/ /pubmed/31874616 http://dx.doi.org/10.1186/s12859-019-3274-7 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Semi-supervised prediction of protein interaction sites from unlabeled sample information |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929468/ https://www.ncbi.nlm.nih.gov/pubmed/31874616 http://dx.doi.org/10.1186/s12859-019-3274-7 |
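
The description above reports accuracy, sensitivity and specificity for the best predictor. As a reading aid, the following minimal sketch shows how these three quantities are conventionally computed from binary labels (1 = interface residue, 0 = non-interface residue). The function name `binary_metrics` and the toy labels are illustrative assumptions, not taken from the article, and the sketch does not reproduce the authors' evaluation protocol.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary 0/1 labels.

    Hypothetical helper for illustration only; 1 marks an interface
    residue and 0 a non-interface residue.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # fraction of true interface residues recovered
    specificity = tn / (tn + fp)  # fraction of non-interface residues rejected
    return accuracy, sensitivity, specificity


# Toy data (made up): 3 interface and 5 non-interface residues.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc, sens, spec = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for this kind of task because interface residues are usually the minority class, so accuracy alone can look good for a predictor that rarely predicts the positive class.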