
Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier

BACKGROUND: Self-interacting proteins (SIPs), in which two or more copies of a protein expressed by one gene interact with each other, play a central role in the regulation of most living cells and cellular functions. Although numerous SIPs data can be provided by using high-throughput experimenta...


Bibliographic Details
Main authors: Li, Yang; Hu, Xue-Gang; You, Zhu-Hong; Li, Li-Ping; Li, Pei-Pei; Wang, Yan-Bin; Huang, Yu-An
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9713954/
https://www.ncbi.nlm.nih.gov/pubmed/36457083
http://dx.doi.org/10.1186/s12859-022-04880-y
author Li, Yang
Hu, Xue-Gang
You, Zhu-Hong
Li, Li-Ping
Li, Pei-Pei
Wang, Yan-Bin
Huang, Yu-An
author_sort Li, Yang
collection PubMed
description BACKGROUND: Self-interacting proteins (SIPs), in which two or more copies of a protein expressed by one gene interact with each other, play a central role in the regulation of most living cells and cellular functions. Although high-throughput experimental techniques can provide large amounts of SIP data, experimental identification of SIPs remains time-consuming, costly, inefficient, and inherently prone to high false-positive rates. It is therefore increasingly important to develop efficient and accurate automatic approaches that supplement experimental methods and accelerate the prediction of SIPs from protein sequence information. RESULTS: In this paper, we present a novel framework, termed GLCM-WSRC (gray level co-occurrence matrix-weighted sparse representation based classification), for automatically predicting SIPs based on evolutionary information derived from protein primary sequences. Specifically, we first convert each protein sequence into a Position Specific Scoring Matrix (PSSM), which contains the evolutionary information of the sequence, using the Position Specific Iterated BLAST (PSI-BLAST) tool. Second, we use GLCM, an efficient feature extraction approach, to extract salient and invariant feature vectors from the PSSM, and then apply a pre-processing step, the adaptive synthetic (ADASYN) technique, to balance the SIPs dataset and generate new feature vectors for classification. Finally, we employ an efficient and reliable WSRC model to identify SIPs based on known self-interacting and non-interacting proteins.
CONCLUSIONS: Extensive experimental results show that the proposed approach achieves high prediction performance, with 98.10% accuracy on the yeast dataset and 91.51% accuracy on the human dataset, which suggests that the proposed model could be a useful tool for large-scale self-interacting protein prediction and other bioinformatics detection tasks in the future.
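The GLCM feature-extraction step named in the description can be illustrated with a minimal sketch. It treats a small score matrix as a gray-level image, counts how often pairs of gray levels occur side by side, and summarizes the normalized counts with a few common texture statistics. Everything specific here is an assumption for illustration: the toy matrix, the helper names (quantize, glcm, glcm_features), the three gray levels, the horizontal offset (0, 1), and the choice of statistics are not taken from the article, whose abstract does not state its GLCM settings.

```python
# Illustrative GLCM texture features on a toy score matrix (standard library only).
# Assumption: gray-level count, offset, and statistics are chosen for illustration;
# the abstract does not specify the settings used in the article.


def quantize(matrix, levels):
    """Map every score linearly onto integer gray levels 0..levels-1."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0] * len(row) for row in matrix]
    scale = (levels - 1) / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in matrix]


def glcm(image, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix for cell pairs separated by (dr, dc)."""
    counts = [[0.0] * levels for _ in range(levels)]
    total = 0
    for r in range(len(image)):
        for c in range(len(image[r])):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < len(image) and 0 <= c2 < len(image[r2]):
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        return counts
    return [[x / total for x in row] for row in counts]


def glcm_features(p):
    """Contrast, angular second moment, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * p[i][j] for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        # Homogeneity in the 1 / (1 + |i - j|) form; other definitions exist.
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in cells),
    }


# Hypothetical 3x3 stand-in for a PSSM; a real PSSM has one row per residue
# and 20 columns, one per amino acid.
toy_pssm = [
    [-2, -2, 0],
    [0, 2, 2],
    [-2, 0, 2],
]

p = glcm(quantize(toy_pssm, levels=3), levels=3)
features = glcm_features(p)
print(features)
```

In a fuller pipeline, the resulting feature vector for each protein would be balanced (for example with ADASYN) and then passed to the classifier; those steps are outside this sketch.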
format Online
Article
Text
id pubmed-9713954
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9713954 2022-12-02
Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier
Li, Yang; Hu, Xue-Gang; You, Zhu-Hong; Li, Li-Ping; Li, Pei-Pei; Wang, Yan-Bin; Huang, Yu-An
BMC Bioinformatics
Research
BioMed Central 2022-12-01
/pmc/articles/PMC9713954/ /pubmed/36457083 http://dx.doi.org/10.1186/s12859-022-04880-y
Text en © The Author(s) 2022
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9713954/
https://www.ncbi.nlm.nih.gov/pubmed/36457083
http://dx.doi.org/10.1186/s12859-022-04880-y