Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods

BACKGROUND: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent or usually distinct function occurs in a single polypeptide chain. Identification of unknown cellular processes, understanding novel protein mechanisms, improving the prediction of p...

Bibliographic Details
Main Authors: Shirafkan, Farshid; Gharaghani, Sajjad; Rahimian, Karim; Sajedi, Reza Hasan; Zahiri, Javad
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8142502/
https://www.ncbi.nlm.nih.gov/pubmed/34030624
http://dx.doi.org/10.1186/s12859-021-04194-5
author Shirafkan, Farshid
Gharaghani, Sajjad
Rahimian, Karim
Sajedi, Reza Hasan
Zahiri, Javad
author_sort Shirafkan, Farshid
collection PubMed
description BACKGROUND: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which a single polypeptide chain performs more than one independent, usually distinct, function. The main reasons to study MPs are identifying unknown cellular processes, understanding novel protein mechanisms, improving the prediction of protein functions, and gaining information about protein evolution. MPs also play an important role in disease pathways and drug-target discovery. Because detecting MPs experimentally is challenging, most of them have been discovered by chance. A computational approach to predicting MPs is therefore warranted. RESULTS: In this study, we introduced a model that distinguishes moonlighting from non-moonlighting proteins using features extracted from protein sequences, together with a scheme for detecting outlier proteins. In total, 37 distinct feature vectors were used to study each protein's impact on detecting MPs, and 8 different classification methods were assessed to find the best performance. To detect outliers, each classifier was run 100 times with tenfold cross-validation on the feature vectors, and proteins that were misclassified 90 times or more were grouped. This process was applied to every feature vector, and the intersection of the resulting groups was taken as the set of outlier proteins. Tenfold cross-validation on a dataset of 351 samples (215 moonlighting and 136 non-moonlighting proteins) shows that the SVM method on all feature vectors has the highest performance among all methods in this study and other available methods. The outlier analysis showed that 57 of the 351 proteins in the dataset are candidate outliers. Among the outlier proteins, there were non-MPs (such as P69797) that were misclassified by 8 different classification methods with 16 different feature vectors. Because these proteins were obtained by computational methods, the results of this study call into question whether they are non-moonlighting at all. CONCLUSIONS: MPs are difficult to identify experimentally. Using distinct feature vectors, our method enabled the identification of novel moonlighting proteins. The study also indicated that a number of proteins labelled as non-MPs are likely to be moonlighting. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04194-5.
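The outlier procedure summarized in the description (repeated tenfold cross-validation, a misclassification threshold, then an intersection across feature vectors) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier stands in for the 8 classifiers the abstract mentions (including SVM), the function names `kfold_indices`, `nearest_centroid_predict`, `frequently_misclassified`, and `outliers` are hypothetical, and the toy data is synthetic rather than taken from the 351-protein dataset. It also assumes the intersection is taken over the per-feature-vector groups for a single classifier, which the abstract does not spell out.

```python
import random


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k near-equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_predict(train_X, train_y, x):
    """Toy stand-in classifier (an assumption, not the paper's SVM):
    assign x to the class whose feature mean is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def frequently_misclassified(X, y, runs=100, k=10, threshold=90, seed=0):
    """Indices misclassified in at least `threshold` of `runs` k-fold repeats."""
    rng = random.Random(seed)
    errors = [0] * len(X)
    for _ in range(runs):
        for fold in kfold_indices(len(X), k, rng):
            held_out = set(fold)
            train_X = [X[i] for i in range(len(X)) if i not in held_out]
            train_y = [y[i] for i in range(len(X)) if i not in held_out]
            for i in fold:
                if nearest_centroid_predict(train_X, train_y, X[i]) != y[i]:
                    errors[i] += 1
    return {i for i, e in enumerate(errors) if e >= threshold}


def outliers(feature_vectors, y, **kwargs):
    """Intersect the frequently misclassified groups over all feature vectors.
    Expects at least one feature vector."""
    groups = [frequently_misclassified(X, y, **kwargs) for X in feature_vectors]
    return set.intersection(*groups)


# Synthetic toy data (not from the paper): 20 samples, two feature "views".
y = [0] * 10 + [1] * 10
view_a = [[float(j % 3)] for j in range(10)] + [[10.0 + j % 3] for j in range(10)]
view_b = [[1.0, float(j % 2)] for j in range(10)] + [[5.0, float(j % 2)] for j in range(10)]
view_a[0] = [11.0]      # sample 0 resembles class 1 in view A ...
view_b[0] = [5.0, 0.0]  # ... and in view B
view_a[19] = [0.0]      # sample 19 resembles class 0 in view A only

if __name__ == "__main__":
    print(outliers([view_a, view_b], y))
```

In this toy setup, sample 19 is flagged in one view only, so the intersection step drops it. Sample 0 looks like the opposite class in both views and is the kind of candidate the procedure is meant to surface.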
format Online
Article
Text
id pubmed-8142502
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8142502 2021-05-25 BMC Bioinformatics Research BioMed Central 2021-05-24 /pmc/articles/PMC8142502/ /pubmed/34030624 http://dx.doi.org/10.1186/s12859-021-04194-5 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8142502/
https://www.ncbi.nlm.nih.gov/pubmed/34030624
http://dx.doi.org/10.1186/s12859-021-04194-5