The predictive performance of short-linear motif features in the prediction of calmodulin-binding proteins


Bibliographic Details
Main authors: Li, Yixun, Maleki, Mina, Carruthers, Nicholas J., Stemmer, Paul M., Ngom, Alioune, Rueda, Luis
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245490/
https://www.ncbi.nlm.nih.gov/pubmed/30453876
http://dx.doi.org/10.1186/s12859-018-2378-9
author Li, Yixun
Maleki, Mina
Carruthers, Nicholas J.
Stemmer, Paul M.
Ngom, Alioune
Rueda, Luis
collection PubMed
description BACKGROUND: Predicting calmodulin-binding (CaM-binding) proteins is important in biology and biochemistry because calmodulin binds and regulates a multitude of protein targets that affect different cellular processes. Computational methods that can accurately identify CaM-binding proteins and CaM-binding domains would accelerate research in calcium signaling and calmodulin function. Short-linear motifs (SLiMs) have been used effectively as features for analyzing protein-protein interactions, but their properties have not been utilized in the prediction of CaM-binding proteins. RESULTS: We propose a new method for predicting CaM-binding proteins that uses, as features for the prediction module, both the total and average scores of known and new SLiMs in protein sequences, computed with a new scoring method called sliding window scoring (SWS). A dataset of 194 manually curated human CaM-binding proteins and 193 mitochondrial proteins was obtained and used for testing the proposed model. The motif generation tool Multiple EM for Motif Elucidation (MEME) was used to obtain new motifs from each of the positive and negative datasets individually (the SM approach) and from the combined negative and positive datasets (the CM approach). Moreover, the wrapper criterion with random forest was applied for feature selection (FS), followed by classification using different algorithms such as k-nearest neighbors (k-NN), support vector machines (SVM), naive Bayes (NB) and random forest (RF). CONCLUSIONS: Our proposed method shows very good prediction results and demonstrates that the information contained in SLiMs is highly relevant for predicting CaM-binding proteins. Further, three new CaM-binding motifs have been computationally selected and biologically validated in this study; they can be used for predicting CaM-binding proteins.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2378-9) contains supplementary material, which is available to authorized users.
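The abstract names sliding window scoring (SWS) and says that the total and average motif scores serve as features, but it does not give the scoring details. The following is a minimal sketch under stated assumptions, not the authors' implementation: a motif is assumed to be a position-specific score table, each window is assumed to have the motif's width, and a window's score is assumed to be the sum of its per-position scores. The names `Motif`, `sliding_window_scores`, `feature_vector` and `toy_motif`, and all score values, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical motif representation: one dict per motif position,
# mapping an amino-acid letter to a score (e.g. a log-odds value).
Motif = List[Dict[str, float]]


def sliding_window_scores(sequence: str, motif: Motif) -> Tuple[float, float]:
    """Return (total, average) score of `motif` over all windows of `sequence`.

    Assumption: each window has the motif's width, a window's score is the
    sum of its per-position scores, and unlisted residues score 0.0.
    """
    width = len(motif)
    n_windows = len(sequence) - width + 1
    if width == 0 or n_windows <= 0:
        return 0.0, 0.0  # empty motif or sequence shorter than the motif
    total = 0.0
    for start in range(n_windows):
        window = sequence[start:start + width]
        total += sum(col.get(res, 0.0) for col, res in zip(motif, window))
    return total, total / n_windows


def feature_vector(sequence: str, motifs: List[Motif]) -> List[float]:
    """Concatenate the (total, average) scores of every motif into one vector."""
    features: List[float] = []
    for motif in motifs:
        features.extend(sliding_window_scores(sequence, motif))
    return features


# Toy motif of width 2 with made-up scores (not taken from the paper).
toy_motif: Motif = [{"K": 1.0, "R": 1.0}, {"L": 2.0}]
print(feature_vector("KLAKL", [toy_motif]))  # prints [6.0, 1.5]
```

In the toy call, the four windows KL, LA, AK and KL score 3.0, 0.0, 0.0 and 3.0, which gives a total of 6.0 and an average of 1.5. A vector of this kind, with one pair of values per motif, is the sort of input that a feature selection step and a classifier such as k-NN, SVM, NB or RF could consume.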
format Online
Article
Text
id pubmed-6245490
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6245490 2018-11-26 The predictive performance of short-linear motif features in the prediction of calmodulin-binding proteins Li, Yixun Maleki, Mina Carruthers, Nicholas J. Stemmer, Paul M. Ngom, Alioune Rueda, Luis BMC Bioinformatics Research BioMed Central 2018-11-20 /pmc/articles/PMC6245490/ /pubmed/30453876 http://dx.doi.org/10.1186/s12859-018-2378-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The predictive performance of short-linear motif features in the prediction of calmodulin-binding proteins
topic Research