C10Pred: A First Machine Learning Based Tool to Predict C10 Family Cysteine Peptidases Using Sequence-Derived Features
Streptococcus pyogenes, or group A Streptococcus (GAS), a gram-positive bacterium, is implicated in a wide range of clinical manifestations and life-threatening diseases. One of the key virulence factors of GAS is streptopain, a C10 family cysteine peptidase. Since its discovery, various homologs of...
Main authors: | Malik, Adeel; Mahajan, Nitin; Dar, Tanveer Ali; Kim, Chang-Bae |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455582/ https://www.ncbi.nlm.nih.gov/pubmed/36076915 http://dx.doi.org/10.3390/ijms23179518 |
_version_ | 1784785610538483712 |
---|---|
author | Malik, Adeel; Mahajan, Nitin; Dar, Tanveer Ali; Kim, Chang-Bae |
author_sort | Malik, Adeel |
collection | PubMed |
description | Streptococcus pyogenes, or group A Streptococcus (GAS), a gram-positive bacterium, is implicated in a wide range of clinical manifestations and life-threatening diseases. One of the key virulence factors of GAS is streptopain, a C10 family cysteine peptidase. Since its discovery, various homologs of streptopain have been reported from other bacterial species. With the increased affordability of sequencing, a significant increase in the number of potential C10 family-like sequences in the public databases is anticipated, posing a challenge in classifying such sequences. Sequence-similarity-based tools are the methods of choice to identify such streptopain-like sequences. However, these methods depend on some level of sequence similarity between the existing C10 family and the target sequences. Therefore, in this work, we propose a novel predictor, C10Pred, for the prediction of C10 peptidases using sequence-derived optimal features. C10Pred is a support vector machine (SVM) based model which is efficient in predicting C10 enzymes with an overall accuracy of 92.7% and Matthews’ correlation coefficient (MCC) value of 0.855 when tested on an independent dataset. We anticipate that C10Pred will serve as a handy tool to classify novel streptopain-like proteins belonging to the C10 family and offer essential information. |
format | Online Article Text |
id | pubmed-9455582 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9455582 2022-09-09 Int J Mol Sci Article MDPI 2022-08-23 /pmc/articles/PMC9455582/ /pubmed/36076915 http://dx.doi.org/10.3390/ijms23179518 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | C10Pred: A First Machine Learning Based Tool to Predict C10 Family Cysteine Peptidases Using Sequence-Derived Features |
topic | Article |
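
The description above reports two evaluation metrics for the SVM model, overall accuracy and Matthews' correlation coefficient (MCC). As a minimal sketch of how these two metrics are generally computed from a binary confusion matrix, the following may help. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the article and do not reproduce its reported values.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews' correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction). Returns 0.0 when the
    denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the article).
    tp, tn, fp, fn = 45, 40, 5, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are imbalanced.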