
SortPred: The first machine learning based predictor to identify bacterial sortases and their classes using sequence-derived information

Sortase enzymes are cysteine transpeptidases that embellish the surface of Gram-positive bacteria with various proteins thereby allowing these microorganisms to interact with their neighboring environment. It is known that several of their substrates can cause pathological implications, so researche...


Bibliographic Details
Main authors: Malik, Adeel; Subramaniyam, Sathiyamoorthy; Kim, Chang-Bae; Manavalan, Balachandran
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8703055/
https://www.ncbi.nlm.nih.gov/pubmed/34976319
http://dx.doi.org/10.1016/j.csbj.2021.12.014
author Malik, Adeel
Subramaniyam, Sathiyamoorthy
Kim, Chang-Bae
Manavalan, Balachandran
collection PubMed
description Sortase enzymes are cysteine transpeptidases that embellish the surface of Gram-positive bacteria with various proteins, thereby allowing these microorganisms to interact with their neighboring environment. Several of their substrates are known to have pathological implications, so researchers have focused on the development of sortase inhibitors. Currently, six different classes of sortases (A-F) are recognized. However, with the extensive application of bacterial genome sequencing projects, the number of potential sortases in the public databases has exploded, presenting considerable challenges in annotating these sequences. Characterizing these sortase classes experimentally is laborious and time-consuming. Therefore, this study developed the first machine-learning-based two-layer predictor, called SortPred, in which the first layer predicts whether a given sequence is a sortase and the second layer predicts the class of each predicted sortase. To develop SortPred, we constructed an original benchmarking dataset and investigated 31 feature descriptors, primarily based on five feature encoding algorithms. Afterward, each of these descriptors was trained using a random forest classifier and its robustness was evaluated with an independent dataset. Finally, we selected the final model independently for both layers depending on the performance consistency between cross-validation and independent evaluation. SortPred is expected to be an effective tool for identifying bacterial sortases, which in turn may aid in designing sortase inhibitors and exploring their functions. The SortPred webserver and a standalone version are freely accessible at: https://procarb.org/sortpred.
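The two-layer design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not name the five feature encodings, so amino acid composition is used here purely as an assumed example, and the names `aac`, `two_layer_predict`, `is_sortase`, and `sortase_class` are hypothetical. In practice each layer would be a trained model (the abstract mentions random forest classifiers); here they are plain callables so the cascade logic stays visible.

```python
from collections import Counter
from typing import Callable, List, Optional

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue.

    Assumed example encoding; the record does not say which
    encodings SortPred actually uses.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        # No standard residues present: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def two_layer_predict(
    sequence: str,
    is_sortase: Callable[[List[float]], bool],
    sortase_class: Callable[[List[float]], str],
) -> Optional[str]:
    """Layer 1 decides sortase vs. non-sortase; layer 2 assigns a class.

    Both classifiers are hypothetical stand-ins for trained models.
    Returns None when layer 1 rejects the sequence.
    """
    features = aac(sequence)
    if not is_sortase(features):
        return None
    return sortase_class(features)
```

The second layer is only consulted for sequences that the first layer accepts, which mirrors the abstract's description of predicting the class "from the predicted sortase".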
format Online
Article
Text
id pubmed-8703055
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8703055 2021-12-30 SortPred: The first machine learning based predictor to identify bacterial sortases and their classes using sequence-derived information Malik, Adeel Subramaniyam, Sathiyamoorthy Kim, Chang-Bae Manavalan, Balachandran Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-12-14 /pmc/articles/PMC8703055/ /pubmed/34976319 http://dx.doi.org/10.1016/j.csbj.2021.12.014 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title SortPred: The first machine learning based predictor to identify bacterial sortases and their classes using sequence-derived information
topic Research Article