DeepSecE: A Deep-Learning-Based Framework for Multiclass Prediction of Secreted Proteins in Gram-Negative Bacteria

Proteins secreted by Gram-negative bacteria are tightly linked to the virulence and adaptability of these microbes to environmental changes. Accurate identification of such secreted proteins can facilitate the investigations of infections and diseases caused by these bacterial pathogens. However, cu...

Bibliographic Details
Main authors: Zhang, Yumeng; Guan, Jiahao; Li, Chen; Wang, Zhikang; Deng, Zixin; Gasser, Robin B.; Song, Jiangning; Ou, Hong-Yu
Format: Online Article Text
Language: English
Published: AAAS 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10599158/
https://www.ncbi.nlm.nih.gov/pubmed/37886621
http://dx.doi.org/10.34133/research.0258
_version_ 1785125715829587968
author Zhang, Yumeng
Guan, Jiahao
Li, Chen
Wang, Zhikang
Deng, Zixin
Gasser, Robin B.
Song, Jiangning
Ou, Hong-Yu
author_facet Zhang, Yumeng
Guan, Jiahao
Li, Chen
Wang, Zhikang
Deng, Zixin
Gasser, Robin B.
Song, Jiangning
Ou, Hong-Yu
author_sort Zhang, Yumeng
collection PubMed
description Proteins secreted by Gram-negative bacteria are tightly linked to the virulence and adaptability of these microbes to environmental changes. Accurate identification of such secreted proteins can facilitate the investigations of infections and diseases caused by these bacterial pathogens. However, current bioinformatic methods for predicting bacterial secreted substrate proteins have limited computational efficiency and application scope on a genome-wide scale. Here, we propose a novel deep-learning-based framework—DeepSecE—for the simultaneous inference of multiple distinct groups of secreted proteins produced by Gram-negative bacteria. DeepSecE remarkably improves their classification from nonsecreted proteins using a pretrained protein language model and transformer, achieving a macro-average accuracy of 0.883 on 5-fold cross-validation. Performance benchmarking suggests that DeepSecE achieves competitive performance with the state-of-the-art binary predictors specialized for individual types of secreted substrates. The attention mechanism corroborates salient patterns and motifs at the N or C termini of the protein sequences. Using this pipeline, we further investigate the genome-wide prediction of novel secreted proteins and their taxonomic distribution across ~1,000 Gram-negative bacterial genomes. The present analysis demonstrates that DeepSecE has major potential for the discovery of disease-associated secreted proteins in a diverse range of Gram-negative bacteria. An online web server of DeepSecE is also publicly available to predict and explore various secreted substrate proteins via the input of bacterial genome sequences.
format Online
Article
Text
id pubmed-10599158
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher AAAS
record_format MEDLINE/PubMed
spelling pubmed-105991582023-10-26 DeepSecE: A Deep-Learning-Based Framework for Multiclass Prediction of Secreted Proteins in Gram-Negative Bacteria Zhang, Yumeng Guan, Jiahao Li, Chen Wang, Zhikang Deng, Zixin Gasser, Robin B. Song, Jiangning Ou, Hong-Yu Research (Wash D C) Research Article Proteins secreted by Gram-negative bacteria are tightly linked to the virulence and adaptability of these microbes to environmental changes. Accurate identification of such secreted proteins can facilitate the investigations of infections and diseases caused by these bacterial pathogens. However, current bioinformatic methods for predicting bacterial secreted substrate proteins have limited computational efficiency and application scope on a genome-wide scale. Here, we propose a novel deep-learning-based framework—DeepSecE—for the simultaneous inference of multiple distinct groups of secreted proteins produced by Gram-negative bacteria. DeepSecE remarkably improves their classification from nonsecreted proteins using a pretrained protein language model and transformer, achieving a macro-average accuracy of 0.883 on 5-fold cross-validation. Performance benchmarking suggests that DeepSecE achieves competitive performance with the state-of-the-art binary predictors specialized for individual types of secreted substrates. The attention mechanism corroborates salient patterns and motifs at the N or C termini of the protein sequences. Using this pipeline, we further investigate the genome-wide prediction of novel secreted proteins and their taxonomic distribution across ~1,000 Gram-negative bacterial genomes. The present analysis demonstrates that DeepSecE has major potential for the discovery of disease-associated secreted proteins in a diverse range of Gram-negative bacteria. An online web server of DeepSecE is also publicly available to predict and explore various secreted substrate proteins via the input of bacterial genome sequences. 
AAAS 2023-10-25 /pmc/articles/PMC10599158/ /pubmed/37886621 http://dx.doi.org/10.34133/research.0258 Text en https://creativecommons.org/licenses/by/4.0/ Exclusive licensee Science and Technology Review Publishing House. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Zhang, Yumeng
Guan, Jiahao
Li, Chen
Wang, Zhikang
Deng, Zixin
Gasser, Robin B.
Song, Jiangning
Ou, Hong-Yu
DeepSecE: A Deep-Learning-Based Framework for Multiclass Prediction of Secreted Proteins in Gram-Negative Bacteria
title DeepSecE: A Deep-Learning-Based Framework for Multiclass Prediction of Secreted Proteins in Gram-Negative Bacteria
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10599158/
https://www.ncbi.nlm.nih.gov/pubmed/37886621
http://dx.doi.org/10.34133/research.0258