T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features
Type 1 secretion systems play important roles in pathogenicity of Gram-negative bacteria. However, the substrate secretion mechanism remains largely unknown. In this research, we observed the sequence features of repeats-in-toxin (RTX) proteins, a major class of type 1 secreted effectors (T1SEs). We...
Main authors: | Chen, Zewei; Zhao, Ziyi; Hui, Xinjie; Zhang, Junya; Hu, Yixue; Chen, Runhong; Cai, Xuxia; Hu, Yueming; Wang, Yejun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8861453/ https://www.ncbi.nlm.nih.gov/pubmed/35211101 http://dx.doi.org/10.3389/fmicb.2021.813094 |
_version_ | 1784654890277011456 |
---|---|
author | Chen, Zewei Zhao, Ziyi Hui, Xinjie Zhang, Junya Hu, Yixue Chen, Runhong Cai, Xuxia Hu, Yueming Wang, Yejun |
author_facet | Chen, Zewei Zhao, Ziyi Hui, Xinjie Zhang, Junya Hu, Yixue Chen, Runhong Cai, Xuxia Hu, Yueming Wang, Yejun |
author_sort | Chen, Zewei |
collection | PubMed |
description | Type 1 secretion systems play important roles in pathogenicity of Gram-negative bacteria. However, the substrate secretion mechanism remains largely unknown. In this research, we observed the sequence features of repeats-in-toxin (RTX) proteins, a major class of type 1 secreted effectors (T1SEs). We found striking non-RTX-motif amino acid composition patterns at the C termini, most typically exemplified by the enriched “[FLI][VAI]” at the most C-terminal two positions. Machine-learning models, including deep-learning ones, were trained using these sequence-based non-RTX-motif features and further combined into a tri-layer stacking model, T1SEstacker, which predicted the RTX proteins accurately, with a fivefold cross-validated sensitivity of ∼0.89 at the specificity of ∼0.94. Besides substrates with RTX motifs, T1SEstacker can also well distinguish non-RTX-motif T1SEs, further suggesting their potential existence of common secretion signals. T1SEstacker was applied to predict T1SEs from the genomes of representative Salmonella strains, and we found that both the number and composition of T1SEs varied among strains. The number of T1SEs is estimated to reach 100 or more in each strain, much larger than what we expected. In summary, we made comprehensive sequence analysis on the type 1 secreted RTX proteins, identified common sequence-based features at the C termini, and developed a stacking model that can predict type 1 secreted proteins accurately. |
format | Online Article Text |
id | pubmed-8861453 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-88614532022-02-23 T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features Chen, Zewei Zhao, Ziyi Hui, Xinjie Zhang, Junya Hu, Yixue Chen, Runhong Cai, Xuxia Hu, Yueming Wang, Yejun Front Microbiol Microbiology Type 1 secretion systems play important roles in pathogenicity of Gram-negative bacteria. However, the substrate secretion mechanism remains largely unknown. In this research, we observed the sequence features of repeats-in-toxin (RTX) proteins, a major class of type 1 secreted effectors (T1SEs). We found striking non-RTX-motif amino acid composition patterns at the C termini, most typically exemplified by the enriched “[FLI][VAI]” at the most C-terminal two positions. Machine-learning models, including deep-learning ones, were trained using these sequence-based non-RTX-motif features and further combined into a tri-layer stacking model, T1SEstacker, which predicted the RTX proteins accurately, with a fivefold cross-validated sensitivity of ∼0.89 at the specificity of ∼0.94. Besides substrates with RTX motifs, T1SEstacker can also well distinguish non-RTX-motif T1SEs, further suggesting their potential existence of common secretion signals. T1SEstacker was applied to predict T1SEs from the genomes of representative Salmonella strains, and we found that both the number and composition of T1SEs varied among strains. The number of T1SEs is estimated to reach 100 or more in each strain, much larger than what we expected. In summary, we made comprehensive sequence analysis on the type 1 secreted RTX proteins, identified common sequence-based features at the C termini, and developed a stacking model that can predict type 1 secreted proteins accurately. Frontiers Media S.A. 2022-02-08 /pmc/articles/PMC8861453/ /pubmed/35211101 http://dx.doi.org/10.3389/fmicb.2021.813094 Text en Copyright © 2022 Chen, Zhao, Hui, Zhang, Hu, Chen, Cai, Hu and Wang. 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Microbiology Chen, Zewei Zhao, Ziyi Hui, Xinjie Zhang, Junya Hu, Yixue Chen, Runhong Cai, Xuxia Hu, Yueming Wang, Yejun T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features |
title | T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features |
title_full | T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features |
title_fullStr | T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features |
title_full_unstemmed | T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features |
title_short | T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features |
title_sort | t1sestacker: a tri-layer stacking model effectively predicts bacterial type 1 secreted proteins based on c-terminal non-repeats-in-toxin-motif sequence features |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8861453/ https://www.ncbi.nlm.nih.gov/pubmed/35211101 http://dx.doi.org/10.3389/fmicb.2021.813094 |
work_keys_str_mv | AT chenzewei t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures AT zhaoziyi t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures AT huixinjie t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures AT zhangjunya t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures AT huyixue t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures AT chenrunhong t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures AT caixuxia t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures AT huyueming t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures AT wangyejun t1sestackeratrilayerstackingmodeleffectivelypredictsbacterialtype1secretedproteinsbasedoncterminalnonrepeatsintoxinmotifsequencefeatures |
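
The description field reports that "[FLI][VAI]" is enriched at the two most C-terminal positions of RTX proteins. As a rough illustration of what that pattern means, the sketch below checks whether a protein sequence ends with those two residue classes. It is not T1SEstacker and not the authors' method: an enriched motif is only one sequence feature, so a match says nothing definite about secretion. The function name, the handling of a trailing `*` stop symbol, and the toy sequences are all assumptions made for this example.

```python
import re

# Pattern taken from the abstract: one of F/L/I at the second-to-last
# position and one of V/A/I at the last position of the sequence.
C_TERMINAL_MOTIF = re.compile(r"[FLI][VAI]$")


def has_c_terminal_motif(sequence: str) -> bool:
    """Return True if the last two residues match [FLI][VAI].

    Hypothetical helper: whitespace and a trailing '*' stop symbol
    are removed before matching (an assumption about input format).
    """
    cleaned = sequence.strip().upper().rstrip("*")
    return C_TERMINAL_MOTIF.search(cleaned) is not None


# Toy sequences invented for illustration; they are not real proteins.
examples = {
    "toy_1": "MKTAGGDLA",  # ends in L, A
    "toy_2": "MKTAGGDKE",  # ends in K, E
    "toy_3": "MSSLV*",     # ends in L, V before the stop symbol
}
flags = {name: has_c_terminal_motif(seq) for name, seq in examples.items()}
```

Here `flags` maps each toy sequence name to whether its last two residues fit the pattern. A check like this could at most serve as a quick pre-filter; the record describes the actual predictor as a tri-layer stacking model trained on broader non-RTX-motif sequence features.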