T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features

Type 1 secretion systems play important roles in pathogenicity of Gram-negative bacteria. However, the substrate secretion mechanism remains largely unknown. In this research, we observed the sequence features of repeats-in-toxin (RTX) proteins, a major class of type 1 secreted effectors (T1SEs). We...

Bibliographic Details
Main authors: Chen, Zewei; Zhao, Ziyi; Hui, Xinjie; Zhang, Junya; Hu, Yixue; Chen, Runhong; Cai, Xuxia; Hu, Yueming; Wang, Yejun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8861453/
https://www.ncbi.nlm.nih.gov/pubmed/35211101
http://dx.doi.org/10.3389/fmicb.2021.813094
author Chen, Zewei
Zhao, Ziyi
Hui, Xinjie
Zhang, Junya
Hu, Yixue
Chen, Runhong
Cai, Xuxia
Hu, Yueming
Wang, Yejun
author_sort Chen, Zewei
collection PubMed
description Type 1 secretion systems play important roles in the pathogenicity of Gram-negative bacteria. However, the substrate secretion mechanism remains largely unknown. In this research, we examined the sequence features of repeats-in-toxin (RTX) proteins, a major class of type 1 secreted effectors (T1SEs). We found striking non-RTX-motif amino acid composition patterns at the C termini, most typically exemplified by the enriched “[FLI][VAI]” at the last two C-terminal positions. Machine-learning models, including deep-learning ones, were trained using these sequence-based non-RTX-motif features and further combined into a tri-layer stacking model, T1SEstacker, which predicted the RTX proteins accurately, with a fivefold cross-validated sensitivity of ∼0.89 at a specificity of ∼0.94. Besides substrates with RTX motifs, T1SEstacker can also distinguish non-RTX-motif T1SEs well, further suggesting the potential existence of common secretion signals. T1SEstacker was applied to predict T1SEs from the genomes of representative Salmonella strains, and we found that both the number and composition of T1SEs varied among strains. The number of T1SEs is estimated to reach 100 or more in each strain, far more than we expected. In summary, we performed a comprehensive sequence analysis of the type 1 secreted RTX proteins, identified common sequence-based features at the C termini, and developed a stacking model that can predict type 1 secreted proteins accurately.
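The C-terminal pattern and the two performance figures named in the description can be illustrated with a minimal Python sketch. It is not the T1SEstacker implementation. It assumes that “[FLI][VAI]” can be read as a pair of regular-expression character classes anchored at the end of the sequence, and the function names, example sequences, and confusion-matrix counts are hypothetical. The motif is described only as enriched, so a match is a weak hint rather than a prediction.

```python
import re

# Assumption: "[FLI][VAI]" is treated as two regex character classes that must
# match the final two residues of the protein sequence.
C_TERMINAL_MOTIF = re.compile(r"[FLI][VAI]$")


def has_c_terminal_motif(sequence: str) -> bool:
    """Return True if the last two residues match [FLI][VAI]."""
    # Normalize case and drop a trailing '*' stop marker, if present.
    cleaned = sequence.strip().upper().rstrip("*")
    return bool(C_TERMINAL_MOTIF.search(cleaned))


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Compute (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true T1SEs that are recovered
    specificity = tn / (tn + fp)  # fraction of non-T1SEs correctly rejected
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    for seq in ("MKTAYLA", "MKTAYGK", "mktfv*"):
        print(seq, has_c_terminal_motif(seq))
    # Hypothetical counts chosen only to illustrate the two formulas.
    print(sensitivity_specificity(tp=89, fn=11, tn=94, fp=6))
```

Sensitivity and specificity are computed per class, so reporting one "at" a given value of the other corresponds to fixing a decision threshold on the model's output score.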
format Online
Article
Text
id pubmed-8861453
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8861453 2022-02-23 Front Microbiol Microbiology Frontiers Media S.A. 2022-02-08 /pmc/articles/PMC8861453/ /pubmed/35211101 http://dx.doi.org/10.3389/fmicb.2021.813094 Text en Copyright © 2022 Chen, Zhao, Hui, Zhang, Hu, Chen, Cai, Hu and Wang.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8861453/
https://www.ncbi.nlm.nih.gov/pubmed/35211101
http://dx.doi.org/10.3389/fmicb.2021.813094