T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors
Many Gram-negative bacteria infect hosts and cause diseases by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging....
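Main Authors: | Hui, Xinjie; Chen, Zewei; Lin, Mingxiong; Zhang, Junya; Hu, Yueming; Zeng, Yingying; Cheng, Xi; Ou-Yang, Le; Sun, Ming-an; White, Aaron P.; Wang, Yejun |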
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
American Society for Microbiology
2020
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7406222/ https://www.ncbi.nlm.nih.gov/pubmed/32753503 http://dx.doi.org/10.1128/mSystems.00288-20 |
_version_ | 1783567391394889728 |
---|---|
author | Hui, Xinjie Chen, Zewei Lin, Mingxiong Zhang, Junya Hu, Yueming Zeng, Yingying Cheng, Xi Ou-Yang, Le Sun, Ming-an White, Aaron P. Wang, Yejun |
author_facet | Hui, Xinjie Chen, Zewei Lin, Mingxiong Zhang, Junya Hu, Yueming Zeng, Yingying Cheng, Xi Ou-Yang, Le Sun, Ming-an White, Aaron P. Wang, Yejun |
author_sort | Hui, Xinjie |
collection | PubMed |
description | Many Gram-negative bacteria infect hosts and cause diseases by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but unfortunately, these models have had high false-positive rates. In this research, we integrated promoter information along with characteristic protein features for signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in signal sequences of T3SEs, followed by development of a voting-based ensemble model integrating the individual prediction results. We assembled this into a unified T3SE prediction pipeline, T3SEpp, which integrated the results of individual modules, resulting in high accuracy (i.e., ∼0.94) and >1-fold reduction in the false-positive rate compared to that of state-of-the-art software tools. The T3SEpp pipeline and sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies on host-pathogen interactions. IMPORTANCE Type III secreted effector (T3SE) prediction remains a major computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One contributing factor could be the relatively narrow range of biological features used for the design and training of the models.
In this research, we conducted a comprehensive survey of the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor binding promoter sites, and assembled a unified prediction pipeline integrating multi-aspect biological features within homology-based and multiple machine learning models. To our knowledge, this research provides the most comprehensive biological sequence feature analysis compiled for T3SEs. The T3SEpp pipeline, which integrates this variety of features and combines different models, showed high accuracy, which should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences. |
format | Online Article Text |
id | pubmed-7406222 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-7406222 2020-08-11 T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors Hui, Xinjie Chen, Zewei Lin, Mingxiong Zhang, Junya Hu, Yueming Zeng, Yingying Cheng, Xi Ou-Yang, Le Sun, Ming-an White, Aaron P. Wang, Yejun mSystems Research Article American Society for Microbiology 2020-08-04 /pmc/articles/PMC7406222/ /pubmed/32753503 http://dx.doi.org/10.1128/mSystems.00288-20 Text en Copyright © 2020 Hui et al. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Hui, Xinjie Chen, Zewei Lin, Mingxiong Zhang, Junya Hu, Yueming Zeng, Yingying Cheng, Xi Ou-Yang, Le Sun, Ming-an White, Aaron P. Wang, Yejun T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors |
title | T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors |
title_full | T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors |
title_fullStr | T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors |
title_full_unstemmed | T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors |
title_short | T3SEpp: an Integrated Prediction Pipeline for Bacterial Type III Secreted Effectors |
title_sort | t3sepp: an integrated prediction pipeline for bacterial type iii secreted effectors |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7406222/ https://www.ncbi.nlm.nih.gov/pubmed/32753503 http://dx.doi.org/10.1128/mSystems.00288-20 |