Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach

A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system and to dat...

Bibliographic Details
Main Authors: Burstein, David; Zusman, Tal; Degtyar, Elena; Viner, Ram; Segal, Gil; Pupko, Tal
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2701608/
https://www.ncbi.nlm.nih.gov/pubmed/19593377
http://dx.doi.org/10.1371/journal.ppat.1000508
author Burstein, David
Zusman, Tal
Degtyar, Elena
Viner, Ram
Segal, Gil
Pupko, Tal
author_facet Burstein, David
Zusman, Tal
Degtyar, Elena
Viner, Ram
Segal, Gil
Pupko, Tal
author_sort Burstein, David
collection PubMed
description A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system and to date, approximately 100 effectors have been identified by various experimental and computational techniques. Effector identification is a critical first step towards the understanding of the pathogenesis system in L. pneumophila as well as in other bacterial pathogens. Here, we formulate the task of effector identification as a classification problem: each L. pneumophila open reading frame (ORF) was classified as either effector or not. We computationally defined a set of features that best distinguish effectors from non-effectors. These features cover a wide range of characteristics including taxonomical dispersion, regulatory data, genomic organization, similarity to eukaryotic proteomes and more. Machine learning algorithms utilizing these features were then applied to classify all the ORFs within the L. pneumophila genome. Using this approach we were able to predict and experimentally validate 40 new effectors, reaching a success rate of above 90%. Increasing the number of validated effectors to around 140, we were able to gain novel insights into their characteristics. Effectors were found to have low G+C content, supporting the hypothesis that a large number of effectors originate via horizontal gene transfer, probably from their protozoan host. In addition, effectors were found to cluster in specific genomic regions. Finally, we were able to provide a novel description of the C-terminal translocation signal required for effector translocation by the Icm/Dot secretion system. To conclude, we have discovered 40 novel L. pneumophila effectors, predicted over a hundred additional highly probable effectors, and shown the applicability of machine learning algorithms for the identification and characterization of bacterial pathogenesis determinants.
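The classification framing in the abstract (each ORF labelled as effector or not, based on features such as G+C content and similarity to eukaryotic proteomes) can be illustrated with a small sketch. This record does not say which algorithms or feature encodings the authors used, so everything below is an assumption made for illustration: the nearest-centroid classifier is a stand-in technique, and the `gc_content` helper, the similarity scores, and the toy sequences are hypothetical.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def train(features, labels):
    """Compute one centroid per class label (nearest-centroid model)."""
    by_class = {}
    for vec, lab in zip(features, labels):
        by_class.setdefault(lab, []).append(vec)
    return {lab: centroid(rows) for lab, rows in by_class.items()}


def predict(model, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(model, key=lambda lab: dist(model[lab]))


# Hypothetical toy ORFs: (sequence, made-up eukaryotic-similarity score, label).
# These are NOT real L. pneumophila data.
toy_orfs = [
    ("ATGAAATTTAATTTAGATAAA", 0.8, "effector"),
    ("ATGTTAAATGATAAAGAATAA", 0.7, "effector"),
    ("ATGGCCGGCGCGCTGGCCTGA", 0.1, "non-effector"),
    ("ATGCGCGGCCTGGCGGACTAA", 0.2, "non-effector"),
]

# Feature vector per ORF: [G+C fraction, similarity score].
X = [[gc_content(seq), sim] for seq, sim, _ in toy_orfs]
y = [lab for _, _, lab in toy_orfs]

model = train(X, y)

# Classify an unseen, AT-rich toy ORF with a high similarity score.
prediction = predict(model, [gc_content("ATGAATAAATTAGATTTTTAA"), 0.75])
print(prediction)
```

A real pipeline would presumably use many more features (the abstract also lists taxonomical dispersion, regulatory data, and genomic organization) and would evaluate the classifier on held-out validated effectors; this sketch only shows the shape of the problem.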
format Text
id pubmed-2701608
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2701608 2009-07-10 Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach Burstein, David Zusman, Tal Degtyar, Elena Viner, Ram Segal, Gil Pupko, Tal PLoS Pathog Research Article A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system and to date, approximately 100 effectors have been identified by various experimental and computational techniques. Effector identification is a critical first step towards the understanding of the pathogenesis system in L. pneumophila as well as in other bacterial pathogens. Here, we formulate the task of effector identification as a classification problem: each L. pneumophila open reading frame (ORF) was classified as either effector or not. We computationally defined a set of features that best distinguish effectors from non-effectors. These features cover a wide range of characteristics including taxonomical dispersion, regulatory data, genomic organization, similarity to eukaryotic proteomes and more. Machine learning algorithms utilizing these features were then applied to classify all the ORFs within the L. pneumophila genome. Using this approach we were able to predict and experimentally validate 40 new effectors, reaching a success rate of above 90%. Increasing the number of validated effectors to around 140, we were able to gain novel insights into their characteristics. Effectors were found to have low G+C content, supporting the hypothesis that a large number of effectors originate via horizontal gene transfer, probably from their protozoan host. In addition, effectors were found to cluster in specific genomic regions. Finally, we were able to provide a novel description of the C-terminal translocation signal required for effector translocation by the Icm/Dot secretion system. To conclude, we have discovered 40 novel L. pneumophila effectors, predicted over a hundred additional highly probable effectors, and shown the applicability of machine learning algorithms for the identification and characterization of bacterial pathogenesis determinants. Public Library of Science 2009-07-10 /pmc/articles/PMC2701608/ /pubmed/19593377 http://dx.doi.org/10.1371/journal.ppat.1000508 Text en Burstein et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Burstein, David
Zusman, Tal
Degtyar, Elena
Viner, Ram
Segal, Gil
Pupko, Tal
Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach
title Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach
title_full Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach
title_fullStr Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach
title_full_unstemmed Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach
title_short Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach
title_sort genome-scale identification of legionella pneumophila effectors using a machine learning approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2701608/
https://www.ncbi.nlm.nih.gov/pubmed/19593377
http://dx.doi.org/10.1371/journal.ppat.1000508