Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila

Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells in order to change their environment making the environment hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to p...

Bibliographic Details
Main authors: Esna Ashari, Zhila; Brayton, Kelly A.; Broschat, Shira L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6347213/
https://www.ncbi.nlm.nih.gov/pubmed/30682021
http://dx.doi.org/10.1371/journal.pone.0202312
_version_ 1783389899605409792
author Esna Ashari, Zhila
Brayton, Kelly A.
Broschat, Shira L.
author_facet Esna Ashari, Zhila
Brayton, Kelly A.
Broschat, Shira L.
author_sort Esna Ashari, Zhila
collection PubMed
description Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells in order to change their environment making the environment hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, inconsistencies exist between their results. Previously we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires’ disease, because it has many validated effector proteins and others have developed machine learning prediction tools for it. While all of our models give good results indicating that our optimal features are quite robust, Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 effector proteins that are deemed highly probable to be effectors and include 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models only predicted 126 and 311 candidate effectors.
format Online
Article
Text
id pubmed-6347213
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-63472132019-02-02 Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila Esna Ashari, Zhila Brayton, Kelly A. Broschat, Shira L. PLoS One Research Article Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells in order to change their environment making the environment hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, inconsistencies exist between their results. Previously we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires’ disease, because it has many validated effector proteins and others have developed machine learning prediction tools for it. While all of our models give good results indicating that our optimal features are quite robust, Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 effector proteins that are deemed highly probable to be effectors and include 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models only predicted 126 and 311 candidate effectors. 
Public Library of Science 2019-01-25 /pmc/articles/PMC6347213/ /pubmed/30682021 http://dx.doi.org/10.1371/journal.pone.0202312 Text en © 2019 Esna Ashari et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Esna Ashari, Zhila
Brayton, Kelly A.
Broschat, Shira L.
Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila
title Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila
title_full Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila
title_fullStr Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila
title_full_unstemmed Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila
title_short Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila
title_sort using an optimal set of features with a machine learning-based approach to predict effector proteins for legionella pneumophila
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6347213/
https://www.ncbi.nlm.nih.gov/pubmed/30682021
http://dx.doi.org/10.1371/journal.pone.0202312
work_keys_str_mv AT esnaasharizhila usinganoptimalsetoffeatureswithamachinelearningbasedapproachtopredicteffectorproteinsforlegionellapneumophila
AT braytonkellya usinganoptimalsetoffeatureswithamachinelearningbasedapproachtopredicteffectorproteinsforlegionellapneumophila
AT broschatshiral usinganoptimalsetoffeatureswithamachinelearningbasedapproachtopredicteffectorproteinsforlegionellapneumophila