
An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach

Bibliographic Details
Main authors: Esna Ashari, Zhila; Dasgupta, Nairanjana; Brayton, Kelly A.; Broschat, Shira L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5942808/
https://www.ncbi.nlm.nih.gov/pubmed/29742157
http://dx.doi.org/10.1371/journal.pone.0197041
author Esna Ashari, Zhila
Dasgupta, Nairanjana
Brayton, Kelly A.
Broschat, Shira L.
collection PubMed
description Type IV secretion systems (T4SS) are multi-protein complexes in a number of bacterial pathogens that can translocate proteins and DNA to the host. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% function to secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Upon entry, these effectors manipulate the host cell's machinery for the pathogen's benefit, which can result in serious illness or death of the host. For this reason, recognition of T4SS effectors has become an important subject. Much previous work has focused on verifying effectors experimentally, a costly endeavor in terms of money, time, and effort. Good predictions of effectors will help to focus experimental validation and decrease testing costs. In recent years, several scoring-based and machine learning-based methods have been proposed for predicting T4SS effector proteins. These methods have used different sets of features for prediction, and their predictions have been inconsistent. In this paper, an optimal set of features for predicting T4SS effector proteins is presented using a statistical approach. A thorough literature search was performed to find features that have been proposed. Feature values were calculated for datasets of known effectors and non-effectors for T4SS-containing pathogens from four genera with a sufficient number of known effectors: Legionella pneumophila, Coxiella burnetii, Brucella spp., and Bartonella spp. The features were ranked, and less important features were filtered out. Correlated features among those remaining were removed, and dimensionality reduction was accomplished using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building logistic regression models and evaluating each model.
The results based on evaluation of our logistic regression models confirm the effectiveness of our four optimal sets of features, and, based on these, an optimal set of features is proposed for all T4SS effector proteins.
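The multi-level selection summarized in the abstract (rank the features, filter out the weaker ones, remove correlated features, then judge the survivors with logistic regression) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the Welch t-statistic used for ranking, the `top_k` and 0.8 correlation cutoffs, the greedy correlation filter, and the gradient-descent logistic regression are assumptions chosen for illustration; the principal component and factor analysis steps are omitted; and all function names and the synthetic data are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(pos, neg):
    """Absolute Welch t-statistic comparing effector (pos) and non-effector (neg) values."""
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return abs(mean(pos) - mean(neg)) / se if se > 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_features(X, y, top_k=5, max_corr=0.8):
    """Level 1: rank features and keep the top_k; level 2: drop highly correlated ones."""
    cols = [list(col) for col in zip(*X)]
    scores = []
    for j, col in enumerate(cols):
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        scores.append((welch_t(pos, neg), j))
    ranked = [j for _, j in sorted(scores, reverse=True)][:top_k]
    kept = []
    for j in ranked:  # greedy: a feature survives only if weakly correlated with those kept
        if all(abs(pearson(cols[j], cols[k])) <= max_corr for k in kept):
            kept.append(j)
    return kept


def standardize(X):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [math.sqrt(variance(c)) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of samples whose predicted class (p >= 0.5) matches the label."""
    preds = [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical synthetic data: feature 0 is informative, feature 1 is a near copy of it,
# and feature 2 is pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = []
for label in y:
    f0 = rng.gauss(1.5 if label else -1.5, 1.0)
    X.append([f0, f0 + rng.gauss(0.0, 0.1), rng.gauss(0.0, 1.0)])

kept = select_features(X, y, top_k=3, max_corr=0.8)
Xs = standardize([[row[j] for j in kept] for row in X])
w, b = fit_logistic(Xs, y)
print("kept features:", kept)
print("training accuracy:", round(accuracy(Xs, y, w, b), 3))
```

On this toy data the correlation filter discards one of the two near-duplicate features, and the remaining features still separate the classes well. A real evaluation would score each candidate model on held-out data (for example, by cross-validation) rather than on training accuracy.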
format Online
Article
Text
id pubmed-5942808
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5942808 2018-05-18 PLoS One Research Article Public Library of Science 2018-05-09 /pmc/articles/PMC5942808/ /pubmed/29742157 http://dx.doi.org/10.1371/journal.pone.0197041 Text en © 2018 Esna Ashari et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach
topic Research Article