An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach
Type IV secretion systems (T4SS) are multi-protein complexes in a number of bacterial pathogens that can translocate proteins and DNA to the host. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% function to secrete proteins, delivering effector proteins into the cy...
Main Authors: | Esna Ashari, Zhila; Dasgupta, Nairanjana; Brayton, Kelly A.; Broschat, Shira L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2018 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5942808/ https://www.ncbi.nlm.nih.gov/pubmed/29742157 http://dx.doi.org/10.1371/journal.pone.0197041 |
_version_ | 1783321521843863552 |
---|---|
author | Esna Ashari, Zhila; Dasgupta, Nairanjana; Brayton, Kelly A.; Broschat, Shira L. |
author_sort | Esna Ashari, Zhila |
collection | PubMed |
description | Type IV secretion systems (T4SS) are multi-protein complexes in a number of bacterial pathogens that can translocate proteins and DNA to the host. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% function to secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Upon entry, these effectors manipulate the host cell’s machinery for their own benefit, which can result in serious illness or death of the host. For this reason recognition of T4SS effectors has become an important subject. Much previous work has focused on verifying effectors experimentally, a costly endeavor in terms of money, time, and effort. Having good predictions for effectors will help to focus experimental validations and decrease testing costs. In recent years, several scoring and machine learning-based methods have been suggested for the purpose of predicting T4SS effector proteins. These methods have used different sets of features for prediction, and their predictions have been inconsistent. In this paper, an optimal set of features is presented for predicting T4SS effector proteins using a statistical approach. A thorough literature search was performed to find features that have been proposed. Feature values were calculated for datasets of known effectors and non-effectors for T4SS-containing pathogens for four genera with a sufficient number of known effectors, Legionella pneumophila, Coxiella burnetii, Brucella spp, and Bartonella spp. The features were ranked, and less important features were filtered out. Correlations between remaining features were removed, and dimensional reduction was accomplished using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building logistic regression models and evaluating each model. The results based on evaluation of our logistic regression models confirm the effectiveness of our four optimal sets of features, and based on these an optimal set of features is proposed for all T4SS effector proteins. |
format | Online Article Text |
id | pubmed-5942808 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5942808 2018-05-18 An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach Esna Ashari, Zhila; Dasgupta, Nairanjana; Brayton, Kelly A.; Broschat, Shira L. PLoS One Research Article Public Library of Science 2018-05-09 /pmc/articles/PMC5942808/ /pubmed/29742157 http://dx.doi.org/10.1371/journal.pone.0197041 Text en © 2018 Esna Ashari et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5942808/ https://www.ncbi.nlm.nih.gov/pubmed/29742157 http://dx.doi.org/10.1371/journal.pone.0197041 |
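The description field outlines a multi-level selection pipeline: rank the features, filter out the weak ones, remove correlated ones, reduce dimensions, and fit logistic regression models. The following is a minimal sketch of the ranking, filtering, correlation-pruning, and logistic regression stages only; it omits the principal component analysis and factor analysis step. The record does not say how features were scored or which thresholds were used, so the scoring rule (absolute Pearson correlation with the class label), the cutoffs `min_score` and `threshold`, the feature names, and the toy data are all assumptions made for illustration and are not the authors' method or data.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)]) / (sx * sy)


def rank_and_filter(columns, labels, min_score=0.5):
    """Rank features by |correlation with the label| (an assumed score)
    and drop those scoring below min_score (an assumed cutoff)."""
    scores = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= min_score]


def drop_correlated(ranked, columns, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with one already kept."""
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        # Prediction errors are computed once per epoch, before any update.
        errors = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
                  for x, y in zip(rows, labels)]
        w = [wi - lr * sum(e * x[j] for e, x in zip(errors, rows)) / n
             for j, wi in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b


# Synthetic toy data (NOT from the paper): 10 "effectors" then 10 "non-effectors".
labels = [1] * 10 + [0] * 10
signal = [2.0 * y + 0.1 * (i % 3) for i, y in enumerate(labels)]
columns = {
    "signal": signal,                                                   # informative
    "signal_copy": [s + 0.01 * (i % 2) for i, s in enumerate(signal)],  # redundant
    "noise": [float(i % 2) for i in range(len(labels))],                # uninformative
}

kept = drop_correlated(rank_and_filter(columns, labels), columns)
rows = [[columns[name][i] for name in kept] for i in range(len(labels))]
weights, bias = fit_logistic(rows, labels)
predictions = [int(sigmoid(sum(w * x for w, x in zip(weights, r)) + bias) >= 0.5)
               for r in rows]
accuracy = mean([int(p == y) for p, y in zip(predictions, labels)])
print(kept, accuracy)
```

On this toy input the uninformative column is filtered out at the ranking stage and one of the two near-identical columns is pruned as redundant, so the model is fitted on a single feature. A real analysis would evaluate each model on held-out data rather than on the training rows used here.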