Genome-wide protein localization prediction strategies for gram negative bacteria

BACKGROUND: Genome-wide prediction of protein subcellular localization is an important type of evidence used for inferring protein function. While a variety of computational tools have been developed for this purpose, errors in the gene models and use of protein sorting signals that are not recogniz...

Bibliographic Details
Main Author:	Romine, Margaret F
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3223724/ https://www.ncbi.nlm.nih.gov/pubmed/21810203 http://dx.doi.org/10.1186/1471-2164-12-S1-S1

_version_	1782217314831171584
author	Romine, Margaret F
author_facet	Romine, Margaret F
author_sort	Romine, Margaret F
collection	PubMed
description	BACKGROUND: Genome-wide prediction of protein subcellular localization is an important type of evidence used for inferring protein function. While a variety of computational tools have been developed for this purpose, errors in the gene models and use of protein sorting signals that are not recognized by the more commonly accepted tools can diminish the accuracy of their output. RESULTS: As part of an effort to manually curate the annotations of 19 strains of Shewanella, numerous insights were gained regarding the use of computational tools and proteomics data to predict protein localization. Identification of the suite of secretion systems present in each strain at the start of the process made it possible to tailor the subsequent localization prediction strategies to each strain for improved accuracy. Comparisons of the computational predictions among orthologous proteins revealed inconsistencies in the computational outputs, which could often be resolved by adjusting the gene models or ortholog group memberships. While proteomic data was useful for verifying start site predictions and post-translational proteolytic cleavage, care was needed to distinguish cellular cleavage events from those mediated by sample processing. Searches for lipoprotein signal peptides revealed that neither TatP nor LipoP is designed to identify lipoprotein substrates of the twin arginine translocation system and that the +2 rule for lipoprotein sorting does not apply to this genus. Analysis of the relationships between domain occurrence and protein localization prediction enabled identification of numerous location-informative domains, which could then be used to refine or increase confidence in location predictions. This collective knowledge was used to develop a general strategy for predicting protein localization that could be adapted to other organisms.
CONCLUSION: Improved localization prediction accuracy is not simply a matter of developing better computational algorithms. It also entails gathering key knowledge about the host architecture, the translocation machinery, and the associated substrate recognition, both through experimentation and through integration of diverse computational analyses of many proteins, where possible drawn from different species within the same genus.
format	Online Article Text
id	pubmed-3223724
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3223724 2011-11-26 Genome-wide protein localization prediction strategies for gram negative bacteria Romine, Margaret F BMC Genomics Research BioMed Central 2011-06-15 /pmc/articles/PMC3223724/ /pubmed/21810203 http://dx.doi.org/10.1186/1471-2164-12-S1-S1 Text en Copyright ©2011 Romine; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Romine, Margaret F Genome-wide protein localization prediction strategies for gram negative bacteria
title	Genome-wide protein localization prediction strategies for gram negative bacteria
title_full	Genome-wide protein localization prediction strategies for gram negative bacteria
title_fullStr	Genome-wide protein localization prediction strategies for gram negative bacteria
title_full_unstemmed	Genome-wide protein localization prediction strategies for gram negative bacteria
title_short	Genome-wide protein localization prediction strategies for gram negative bacteria
title_sort	genome-wide protein localization prediction strategies for gram negative bacteria
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3223724/ https://www.ncbi.nlm.nih.gov/pubmed/21810203 http://dx.doi.org/10.1186/1471-2164-12-S1-S1
work_keys_str_mv	AT rominemargaretf genomewideproteinlocalizationpredictionstrategiesforgramnegativebacteria
