Computational identification of strain-, species- and genus-specific proteins

BACKGROUND: The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relationship with close relatives. Such proteins can also serve...

Bibliographic Details
Main authors:	Mazumder, Raja; Natale, Darren A; Murthy, Sudhir; Thiagarajan, Rathi; Wu, Cathy H
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Database
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310627/ https://www.ncbi.nlm.nih.gov/pubmed/16305751 http://dx.doi.org/10.1186/1471-2105-6-279

author	Mazumder, Raja; Natale, Darren A; Murthy, Sudhir; Thiagarajan, Rathi; Wu, Cathy H
collection	PubMed
description	BACKGROUND: The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relationship with close relatives. Such proteins can also serve as taxon-specific diagnostic targets.
DESCRIPTION: A pipeline using a combination of computational and manual analyses of BLAST results was developed to identify strain-, species-, and genus-specific proteins and to catalog the closest sequenced relative for each protein in a proteome. Proteins encoded by a given strain are preliminarily considered to be unique if BLAST, using a comprehensive protein database, fails to retrieve (with an e-value better than 0.001) any protein not encoded by the query strain, species or genus (for strain-, species- and genus-specific proteins respectively), or if BLAST, using the best hit as the query (reverse BLAST), does not retrieve the initial query protein. Results are manually inspected for homology if the initial query is retrieved in the reverse BLAST but is not the best hit. Sequences unlikely to retrieve homologs using the default BLOSUM62 matrix (usually short sequences) are re-tested using the PAM30 matrix, thereby increasing the number of retrieved homologs and increasing the stringency of the search for unique proteins. The above protocol was used to examine several food- and water-borne pathogens. We find that the reverse BLAST step filters out about 22% of proteins with homologs that would otherwise be considered unique at the genus and species levels. Analysis of the annotations of unique proteins reveals that many are remnants of prophage proteins, or may be involved in virulence. The data generated from this study can be accessed and further evaluated at the CUPID (Core and Unique Protein Identification) system web site (updated semi-annually).
CONCLUSION: CUPID provides a set of proteins specific to a genus, species or a strain, and identifies the most closely related organism.
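The decision rules in the DESCRIPTION paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CUPID implementation: the `Hit` record, the `classify` function, the result labels and the dictionary of precomputed reverse-BLAST results are all hypothetical names, and the BLAST searches themselves are assumed to have been run beforehand. Two further assumptions: the best forward hit is taken to be the lowest e-value hit from outside the query taxon, and the reverse search ignores the best hit's match to itself. The PAM30 re-test of short sequences is left out.

```python
from dataclasses import dataclass
from typing import Dict, List

E_VALUE_CUTOFF = 1e-3  # threshold quoted in the abstract


@dataclass(frozen=True)
class Hit:
    """One BLAST hit (hypothetical record, not a BLAST output format)."""
    subject_id: str  # identifier of the retrieved protein
    taxon: str       # strain, species or genus of the subject, at the level tested
    evalue: float


def classify(
    query_id: str,
    query_taxon: str,
    forward_hits: List[Hit],
    reverse_hits: Dict[str, List[Hit]],
    cutoff: float = E_VALUE_CUTOFF,
) -> str:
    """Return 'unique', 'not_unique' or 'manual_review' for one query protein."""
    # Forward BLAST: keep significant hits from outside the query taxon.
    outside = sorted(
        (h for h in forward_hits if h.evalue < cutoff and h.taxon != query_taxon),
        key=lambda h: h.evalue,
    )
    if not outside:
        return "unique"  # nothing retrieved outside the taxon

    # Reverse BLAST: the best outside hit becomes the query (self-hit ignored).
    best = outside[0]
    reverse = sorted(
        (
            h
            for h in reverse_hits.get(best.subject_id, [])
            if h.evalue < cutoff and h.subject_id != best.subject_id
        ),
        key=lambda h: h.evalue,
    )
    retrieved = [h.subject_id for h in reverse]

    if query_id not in retrieved:
        return "unique"  # reverse BLAST does not recover the original query
    if retrieved[0] == query_id:
        return "not_unique"  # reciprocal best hit, treated as a homolog
    return "manual_review"  # recovered, but not as the best hit


if __name__ == "__main__":
    forward = [Hit("p7", "OtherGenus", 1e-5)]
    reverse = {"p7": [Hit("x1", "MyGenus", 1e-30), Hit("q3", "MyGenus", 1e-4)]}
    print(classify("q3", "MyGenus", forward, reverse))
```

In this sketch the taxon label is compared at a single level per call, so running it three times with strain, species and genus labels would give the three levels of specificity described in the abstract.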
format	Text
id	pubmed-1310627
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1310627 2005-12-10 Computational identification of strain-, species- and genus-specific proteins Mazumder, Raja; Natale, Darren A; Murthy, Sudhir; Thiagarajan, Rathi; Wu, Cathy H BMC Bioinformatics Database BioMed Central 2005-11-23 /pmc/articles/PMC1310627/ /pubmed/16305751 http://dx.doi.org/10.1186/1471-2105-6-279 Text en Copyright © 2005 Mazumder et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Computational identification of strain-, species- and genus-specific proteins
topic	Database
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310627/ https://www.ncbi.nlm.nih.gov/pubmed/16305751 http://dx.doi.org/10.1186/1471-2105-6-279