Representative Proteomes: A Stable, Scalable and Unbiased Proteome Set for Sequence Analysis and Functional Annotation

The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we...

Bibliographic Details
Main Authors:	Chen, Chuming; Natale, Darren A.; Finn, Robert D.; Huang, Hongzhan; Zhang, Jian; Wu, Cathy H.; Mazumder, Raja
Format:	Text
Language:	English
Published:	Public Library of Science 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083393/ https://www.ncbi.nlm.nih.gov/pubmed/21556138 http://dx.doi.org/10.1371/journal.pone.0018910

author	Chen, Chuming; Natale, Darren A.; Finn, Robert D.; Huang, Hongzhan; Zhang, Jian; Wu, Cathy H.; Mazumder, Raja
author_sort	Chen, Chuming
collection	PubMed
description	The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT are also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.
format	Text
id	pubmed-3083393
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3083393 2011-05-09 PLoS One Research Article Public Library of Science 2011-04-27 /pmc/articles/PMC3083393/ /pubmed/21556138 http://dx.doi.org/10.1371/journal.pone.0018910 Text en Chen et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Representative Proteomes: A Stable, Scalable and Unbiased Proteome Set for Sequence Analysis and Functional Annotation
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083393/ https://www.ncbi.nlm.nih.gov/pubmed/21556138 http://dx.doi.org/10.1371/journal.pone.0018910
