
Analysis of the tryptic search space in UniProt databases

Bibliographic Details
Main authors: Alpi, Emanuele, Griss, Johannes, da Silva, Alan Wilter Sousa, Bely, Benoit, Antunes, Ricardo, Zellner, Hermann, Ríos, Daniel, O'Donovan, Claire, Vizcaíno, Juan Antonio, Martin, Maria J
Format: Online, Article, Text
Language: English
Published: BlackWell Publishing Ltd, 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298651/
https://www.ncbi.nlm.nih.gov/pubmed/25307260
http://dx.doi.org/10.1002/pmic.201400227
author Alpi, Emanuele
Griss, Johannes
da Silva, Alan Wilter Sousa
Bely, Benoit
Antunes, Ricardo
Zellner, Hermann
Ríos, Daniel
O'Donovan, Claire
Vizcaíno, Juan Antonio
Martin, Maria J
collection PubMed
description In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was compared against the available peptide-level identifications in the main MS-based proteomics repositories. The peptides observed in these repositories are an important source of information for protein databases, as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons of each in terms of search space for MS-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.
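To make "tryptic search space" and "peptide unicity" concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), zero missed cleavages, and a hypothetical length window of 6 to 30 residues; the accessions `PROT_A`/`PROT_B`, their sequences, and all function names are made up for illustration. It builds a map from each distinct peptide to the proteins containing it and counts how many peptides map to exactly one protein.

```python
import re
from collections import defaultdict

# Simplified trypsin rule (an assumption, not the paper's exact settings):
# cut after K or R unless the next residue is P. The pattern matches the
# zero-width position between such a residue and its successor.
CLEAVAGE_SITE = re.compile(r"(?<=[KR])(?!P)")

# Hypothetical toy "database"; accessions and sequences are made up.
TOY_PROTEINS = {
    "PROT_A": "AAAAAAKEEEEEERGGGGGGKPLLLLLR",
    "PROT_B": "AAAAAAKTTTTTTRSSK",
}


def tryptic_peptides(sequence, min_len=6, max_len=30):
    """Fully tryptic peptides (no missed cleavages) inside a length window."""
    # Zero-width splits need Python 3.7+; a trailing K/R yields an empty
    # last piece, which the length filter discards.
    pieces = CLEAVAGE_SITE.split(sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def search_space(proteins, **digest_options):
    """Map each distinct peptide to the set of accessions that contain it."""
    peptide_to_proteins = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, **digest_options):
            peptide_to_proteins[peptide].add(accession)
    return peptide_to_proteins


def unicity_summary(peptide_to_proteins):
    """Return (distinct peptides, peptides that map to exactly one protein)."""
    total = len(peptide_to_proteins)
    unique = sum(1 for accs in peptide_to_proteins.values() if len(accs) == 1)
    return total, unique


if __name__ == "__main__":
    space = search_space(TOY_PROTEINS)
    total, unique = unicity_summary(space)
    # Prints: 4 distinct peptides, 3 unique to one protein
    print(f"{total} distinct peptides, {unique} unique to one protein")
```

In this toy set, AAAAAAK occurs in both proteins and so cannot by itself tell them apart, while the other three peptides are unique. Adding isoform or variant sequences as further entries would be expected to enlarge the search space and can turn previously unique peptides into shared ones; real search engines also allow missed cleavages and modifications, which this sketch ignores.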
format Online
Article
Text
id pubmed-4298651
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BlackWell Publishing Ltd
record_format MEDLINE/PubMed
spelling pubmed-4298651 2015-04-09 Proteomics Research Article BlackWell Publishing Ltd 2015-01 2014-12-03 Text en The Authors. PROTEOMICS Published by Wiley-VCH Verlag GmbH & Co. KGaA. http://creativecommons.org/licenses/by/3.0/ This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Analysis of the tryptic search space in UniProt databases
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4298651/
https://www.ncbi.nlm.nih.gov/pubmed/25307260
http://dx.doi.org/10.1002/pmic.201400227