
UniProt: the universal protein knowledgebase

The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The rema...


Bibliographic Details
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210571/
https://www.ncbi.nlm.nih.gov/pubmed/27899622
http://dx.doi.org/10.1093/nar/gkw1099
collection PubMed
description The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were causing excessive redundancy in UniProt. The initial run of this pipeline reduced the number of sequences in UniProt by 47 million. For our users interested in the accessory proteomes, we have made available sets of pan proteome sequences that cover the diversity of sequences for each species that is found in its strains and sub-strains. To help interpretation of genomic variants, we provide tracks of detailed protein information for the major genome browsers. We provide a SPARQL endpoint that allows complex queries of the more than 22 billion triples of data in UniProt (http://sparql.uniprot.org/). UniProt resources can be accessed via the website at http://www.uniprot.org/.
format Online
Article
Text
id pubmed-5210571
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5210571 2017-01-05 UniProt: the universal protein knowledgebase Nucleic Acids Res Database Issue Oxford University Press 2017-01-04 2016-11-28 /pmc/articles/PMC5210571/ /pubmed/27899622 http://dx.doi.org/10.1093/nar/gkw1099 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title UniProt: the universal protein knowledgebase
topic Database Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210571/
https://www.ncbi.nlm.nih.gov/pubmed/27899622
http://dx.doi.org/10.1093/nar/gkw1099