Uniclust databases of clustered and deeply annotated protein sequences and alignments
We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases...
Main authors: | Mirdita, Milot; von den Driesch, Lars; Galiez, Clovis; Martin, Maria J.; Söding, Johannes; Steinegger, Martin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | Database Issue |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614098/ https://www.ncbi.nlm.nih.gov/pubmed/27899574 http://dx.doi.org/10.1093/nar/gkw1081 |
_version_ | 1783266361959514112 |
---|---|
author | Mirdita, Milot; von den Driesch, Lars; Galiez, Clovis; Martin, Maria J.; Söding, Johannes; Steinegger, Martin |
author_facet | Mirdita, Milot; von den Driesch, Lars; Galiez, Clovis; Martin, Maria J.; Söding, Johannes; Steinegger, Martin |
author_sort | Mirdita, Milot |
collection | PubMed |
description | We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP domains, and proteins in the PDB, using our HHblits homology detection tool. Due to its high sensitivity, Uniclust contains 17% more Pfam domain annotations than UniProt. Uniboost MSAs of three diversities are built by enriching the Uniclust30 MSAs with local sequence matches from MMseqs2 profile searches through Uniclust30. All databases can be downloaded from the Uniclust server at uniclust.mmseqs.com. Users can search clusters by keywords and explore their MSAs, taxonomic representation, and annotations. Uniclust is updated every two months with the new UniProt release. |
format | Online Article Text |
id | pubmed-5614098 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5614098; 2017-09-29; Uniclust databases of clustered and deeply annotated protein sequences and alignments; Mirdita, Milot; von den Driesch, Lars; Galiez, Clovis; Martin, Maria J.; Söding, Johannes; Steinegger, Martin; Nucleic Acids Res; Database Issue; Oxford University Press; 2017-01-04; 2016-11-29; /pmc/articles/PMC5614098/; /pubmed/27899574; http://dx.doi.org/10.1093/nar/gkw1081; Text; en; © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Uniclust databases of clustered and deeply annotated protein sequences and alignments |
topic | Database Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614098/ https://www.ncbi.nlm.nih.gov/pubmed/27899574 http://dx.doi.org/10.1093/nar/gkw1081 |