Uniclust databases of clustered and deeply annotated protein sequences and alignments

We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases...

Bibliographic Details
Main authors: Mirdita, Milot; von den Driesch, Lars; Galiez, Clovis; Martin, Maria J.; Söding, Johannes; Steinegger, Martin
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614098/
https://www.ncbi.nlm.nih.gov/pubmed/27899574
http://dx.doi.org/10.1093/nar/gkw1081
author Mirdita, Milot
von den Driesch, Lars
Galiez, Clovis
Martin, Maria J.
Söding, Johannes
Steinegger, Martin
collection PubMed
description We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP domains, and proteins in the PDB, using our HHblits homology detection tool. Due to its high sensitivity, Uniclust contains 17% more Pfam domain annotations than UniProt. Uniboost MSAs of three diversities are built by enriching the Uniclust30 MSAs with local sequence matches from MMseqs2 profile searches through Uniclust30. All databases can be downloaded from the Uniclust server at uniclust.mmseqs.com. Users can search clusters by keywords and explore their MSAs, taxonomic representation, and annotations. Uniclust is updated every two months with the new UniProt release.
format Online
Article
Text
id pubmed-5614098
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5614098 2017-09-29 Uniclust databases of clustered and deeply annotated protein sequences and alignments. Mirdita, Milot; von den Driesch, Lars; Galiez, Clovis; Martin, Maria J.; Söding, Johannes; Steinegger, Martin. Nucleic Acids Res. Database Issue. Oxford University Press 2017-01-04 2016-11-29 /pmc/articles/PMC5614098/ /pubmed/27899574 http://dx.doi.org/10.1093/nar/gkw1081 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Uniclust databases of clustered and deeply annotated protein sequences and alignments
topic Database Issue