
The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools


Bibliographic Details
Main authors:	Wilke, Andreas; Harrison, Travis; Wilkening, Jared; Field, Dawn; Glass, Elizabeth M; Kyrpides, Nikos; Mavrommatis, Konstantinos; Meyer, Folker
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Database
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410781/ https://www.ncbi.nlm.nih.gov/pubmed/22720753 http://dx.doi.org/10.1186/1471-2105-13-141

Description
Summary:	BACKGROUND: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference. DESCRIPTION: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank. CONCLUSIONS: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
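
The idea of a single non-redundant reference that maps onto several annotation namespaces can be illustrated with a minimal sketch. It assumes that identical sequences are collapsed under an MD5 checksum used as the shared key; the summary here does not specify the mechanism, and the function names, accessions, and sequences below are hypothetical and are not taken from the M5nr tools.

```python
import hashlib
from collections import defaultdict


def md5_id(sequence: str) -> str:
    """Return a checksum key for a protein sequence (case-insensitive)."""
    normalized = sequence.strip().upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_nr(records):
    """Collapse (source, accession, sequence) records into a non-redundant set.

    Returns two dicts: key -> sequence, and key -> {source: [accessions]}.
    """
    sequences = {}
    annotations = defaultdict(dict)
    for source, accession, sequence in records:
        key = md5_id(sequence)
        # Store each distinct sequence once, however many sources carry it.
        sequences.setdefault(key, sequence.strip().upper())
        annotations[key].setdefault(source, []).append(accession)
    return sequences, dict(annotations)


def translate_hit(key, annotations, namespace):
    """Translate one similarity hit (by key) into the requested namespace."""
    return annotations.get(key, {}).get(namespace, [])


# Hypothetical input: the first two records carry the same sequence.
records = [
    ("KEGG", "hypothetical:K1", "MKTAYIAKQR"),
    ("GenBank", "hypothetical:G1", "mktayiakqr"),
    ("GenBank", "hypothetical:G2", "MSLLTEVETY"),
]
sequences, annotations = build_nr(records)
hit = md5_id("MKTAYIAKQR")
kegg_view = translate_hit(hit, annotations, "KEGG")
genbank_view = translate_hit(hit, annotations, "GenBank")
```

In this sketch a similarity search is run once against the collapsed sequence set, and each hit is then looked up in whichever namespace a group needs, so no recomputation is required to switch from one annotation source to another.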
