The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools

BACKGROUND: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of s...

Bibliographic Details
Main Authors: Wilke, Andreas; Harrison, Travis; Wilkening, Jared; Field, Dawn; Glass, Elizabeth M; Kyrpides, Nikos; Mavrommatis, Konstantinos; Meyer, Folker
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410781/
https://www.ncbi.nlm.nih.gov/pubmed/22720753
http://dx.doi.org/10.1186/1471-2105-13-141
author Wilke, Andreas
Harrison, Travis
Wilkening, Jared
Field, Dawn
Glass, Elizabeth M
Kyrpides, Nikos
Mavrommatis, Konstantinos
Meyer, Folker
author_facet Wilke, Andreas
Harrison, Travis
Wilkening, Jared
Field, Dawn
Glass, Elizabeth M
Kyrpides, Nikos
Mavrommatis, Konstantinos
Meyer, Folker
author_sort Wilke, Andreas
collection PubMed
description BACKGROUND: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference. DESCRIPTION: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank. CONCLUSIONS: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
format Online
Article
Text
id pubmed-3410781
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3410781 2012-08-03 The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools Wilke, Andreas Harrison, Travis Wilkening, Jared Field, Dawn Glass, Elizabeth M Kyrpides, Nikos Mavrommatis, Konstantinos Meyer, Folker BMC Bioinformatics Database BACKGROUND: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference. DESCRIPTION: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank. CONCLUSIONS: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets. BioMed Central 2012-06-21 /pmc/articles/PMC3410781/ /pubmed/22720753 http://dx.doi.org/10.1186/1471-2105-13-141 Text en Copyright ©2012 Wilke et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Database
Wilke, Andreas
Harrison, Travis
Wilkening, Jared
Field, Dawn
Glass, Elizabeth M
Kyrpides, Nikos
Mavrommatis, Konstantinos
Meyer, Folker
The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
title The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
title_full The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
title_fullStr The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
title_full_unstemmed The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
title_short The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
title_sort m5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
topic Database
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410781/
https://www.ncbi.nlm.nih.gov/pubmed/22720753
http://dx.doi.org/10.1186/1471-2105-13-141