The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools
BACKGROUND: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of s...
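Main authors: | Wilke, Andreas; Harrison, Travis; Wilkening, Jared; Field, Dawn; Glass, Elizabeth M; Kyrpides, Nikos; Mavrommatis, Konstantinos; Meyer, Folker |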
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2012 |
Subjects: | Database |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410781/ https://www.ncbi.nlm.nih.gov/pubmed/22720753 http://dx.doi.org/10.1186/1471-2105-13-141 |
_version_ | 1782239760600793088 |
---|---|
author | Wilke, Andreas; Harrison, Travis; Wilkening, Jared; Field, Dawn; Glass, Elizabeth M; Kyrpides, Nikos; Mavrommatis, Konstantinos; Meyer, Folker |
author_facet | Wilke, Andreas; Harrison, Travis; Wilkening, Jared; Field, Dawn; Glass, Elizabeth M; Kyrpides, Nikos; Mavrommatis, Konstantinos; Meyer, Folker |
author_sort | Wilke, Andreas |
collection | PubMed |
description | BACKGROUND: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference. DESCRIPTION: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank. CONCLUSIONS: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets. |
format | Online Article Text |
id | pubmed-3410781 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-34107812012-08-03 The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools Wilke, Andreas Harrison, Travis Wilkening, Jared Field, Dawn Glass, Elizabeth M Kyrpides, Nikos Mavrommatis, Konstantinos Meyer, Folker BMC Bioinformatics Database BACKGROUND: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference. DESCRIPTION: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank. CONCLUSIONS: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets. BioMed Central 2012-06-21 /pmc/articles/PMC3410781/ /pubmed/22720753 http://dx.doi.org/10.1186/1471-2105-13-141 Text en Copyright ©2012 Wilke et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Database Wilke, Andreas Harrison, Travis Wilkening, Jared Field, Dawn Glass, Elizabeth M Kyrpides, Nikos Mavrommatis, Konstantinos Meyer, Folker The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools |
title | The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools |
title_sort | m5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools |
topic | Database |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410781/ https://www.ncbi.nlm.nih.gov/pubmed/22720753 http://dx.doi.org/10.1186/1471-2105-13-141 |