ReprDB and panDB: minimalist databases with maximal microbial representation

BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible with various metagenomic read mapping tool...

Bibliographic Details
Main authors: Zhou, Wei; Gay, Nicole; Oh, Julia
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774170/
https://www.ncbi.nlm.nih.gov/pubmed/29347966
http://dx.doi.org/10.1186/s40168-018-0399-2
_version_ 1783293717927428096
author Zhou, Wei
Gay, Nicole
Oh, Julia
author_facet Zhou, Wei
Gay, Nicole
Oh, Julia
author_sort Zhou, Wei
collection PubMed
description BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible with various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. RESULTS: We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. CONCLUSIONS: reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0399-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5774170
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-57741702018-01-26 ReprDB and panDB: minimalist databases with maximal microbial representation Zhou, Wei Gay, Nicole Oh, Julia Microbiome Methodology BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. RESULTS: We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. CONCLUSIONS: reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0399-2) contains supplementary material, which is available to authorized users. BioMed Central 2018-01-18 /pmc/articles/PMC5774170/ /pubmed/29347966 http://dx.doi.org/10.1186/s40168-018-0399-2 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Zhou, Wei
Gay, Nicole
Oh, Julia
ReprDB and panDB: minimalist databases with maximal microbial representation
title ReprDB and panDB: minimalist databases with maximal microbial representation
title_full ReprDB and panDB: minimalist databases with maximal microbial representation
title_fullStr ReprDB and panDB: minimalist databases with maximal microbial representation
title_full_unstemmed ReprDB and panDB: minimalist databases with maximal microbial representation
title_short ReprDB and panDB: minimalist databases with maximal microbial representation
title_sort reprdb and pandb: minimalist databases with maximal microbial representation
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774170/
https://www.ncbi.nlm.nih.gov/pubmed/29347966
http://dx.doi.org/10.1186/s40168-018-0399-2
work_keys_str_mv AT zhouwei reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation
AT gaynicole reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation
AT ohjulia reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation