ReprDB and panDB: minimalist databases with maximal microbial representation
BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tool...
Main authors: | Zhou, Wei; Gay, Nicole; Oh, Julia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2018 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774170/ https://www.ncbi.nlm.nih.gov/pubmed/29347966 http://dx.doi.org/10.1186/s40168-018-0399-2 |
_version_ | 1783293717927428096 |
---|---|
author | Zhou, Wei Gay, Nicole Oh, Julia |
author_facet | Zhou, Wei Gay, Nicole Oh, Julia |
author_sort | Zhou, Wei |
collection | PubMed |
description | BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. RESULTS: We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. CONCLUSIONS: reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0399-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5774170 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-57741702018-01-26 ReprDB and panDB: minimalist databases with maximal microbial representation Zhou, Wei Gay, Nicole Oh, Julia Microbiome Methodology BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. RESULTS: We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. CONCLUSIONS: reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0399-2) contains supplementary material, which is available to authorized users. BioMed Central 2018-01-18 /pmc/articles/PMC5774170/ /pubmed/29347966 http://dx.doi.org/10.1186/s40168-018-0399-2 Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Zhou, Wei Gay, Nicole Oh, Julia ReprDB and panDB: minimalist databases with maximal microbial representation |
title | ReprDB and panDB: minimalist databases with maximal microbial representation |
title_full | ReprDB and panDB: minimalist databases with maximal microbial representation |
title_fullStr | ReprDB and panDB: minimalist databases with maximal microbial representation |
title_full_unstemmed | ReprDB and panDB: minimalist databases with maximal microbial representation |
title_short | ReprDB and panDB: minimalist databases with maximal microbial representation |
title_sort | reprdb and pandb: minimalist databases with maximal microbial representation |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774170/ https://www.ncbi.nlm.nih.gov/pubmed/29347966 http://dx.doi.org/10.1186/s40168-018-0399-2 |
work_keys_str_mv | AT zhouwei reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation AT gaynicole reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation AT ohjulia reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation |