
ReprDB and panDB: minimalist databases with maximal microbial representation

BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tool...

Bibliographic Details
Main Authors:	Zhou, Wei, Gay, Nicole, Oh, Julia
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774170/ https://www.ncbi.nlm.nih.gov/pubmed/29347966 http://dx.doi.org/10.1186/s40168-018-0399-2

_version_	1783293717927428096
author	Zhou, Wei Gay, Nicole Oh, Julia
author_facet	Zhou, Wei Gay, Nicole Oh, Julia
author_sort	Zhou, Wei
collection	PubMed
description	BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. RESULTS: We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. CONCLUSIONS: reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0399-2) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5774170
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-57741702018-01-26 ReprDB and panDB: minimalist databases with maximal microbial representation Zhou, Wei Gay, Nicole Oh, Julia Microbiome Methodology BACKGROUND: Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. RESULTS: We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. CONCLUSIONS: reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0399-2) contains supplementary material, which is available to authorized users. BioMed Central 2018-01-18 /pmc/articles/PMC5774170/ /pubmed/29347966 http://dx.doi.org/10.1186/s40168-018-0399-2 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Zhou, Wei Gay, Nicole Oh, Julia ReprDB and panDB: minimalist databases with maximal microbial representation
title	ReprDB and panDB: minimalist databases with maximal microbial representation
title_full	ReprDB and panDB: minimalist databases with maximal microbial representation
title_fullStr	ReprDB and panDB: minimalist databases with maximal microbial representation
title_full_unstemmed	ReprDB and panDB: minimalist databases with maximal microbial representation
title_short	ReprDB and panDB: minimalist databases with maximal microbial representation
title_sort	reprdb and pandb: minimalist databases with maximal microbial representation
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5774170/ https://www.ncbi.nlm.nih.gov/pubmed/29347966 http://dx.doi.org/10.1186/s40168-018-0399-2
work_keys_str_mv	AT zhouwei reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation AT gaynicole reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation AT ohjulia reprdbandpandbminimalistdatabaseswithmaximalmicrobialrepresentation
