
CAMISIM: simulating metagenomes and microbial communities

BACKGROUND: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess th...


Bibliographic Details
Main Authors: Fritz, Adrian; Hofmann, Peter; Majda, Stephan; Dahms, Eik; Dröge, Johannes; Fiedler, Jessika; Lesker, Till R.; Belmann, Peter; DeMaere, Matthew Z.; Darling, Aaron E.; Sczyrba, Alexander; Bremges, Andreas; McHardy, Alice C.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6368784/
https://www.ncbi.nlm.nih.gov/pubmed/30736849
http://dx.doi.org/10.1186/s40168-019-0633-6
_version_ 1783394061664649216
author Fritz, Adrian
Hofmann, Peter
Majda, Stephan
Dahms, Eik
Dröge, Johannes
Fiedler, Jessika
Lesker, Till R.
Belmann, Peter
DeMaere, Matthew Z.
Darling, Aaron E.
Sczyrba, Alexander
Bremges, Andreas
McHardy, Alice C.
author_sort Fritz, Adrian
collection PubMed
description BACKGROUND: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. RESULTS: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CONCLUSIONS: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-019-0633-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6368784
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6368784 2019-02-15 CAMISIM: simulating metagenomes and microbial communities Fritz, Adrian Hofmann, Peter Majda, Stephan Dahms, Eik Dröge, Johannes Fiedler, Jessika Lesker, Till R. Belmann, Peter DeMaere, Matthew Z. Darling, Aaron E. Sczyrba, Alexander Bremges, Andreas McHardy, Alice C. Microbiome Software BACKGROUND: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. RESULTS: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CONCLUSIONS: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation.
All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-019-0633-6) contains supplementary material, which is available to authorized users. BioMed Central 2019-02-08 /pmc/articles/PMC6368784/ /pubmed/30736849 http://dx.doi.org/10.1186/s40168-019-0633-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title CAMISIM: simulating metagenomes and microbial communities
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6368784/
https://www.ncbi.nlm.nih.gov/pubmed/30736849
http://dx.doi.org/10.1186/s40168-019-0633-6