CAMISIM: simulating metagenomes and microbial communities
BACKGROUND: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess th...
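Main authors: | Fritz, Adrian; Hofmann, Peter; Majda, Stephan; Dahms, Eik; Dröge, Johannes; Fiedler, Jessika; Lesker, Till R.; Belmann, Peter; DeMaere, Matthew Z.; Darling, Aaron E.; Sczyrba, Alexander; Bremges, Andreas; McHardy, Alice C. |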
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6368784/ https://www.ncbi.nlm.nih.gov/pubmed/30736849 http://dx.doi.org/10.1186/s40168-019-0633-6 |
_version_ | 1783394061664649216 |
---|---|
author | Fritz, Adrian Hofmann, Peter Majda, Stephan Dahms, Eik Dröge, Johannes Fiedler, Jessika Lesker, Till R. Belmann, Peter DeMaere, Matthew Z. Darling, Aaron E. Sczyrba, Alexander Bremges, Andreas McHardy, Alice C. |
author_facet | Fritz, Adrian Hofmann, Peter Majda, Stephan Dahms, Eik Dröge, Johannes Fiedler, Jessika Lesker, Till R. Belmann, Peter DeMaere, Matthew Z. Darling, Aaron E. Sczyrba, Alexander Bremges, Andreas McHardy, Alice C. |
author_sort | Fritz, Adrian |
collection | PubMed |
description | BACKGROUND: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. RESULTS: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CONCLUSIONS: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-019-0633-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6368784 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6368784 2019-02-15 CAMISIM: simulating metagenomes and microbial communities Fritz, Adrian Hofmann, Peter Majda, Stephan Dahms, Eik Dröge, Johannes Fiedler, Jessika Lesker, Till R. Belmann, Peter DeMaere, Matthew Z. Darling, Aaron E. Sczyrba, Alexander Bremges, Andreas McHardy, Alice C. Microbiome Software BACKGROUND: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. RESULTS: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CONCLUSIONS: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-019-0633-6) contains supplementary material, which is available to authorized users. BioMed Central 2019-02-08 /pmc/articles/PMC6368784/ /pubmed/30736849 http://dx.doi.org/10.1186/s40168-019-0633-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Fritz, Adrian Hofmann, Peter Majda, Stephan Dahms, Eik Dröge, Johannes Fiedler, Jessika Lesker, Till R. Belmann, Peter DeMaere, Matthew Z. Darling, Aaron E. Sczyrba, Alexander Bremges, Andreas McHardy, Alice C. CAMISIM: simulating metagenomes and microbial communities |
title | CAMISIM: simulating metagenomes and microbial communities |
title_full | CAMISIM: simulating metagenomes and microbial communities |
title_fullStr | CAMISIM: simulating metagenomes and microbial communities |
title_full_unstemmed | CAMISIM: simulating metagenomes and microbial communities |
title_short | CAMISIM: simulating metagenomes and microbial communities |
title_sort | camisim: simulating metagenomes and microbial communities |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6368784/ https://www.ncbi.nlm.nih.gov/pubmed/30736849 http://dx.doi.org/10.1186/s40168-019-0633-6 |
work_keys_str_mv | AT fritzadrian camisimsimulatingmetagenomesandmicrobialcommunities AT hofmannpeter camisimsimulatingmetagenomesandmicrobialcommunities AT majdastephan camisimsimulatingmetagenomesandmicrobialcommunities AT dahmseik camisimsimulatingmetagenomesandmicrobialcommunities AT drogejohannes camisimsimulatingmetagenomesandmicrobialcommunities AT fiedlerjessika camisimsimulatingmetagenomesandmicrobialcommunities AT leskertillr camisimsimulatingmetagenomesandmicrobialcommunities AT belmannpeter camisimsimulatingmetagenomesandmicrobialcommunities AT demaerematthewz camisimsimulatingmetagenomesandmicrobialcommunities AT darlingaarone camisimsimulatingmetagenomesandmicrobialcommunities AT sczyrbaalexander camisimsimulatingmetagenomesandmicrobialcommunities AT bremgesandreas camisimsimulatingmetagenomesandmicrobialcommunities AT mchardyalicec camisimsimulatingmetagenomesandmicrobialcommunities |