
SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data

Assessment of bioinformatics tools for metagenomics analysis of whole genome sequencing data requires realistic benchmark sets. We developed an effective and simple generator of artificial metagenomes from real sequencing experiments. The tool (SEQ2MGS) analyzes the input FASTQ files, prec...


Bibliographic Details
Main authors: Van Camp, Pieter-Jan, Porollo, Aleksey
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310082/
https://www.ncbi.nlm.nih.gov/pubmed/35899079
http://dx.doi.org/10.1093/nargab/lqac050
_version_ 1784753313959378944
author Van Camp, Pieter-Jan
Porollo, Aleksey
author_facet Van Camp, Pieter-Jan
Porollo, Aleksey
author_sort Van Camp, Pieter-Jan
collection PubMed
description Assessment of bioinformatics tools for metagenomics analysis of whole genome sequencing data requires realistic benchmark sets. We developed an effective and simple generator of artificial metagenomes from real sequencing experiments. The tool (SEQ2MGS) analyzes the input FASTQ files, precomputes genomic content, and blends shotgun reads from different sequenced isolates, or spikes isolate(s) into a real metagenome, in desired proportions. SEQ2MGS eliminates the need to simulate sequencing platform variations, read distributions, presence of plasmids, viruses, and contamination. The tool is especially useful for quick generation of multiple complex samples that include new or understudied organisms, even without assembled genomes. For illustration, we first demonstrated the ease of use of SEQ2MGS for the simulation of altered Schaedler flora (ASF) in comparison with the de novo metagenomics generators Grinder and CAMISIM. Next, we emulated the emergence of a pathogen in the human gut microbiome and observed that Kraken, Centrifuge, and MetaPhlAn, while correctly identifying Klebsiella pneumoniae, produced inconsistent results for the rest of the real metagenome. Finally, using the MG-RAST platform, we affirmed that SEQ2MGS properly transfers genomic information from an isolate into the simulated metagenome by correctly identifying the antimicrobial resistance genes anticipated to appear compared to the original metagenome.
format Online
Article
Text
id pubmed-9310082
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9310082 2022-07-26 SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data Van Camp, Pieter-Jan Porollo, Aleksey NAR Genom Bioinform Methods Article Assessment of bioinformatics tools for metagenomics analysis of whole genome sequencing data requires realistic benchmark sets. We developed an effective and simple generator of artificial metagenomes from real sequencing experiments. The tool (SEQ2MGS) analyzes the input FASTQ files, precomputes genomic content, and blends shotgun reads from different sequenced isolates, or spikes isolate(s) into a real metagenome, in desired proportions. SEQ2MGS eliminates the need to simulate sequencing platform variations, read distributions, presence of plasmids, viruses, and contamination. The tool is especially useful for quick generation of multiple complex samples that include new or understudied organisms, even without assembled genomes. For illustration, we first demonstrated the ease of use of SEQ2MGS for the simulation of altered Schaedler flora (ASF) in comparison with the de novo metagenomics generators Grinder and CAMISIM. Next, we emulated the emergence of a pathogen in the human gut microbiome and observed that Kraken, Centrifuge, and MetaPhlAn, while correctly identifying Klebsiella pneumoniae, produced inconsistent results for the rest of the real metagenome. Finally, using the MG-RAST platform, we affirmed that SEQ2MGS properly transfers genomic information from an isolate into the simulated metagenome by correctly identifying the antimicrobial resistance genes anticipated to appear compared to the original metagenome. Oxford University Press 2022-07-25 /pmc/articles/PMC9310082/ /pubmed/35899079 http://dx.doi.org/10.1093/nargab/lqac050 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Article
Van Camp, Pieter-Jan
Porollo, Aleksey
SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data
title SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data
title_full SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data
title_fullStr SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data
title_full_unstemmed SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data
title_short SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data
title_sort seq2mgs: an effective tool for generating realistic artificial metagenomes from the existing sequencing data
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310082/
https://www.ncbi.nlm.nih.gov/pubmed/35899079
http://dx.doi.org/10.1093/nargab/lqac050
work_keys_str_mv AT vancamppieterjan seq2mgsaneffectivetoolforgeneratingrealisticartificialmetagenomesfromtheexistingsequencingdata
AT porolloaleksey seq2mgsaneffectivetoolforgeneratingrealisticartificialmetagenomesfromtheexistingsequencingdata