SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes

Bibliographic Details
Main authors: Tadrent, Nachida; Dedeine, Franck; Hervé, Vincent
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9978240/
https://www.ncbi.nlm.nih.gov/pubmed/36875992
http://dx.doi.org/10.12688/f1000research.128091.2
_version_ 1784899476107821056
author Tadrent, Nachida
Dedeine, Franck
Hervé, Vincent
collection PubMed
description Background: Over the last decade, we have observed in microbial ecology a transition from gene-centric to genome-centric analyses. Indeed, the advent of metagenomics combined with binning methods, single-cell genome sequencing, and high-throughput cultivation methods have contributed to the continuing and exponential increase of available prokaryotic genomes, which in turn has favored the exploration of microbial metabolisms. In the case of metagenomics, data processing, from raw reads to genome reconstruction, involves various steps and software, which can represent a major technical obstacle.
Methods: To overcome this challenge, we developed SnakeMAGs, a simple workflow that can process Illumina data, from raw reads to the classification and relative abundance estimation of metagenome-assembled genomes (MAGs). It integrates state-of-the-art bioinformatic tools to sequentially perform: quality control of the reads (illumina-utils, Trimmomatic), host sequence removal (optional step, using Bowtie2), assembly (MEGAHIT), binning (MetaBAT2), quality filtering of the bins (CheckM, GUNC), classification of the MAGs (GTDB-Tk) and estimation of their relative abundance (CoverM). Developed with the popular Snakemake workflow management system, it can be deployed on various architectures, from single-core to multicore and from workstations to computer clusters and grids. It is also flexible, since users can easily change parameters and/or add new rules.
Results: Using termite gut metagenomic datasets, we showed that SnakeMAGs was slower than ATLAS, another similar workflow, but allowed the recovery of more MAGs encompassing more diverse phyla. Importantly, these additional MAGs showed no significant difference from the other ones in terms of completeness, contamination, genome size or relative abundance.
Conclusions: Overall, SnakeMAGs should make the reconstruction of MAGs more accessible to microbiologists. SnakeMAGs as well as test files and an extended tutorial are available at https://github.com/Nachida08/SnakeMAGs.
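The sequence of steps named in the description can be sketched as an ordered plan. This is only an illustration: the step and tool names come from the abstract, while the `Step` class and the `build_plan` function are hypothetical and are not part of SnakeMAGs or Snakemake. The sketch runs none of the tools; it only shows how the optional host-removal step could be switched on or off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One stage of the workflow, as named in the abstract (hypothetical type)."""
    name: str
    tools: tuple
    optional: bool = False


# Steps and tool names are taken from the abstract, in the order it lists them.
STEPS = (
    Step("quality control of the reads", ("illumina-utils", "Trimmomatic")),
    Step("host sequence removal", ("Bowtie2",), optional=True),
    Step("assembly", ("MEGAHIT",)),
    Step("binning", ("MetaBAT2",)),
    Step("quality filtering of the bins", ("CheckM", "GUNC")),
    Step("classification of the MAGs", ("GTDB-Tk",)),
    Step("relative abundance estimation", ("CoverM",)),
)


def build_plan(remove_host: bool) -> list:
    """Return the steps to run, skipping optional ones unless requested.

    Hypothetical helper: it mirrors the abstract's description only.
    """
    return [step for step in STEPS if remove_host or not step.optional]


if __name__ == "__main__":
    # Print the plan without the optional host-removal step.
    for number, step in enumerate(build_plan(remove_host=False), start=1):
        print(f"{number}. {step.name}: {', '.join(step.tools)}")
```

Calling `build_plan(remove_host=True)` keeps all seven steps, with Bowtie2 in second position; with `remove_host=False` the plan goes straight from read quality control to assembly.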
format Online
Article
Text
id pubmed-9978240
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-9978240 2023-03-03 SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes Tadrent, Nachida Dedeine, Franck Hervé, Vincent F1000Res Software Tool Article F1000 Research Limited 2023-02-27 /pmc/articles/PMC9978240/ /pubmed/36875992 http://dx.doi.org/10.12688/f1000research.128091.2 Text en Copyright: © 2023 Tadrent N et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9978240/
https://www.ncbi.nlm.nih.gov/pubmed/36875992
http://dx.doi.org/10.12688/f1000research.128091.2