Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads

Bibliographic Details
Main authors: Welzel, Marius; Lange, Anja; Heider, Dominik; Schwarz, Michael; Freisleben, Bernd; Jensen, Manfred; Boenigk, Jens; Beisser, Daniela
Format: Online article (text)
Language: English
Published: BioMed Central, 2020
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7667751/
https://www.ncbi.nlm.nih.gov/pubmed/33198651
http://dx.doi.org/10.1186/s12859-020-03852-4

Description
BACKGROUND: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high-throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system.

RESULTS: We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, and sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows, and uses Conda for version control: Snakemake ensures reproducibility, and Conda provides version control of the programs used. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix) or as a Docker container on DockerHub (https://hub.docker.com/r/mw55/natrix).

CONCLUSION: Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data.
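
The dereplication and split-sample merging steps named above can be illustrated with a short Python sketch. This is not Natrix's code: the function names `dereplicate` and `split_sample_filter` are hypothetical, and the split-sample rule used here (keep a sequence only if it occurs in both technical replicates, summing its abundance across them) is an assumption made for illustration.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, abundance) pairs.

    Sequences are upper-cased so 'acgt' and 'ACGT' count as the same read.
    Output is sorted by decreasing abundance, ties broken alphabetically.
    """
    counts = Counter(read.upper() for read in reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_sample_filter(replicate_a, replicate_b):
    """Hypothetical split-sample merge of two technical replicates.

    Assumption: a sequence found in only one replicate is treated as a
    likely artifact and dropped; shared sequences keep their summed abundance.
    """
    counts_a = dict(dereplicate(replicate_a))
    counts_b = dict(dereplicate(replicate_b))
    shared = counts_a.keys() & counts_b.keys()
    merged = {seq: counts_a[seq] + counts_b[seq] for seq in shared}
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    rep_a = ["ACGT", "ACGT", "acgt", "TTGA", "GGCC"]
    rep_b = ["ACGT", "TTGA", "TTGA", "CCAA"]
    print(dereplicate(rep_a))                # → [('ACGT', 3), ('GGCC', 1), ('TTGA', 1)]
    print(split_sample_filter(rep_a, rep_b)) # → [('ACGT', 4), ('TTGA', 3)]
```

In a real workflow these steps would operate on quality-filtered, assembled reads rather than toy strings, and the replicate-filtering rule would follow the workflow's own configuration.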

Journal: BMC Bioinformatics (Software article)
Publication date: 2020-11-16
Rights: © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.