Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads
BACKGROUND: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples...
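Main authors: | Welzel, Marius; Lange, Anja; Heider, Dominik; Schwarz, Michael; Freisleben, Bernd; Jensen, Manfred; Boenigk, Jens; Beisser, Daniela |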
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7667751/ https://www.ncbi.nlm.nih.gov/pubmed/33198651 http://dx.doi.org/10.1186/s12859-020-03852-4 |
_version_ | 1783610374831996928 |
---|---|
author | Welzel, Marius Lange, Anja Heider, Dominik Schwarz, Michael Freisleben, Bernd Jensen, Manfred Boenigk, Jens Beisser, Daniela |
author_facet | Welzel, Marius Lange, Anja Heider, Dominik Schwarz, Michael Freisleben, Bernd Jensen, Manfred Boenigk, Jens Beisser, Daniela |
author_sort | Welzel, Marius |
collection | PubMed |
description | BACKGROUND: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system. RESULTS: We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows. In addition, Conda is used for version control. Thus, Snakemake ensures reproducibility and Conda offers version control of the utilized programs. The encapsulation of rules and their dependencies support hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix) or as a Docker container on DockerHub (https://hub.docker.com/r/mw55/natrix). CONCLUSION: Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data. |
format | Online Article Text |
id | pubmed-7667751 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-76677512020-11-17 Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads Welzel, Marius Lange, Anja Heider, Dominik Schwarz, Michael Freisleben, Bernd Jensen, Manfred Boenigk, Jens Beisser, Daniela BMC Bioinformatics Software BACKGROUND: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system. RESULTS: We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows. In addition, Conda is used for version control. Thus, Snakemake ensures reproducibility and Conda offers version control of the utilized programs. The encapsulation of rules and their dependencies support hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix) or as a Docker container on DockerHub (https://hub.docker.com/r/mw55/natrix). CONCLUSION: Natrix is a user-friendly and highly extensible workflow for processing Illumina amplicon data. 
BioMed Central 2020-11-16 /pmc/articles/PMC7667751/ /pubmed/33198651 http://dx.doi.org/10.1186/s12859-020-03852-4 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Welzel, Marius Lange, Anja Heider, Dominik Schwarz, Michael Freisleben, Bernd Jensen, Manfred Boenigk, Jens Beisser, Daniela Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads |
title | Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads |
title_full | Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads |
title_fullStr | Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads |
title_full_unstemmed | Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads |
title_short | Natrix: a Snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads |
title_sort | natrix: a snakemake-based workflow for processing, clustering, and taxonomically assigning amplicon sequencing reads |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7667751/ https://www.ncbi.nlm.nih.gov/pubmed/33198651 http://dx.doi.org/10.1186/s12859-020-03852-4 |
work_keys_str_mv | AT welzelmarius natrixasnakemakebasedworkflowforprocessingclusteringandtaxonomicallyassigningampliconsequencingreads AT langeanja natrixasnakemakebasedworkflowforprocessingclusteringandtaxonomicallyassigningampliconsequencingreads AT heiderdominik natrixasnakemakebasedworkflowforprocessingclusteringandtaxonomicallyassigningampliconsequencingreads AT schwarzmichael natrixasnakemakebasedworkflowforprocessingclusteringandtaxonomicallyassigningampliconsequencingreads AT freislebenbernd natrixasnakemakebasedworkflowforprocessingclusteringandtaxonomicallyassigningampliconsequencingreads AT jensenmanfred natrixasnakemakebasedworkflowforprocessingclusteringandtaxonomicallyassigningampliconsequencingreads AT boenigkjens natrixasnakemakebasedworkflowforprocessingclusteringandtaxonomicallyassigningampliconsequencingreads AT beisserdaniela natrixasnakemakebasedworkflowforprocessingclusteringandtaxonomicallyassigningampliconsequencingreads |