Tourmaline: A containerized workflow for rapid and iterable amplicon sequence analysis using QIIME 2 and Snakemake

Bibliographic Details
Main authors: Thompson, Luke R, Anderson, Sean R, Den Uyl, Paul A, Patin, Nastassia V, Lim, Shen Jean, Sanderson, Grant, Goodwin, Kelly D
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Technical Note
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9334028/
https://www.ncbi.nlm.nih.gov/pubmed/35902092
http://dx.doi.org/10.1093/gigascience/giac066
collection PubMed
description BACKGROUND: Amplicon sequencing (metabarcoding) is a common method to survey diversity of environmental communities whereby a single genetic locus is amplified and sequenced from the DNA of whole or partial organisms, organismal traces (e.g., skin, mucus, feces), or microbes in an environmental sample. Several software packages exist for analyzing amplicon data, among which QIIME 2 has emerged as a popular option because of its broad functionality, plugin architecture, provenance tracking, and interactive visualizations. However, each new analysis requires the user to keep track of input and output file names, parameters, and commands; this lack of automation and standardization is inefficient and creates barriers to meta-analysis and sharing of results.
FINDINGS: We developed Tourmaline, a Python-based workflow that implements QIIME 2 and is built using the Snakemake workflow management system. Starting from a configuration file that defines parameters and input files (a reference database, a sample metadata file, and a manifest or archive of FASTQ sequences), it uses QIIME 2 to run either the DADA2 or Deblur denoising algorithm; assigns taxonomy to the resulting representative sequences; performs analyses of taxonomic, alpha, and beta diversity; and generates an HTML report summarizing and linking to the output files. Features include support for multiple cores, automatic determination of trimming parameters using quality scores, representative sequence filtering (taxonomy, length, abundance, prevalence, or ID), support for multiple taxonomic classification and sequence alignment methods, outlier detection, and automated initialization of a new analysis using previous settings. The workflow runs natively on Linux and macOS or via a Docker container. We ran Tourmaline on a 16S ribosomal RNA amplicon data set from Lake Erie surface water, showing its utility for parameter optimization and the ability to easily view interactive visualizations through the HTML report, QIIME 2 viewer, and R- and Python-based Jupyter notebooks.
CONCLUSION: Automated workflows like Tourmaline enable rapid analysis of environmental amplicon data, decreasing the time from data generation to actionable results. Tourmaline is available for download at github.com/aomlomics/tourmaline.
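The abstract lists two processing steps that are easy to picture in code: automatic determination of trimming parameters from quality scores, and filtering representative sequences by taxonomy, length, abundance, prevalence, or ID. The Python sketch below illustrates both ideas on plain dictionaries. It is not Tourmaline's implementation or configuration format: the function names `choose_trunc_len` and `filter_features`, the median-quality threshold of 30, the definition of prevalence as a count of samples, and the data layout are all hypothetical assumptions made for illustration. In the actual workflow, parameters of this kind are set in the configuration file and the analyses run through QIIME 2.

```python
from statistics import median

# Hypothetical illustration only; not Tourmaline's code or parameter names.


def choose_trunc_len(quality_by_position, min_median_q=30):
    """Assumed heuristic: keep read positions up to (not including) the
    first position whose median Phred quality falls below min_median_q."""
    for pos, scores in enumerate(quality_by_position):
        if median(scores) < min_median_q:
            return pos  # truncate here; positions 0..pos-1 are kept
    return len(quality_by_position)  # quality never dropped: keep full length


def filter_features(table, seqs, taxonomy=None, min_len=0, max_len=None,
                    min_abundance=0, min_prevalence=0,
                    exclude_terms=(), exclude_ids=()):
    """Return the set of feature IDs that pass every filter.

    table:    {feature_id: {sample_id: count}}
    seqs:     {feature_id: representative sequence}
    taxonomy: optional {feature_id: taxonomy string}
    min_prevalence counts samples with a nonzero count (an assumption).
    """
    taxonomy = taxonomy or {}
    kept = set()
    for fid, counts in table.items():
        if fid in exclude_ids:
            continue  # ID filter
        tax = taxonomy.get(fid, "").lower()
        if any(term.lower() in tax for term in exclude_terms):
            continue  # taxonomy filter, e.g. chloroplast or mitochondria
        length = len(seqs[fid])
        if length < min_len or (max_len is not None and length > max_len):
            continue  # length filter
        if sum(counts.values()) < min_abundance:
            continue  # total-abundance filter
        if sum(1 for c in counts.values() if c > 0) < min_prevalence:
            continue  # prevalence filter
        kept.add(fid)
    return kept


if __name__ == "__main__":
    # Per-position quality scores for a toy set of three reads.
    per_position_q = [[38, 37, 39], [36, 35, 37], [31, 29, 30], [25, 28, 22]]
    print(choose_trunc_len(per_position_q))  # → 3
```

Filtering on total abundance and on prevalence separately lets a rare-but-widespread sequence survive while a sequence that is abundant in only one sample can still be removed; which combination is appropriate depends on the study design.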
id pubmed-9334028
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal GigaScience
article_type Technical Note
published Oxford University Press 2022-07-28
rights © The Author(s) 2022. Published by Oxford University Press GigaScience. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Technical Note