Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since t...

Bibliographic Details
Main authors: Abdala Asbun, Alejandro; Besseling, Marc A.; Balzano, Sergio; van Bleijswijk, Judith D. L.; Witte, Harry J.; Villanueva, Laura; Engelmann, Julia C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7718033/
https://www.ncbi.nlm.nih.gov/pubmed/33329686
http://dx.doi.org/10.3389/fgene.2020.489357
author Abdala Asbun, Alejandro
Besseling, Marc A.
Balzano, Sergio
van Bleijswijk, Judith D. L.
Witte, Harry J.
Villanueva, Laura
Engelmann, Julia C.
collection PubMed
description Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene, or even only parts of a single gene rather than the entire genome, the number of reads needed per sample to assess the microbial community structure is lower than that required for metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or Amplicon Sequence Variants (ASVs) in BIOM and text format and representative sequences. Cascabel is a highly versatile software that allows users to customize several steps of the pipeline, such as selecting from a set of OTU clustering methods or performing ASV analysis. In addition, we designed Cascabel to run in any Linux/Unix computing environment from desktop computers to computing servers making use of parallel processing if possible. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available at GitHub: https://github.com/AlejandroAb/CASCABEL.
format Online
Article
Text
id pubmed-7718033
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-77180332020-12-15 Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results Abdala Asbun, Alejandro Besseling, Marc A. Balzano, Sergio van Bleijswijk, Judith D. L. Witte, Harry J. Villanueva, Laura Engelmann, Julia C. Front Genet Genetics Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene, or even only parts of a single gene rather than the entire genome, the number of reads needed per sample to assess the microbial community structure is lower than that required for metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or Amplicon Sequence Variants (ASVs) in BIOM and text format and representative sequences. Cascabel is a highly versatile software that allows users to customize several steps of the pipeline, such as selecting from a set of OTU clustering methods or performing ASV analysis. In addition, we designed Cascabel to run in any linux/unix computing environment from desktop computers to computing servers making use of parallel processing if possible. 
The analyses and results are fully reproducible and documented in an HTML and optional pdf report. Cascabel is freely available at Github: https://github.com/AlejandroAb/CASCABEL. Frontiers Media S.A. 2020-11-20 /pmc/articles/PMC7718033/ /pubmed/33329686 http://dx.doi.org/10.3389/fgene.2020.489357 Text en Copyright © 2020 Abdala Asbun, Besseling, Balzano, van Bleijswijk, Witte, Villanueva and Engelmann. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7718033/
https://www.ncbi.nlm.nih.gov/pubmed/33329686
http://dx.doi.org/10.3389/fgene.2020.489357