
Systematic processing of ribosomal RNA gene amplicon sequencing data

BACKGROUND: With the advent of high-throughput sequencing, microbiology is becoming increasingly data-intensive. Because of its low cost, robust databases, and established bioinformatic workflows, sequencing of 16S/18S/ITS ribosomal RNA (rRNA) gene amplicons, which provides a marker of choice for ph...


Bibliographic Details
Main Authors: Tremblay, Julien; Yergeau, Etienne
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6901069/
https://www.ncbi.nlm.nih.gov/pubmed/31816087
http://dx.doi.org/10.1093/gigascience/giz146
author Tremblay, Julien
Yergeau, Etienne
collection PubMed
description BACKGROUND: With the advent of high-throughput sequencing, microbiology is becoming increasingly data-intensive. Because of its low cost, robust databases, and established bioinformatic workflows, sequencing of 16S/18S/ITS ribosomal RNA (rRNA) gene amplicons, which provides a marker of choice for phylogenetic studies, has become ubiquitous. Many established end-to-end bioinformatic pipelines are available to perform short amplicon sequence data analysis. These pipelines suit a general audience, but few options exist for more specialized users who are experienced in code scripting, Linux-based systems, and high-performance computing (HPC) environments. For such an audience, existing pipelines can be limiting to fully leverage modern HPC capabilities and perform tweaking and optimization operations. Moreover, a wealth of stand-alone software packages that perform specific targeted bioinformatic tasks are increasingly accessible, and finding a way to easily integrate these applications in a pipeline is critical to the evolution of bioinformatic methodologies. RESULTS: Here we describe AmpliconTagger, a short rRNA marker gene amplicon pipeline coded in a Python framework that enables fine tuning and integration of virtually any potential rRNA gene amplicon bioinformatic procedure. It is designed to work within an HPC environment, supporting a complex network of job dependencies with a smart-restart mechanism in case of job failure or parameter modifications. As proof of concept, we present end results obtained with AmpliconTagger using 16S, 18S, ITS rRNA short gene amplicons and Pacific Biosciences long-read amplicon data types as input. CONCLUSIONS: Using a selection of published algorithms for generating operational taxonomic units and amplicon sequence variants and for computing downstream taxonomic summaries and diversity metrics, we demonstrate the performance and versatility of our pipeline for systematic analyses of amplicon sequence data.
format Online
Article
Text
id pubmed-6901069
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6901069 2019-12-16 Systematic processing of ribosomal RNA gene amplicon sequencing data Tremblay, Julien Yergeau, Etienne Gigascience Research Oxford University Press 2019-12-09 /pmc/articles/PMC6901069/ /pubmed/31816087 http://dx.doi.org/10.1093/gigascience/giz146 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Systematic processing of ribosomal RNA gene amplicon sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6901069/
https://www.ncbi.nlm.nih.gov/pubmed/31816087
http://dx.doi.org/10.1093/gigascience/giz146