Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes

Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomi...

Bibliographic Details
Main authors: Petit, Robert A., Read, Timothy D.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7406220/
https://www.ncbi.nlm.nih.gov/pubmed/32753501
http://dx.doi.org/10.1128/mSystems.00190-20
_version_ 1783567390935613440
author Petit, Robert A.
Read, Timothy D.
author_facet Petit, Robert A.
Read, Timothy D.
author_sort Petit, Robert A.
collection PubMed
description Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. 
Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.
format Online
Article
Text
id pubmed-7406220
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-74062202020-08-11 Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes Petit, Robert A. Read, Timothy D. mSystems Research Article Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. 
We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species. American Society for Microbiology 2020-08-04 /pmc/articles/PMC7406220/ /pubmed/32753501 http://dx.doi.org/10.1128/mSystems.00190-20 Text en Copyright © 2020 Petit and Read. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/) .
spellingShingle Research Article
Petit, Robert A.
Read, Timothy D.
Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_full Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_fullStr Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_full_unstemmed Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_short Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_sort bactopia: a flexible pipeline for complete analysis of bacterial genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7406220/
https://www.ncbi.nlm.nih.gov/pubmed/32753501
http://dx.doi.org/10.1128/mSystems.00190-20
work_keys_str_mv AT petitroberta bactopiaaflexiblepipelineforcompleteanalysisofbacterialgenomes
AT readtimothyd bactopiaaflexiblepipelineforcompleteanalysisofbacterialgenomes