Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomi...
Main Authors: | Petit, Robert A., Read, Timothy D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society for Microbiology 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7406220/ https://www.ncbi.nlm.nih.gov/pubmed/32753501 http://dx.doi.org/10.1128/mSystems.00190-20 |
_version_ | 1783567390935613440 |
---|---|
author | Petit, Robert A.; Read, Timothy D. |
author_facet | Petit, Robert A.; Read, Timothy D. |
author_sort | Petit, Robert A. |
collection | PubMed |
description | Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species. |
format | Online Article Text |
id | pubmed-7406220 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-7406220 2020-08-11 Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes Petit, Robert A. Read, Timothy D. mSystems Research Article Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia. IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species. American Society for Microbiology 2020-08-04 /pmc/articles/PMC7406220/ /pubmed/32753501 http://dx.doi.org/10.1128/mSystems.00190-20 Text en Copyright © 2020 Petit and Read. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Petit, Robert A. Read, Timothy D. Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes |
title | Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes |
title_full | Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes |
title_fullStr | Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes |
title_full_unstemmed | Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes |
title_short | Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes |
title_sort | bactopia: a flexible pipeline for complete analysis of bacterial genomes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7406220/ https://www.ncbi.nlm.nih.gov/pubmed/32753501 http://dx.doi.org/10.1128/mSystems.00190-20 |
work_keys_str_mv | AT petitroberta bactopiaaflexiblepipelineforcompleteanalysisofbacterialgenomes AT readtimothyd bactopiaaflexiblepipelineforcompleteanalysisofbacterialgenomes |