Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification

Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently wel...

Bibliographic Details
Main authors: Schwengers, Oliver; Jelonek, Lukas; Dieckmann, Marius Alfred; Beyvers, Sebastian; Blom, Jochen; Goesmann, Alexander
Format: Online Article Text
Language: English
Published: Microbiology Society 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8743544/
https://www.ncbi.nlm.nih.gov/pubmed/34739369
http://dx.doi.org/10.1099/mgen.0.000685
_version_ 1784629925752340480
author Schwengers, Oliver
Jelonek, Lukas
Dieckmann, Marius Alfred
Beyvers, Sebastian
Blom, Jochen
Goesmann, Alexander
collection PubMed
description Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
format Online
Article
Text
id pubmed-8743544
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-8743544 2022-01-10 Microb Genom Research Articles
Microbiology Society 2021-11-05 /pmc/articles/PMC8743544/ /pubmed/34739369 http://dx.doi.org/10.1099/mgen.0.000685 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License.
title Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification
topic Research Articles