Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification
Main authors: | Schwengers, Oliver; Jelonek, Lukas; Dieckmann, Marius Alfred; Beyvers, Sebastian; Blom, Jochen; Goesmann, Alexander |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Microbiology Society, 2021 |
Subjects: | Research Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8743544/ https://www.ncbi.nlm.nih.gov/pubmed/34739369 http://dx.doi.org/10.1099/mgen.0.000685 |
author | Schwengers, Oliver; Jelonek, Lukas; Dieckmann, Marius Alfred; Beyvers, Sebastian; Blom, Jochen; Goesmann, Alexander |
collection | PubMed |
description | Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenome-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on macOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio. |
format | Online Article Text |
id | pubmed-8743544 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Microbiology Society |
record_format | MEDLINE/PubMed |
journal | Microb Genom |
publication date | 2021-11-05 |
record date | 2022-01-10 |
rights | © 2021 The Authors. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/). |
title | Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8743544/ https://www.ncbi.nlm.nih.gov/pubmed/34739369 http://dx.doi.org/10.1099/mgen.0.000685 |