GALBA: Genome Annotation with Miniprot and AUGUSTUS

Bibliographic Details
Main authors: Brůna, Tomáš; Li, Heng; Guhlin, Joseph; Honsel, Daniel; Herbold, Steffen; Stanke, Mario; Nenasheva, Natalia; Ebel, Matthis; Gabriel, Lars; Hoff, Katharina J.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10120627/
https://www.ncbi.nlm.nih.gov/pubmed/37090650
http://dx.doi.org/10.1101/2023.04.10.536199
author Brůna, Tomáš
Li, Heng
Guhlin, Joseph
Honsel, Daniel
Herbold, Steffen
Stanke, Mario
Nenasheva, Natalia
Ebel, Matthis
Gabriel, Lars
Hoff, Katharina J.
collection PubMed
description The Earth Biogenome Project has rapidly increased the number of available eukaryotic genomes, but most released genomes continue to lack annotation of protein-coding genes. In addition, no transcriptome data is available for some genomes. Various gene annotation tools have been developed but each has its limitations. Here, we introduce GALBA, a fully automated pipeline that utilizes miniprot, a rapid protein-to-genome aligner, in combination with AUGUSTUS to predict genes with high accuracy. Accuracy results indicate that GALBA is particularly strong in the annotation of large vertebrate genomes. We also present use cases in insects, vertebrates, and a previously unannotated land plant. GALBA is fully open source and available as a docker image for easy execution with Singularity in high-performance computing environments. Our pipeline addresses the critical need for accurate gene annotation in newly sequenced genomes, and we believe that GALBA will greatly facilitate genome annotation for diverse organisms.
format Online
Article
Text
id pubmed-10120627
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10120627 2023-04-22 GALBA: Genome Annotation with Miniprot and AUGUSTUS bioRxiv Article Cold Spring Harbor Laboratory 2023-04-10 /pmc/articles/PMC10120627/ /pubmed/37090650 http://dx.doi.org/10.1101/2023.04.10.536199 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title GALBA: Genome Annotation with Miniprot and AUGUSTUS
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10120627/
https://www.ncbi.nlm.nih.gov/pubmed/37090650
http://dx.doi.org/10.1101/2023.04.10.536199