BRAKER3: Fully Automated Genome Annotation Using RNA-Seq and Protein Evidence with GeneMark-ETP, AUGUSTUS and TSEBRA

Gene prediction remains an active area of bioinformatics research. Challenges are presented by large eukaryotic genomes and heterogeneous data situations. To meet the challenges, several streams of evidence must be integrated, from protein homology and transcriptome data, as well as information deri...

Bibliographic Details
Main authors:	Gabriel, Lars; Brůna, Tomáš; Hoff, Katharina J.; Ebel, Matthis; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312602/ https://www.ncbi.nlm.nih.gov/pubmed/37398387 http://dx.doi.org/10.1101/2023.06.10.544449

author	Gabriel, Lars Brůna, Tomáš Hoff, Katharina J. Ebel, Matthis Lomsadze, Alexandre Borodovsky, Mark Stanke, Mario
collection	PubMed
description	Gene prediction remains an active area of bioinformatics research. Challenges are presented by large eukaryotic genomes and heterogeneous data situations. To meet the challenges, several streams of evidence must be integrated, from protein homology and transcriptome data, as well as information derived from the genome itself. The amount and significance of the available evidence from transcriptomes and proteomes vary from genome to genome, between genes and even along a single gene. User-friendly and accurate annotation pipelines that can cope with such data heterogeneity are needed. The previously developed annotation pipelines BRAKER1 and BRAKER2 use RNA-Seq or protein data, respectively, but not both. The recently released GeneMark-ETP integrates all three types of data and achieves much higher levels of accuracy. We here present the BRAKER3 pipeline that builds on GeneMark-ETP and AUGUSTUS and further improves accuracy using the TSEBRA combiner. BRAKER3 annotates protein-coding genes in eukaryotic genomes using both short-read RNA-Seq and a large protein database along with statistical models learned iteratively and specifically for the target genome. We benchmarked the new pipeline on 11 species under controlled conditions on the assumed relatedness of the target species to available proteomes. BRAKER3 outperformed BRAKER1 and BRAKER2, increasing the average transcript-level F1-score by ~20 percentage points, most pronounced for species with large and complex genomes. BRAKER3 also outperforms MAKER2 and Funannotate. For the first time, we provide a Singularity container for the BRAKER software to minimize installation obstacles. Overall, BRAKER3 is an accurate, easy-to-use tool for the annotation of eukaryotic genomes.
format	Online Article Text
id	pubmed-10312602
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cold Spring Harbor Laboratory
record_format	MEDLINE/PubMed
spelling	pubmed-10312602; 2023-07-01; bioRxiv; Article; Cold Spring Harbor Laboratory; 2023-09-02; /pmc/articles/PMC10312602/; /pubmed/37398387; http://dx.doi.org/10.1101/2023.06.10.544449; Text; en; This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title	BRAKER3: Fully Automated Genome Annotation Using RNA-Seq and Protein Evidence with GeneMark-ETP, AUGUSTUS and TSEBRA
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10312602/ https://www.ncbi.nlm.nih.gov/pubmed/37398387 http://dx.doi.org/10.1101/2023.06.10.544449

Similar items