BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database

The task of eukaryotic genome annotation remains challenging. Only a few genomes could serve as standards of annotation achieved through a tremendous investment of human curation efforts. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject...

Bibliographic Details
Main Authors: Brůna, Tomáš; Hoff, Katharina J; Lomsadze, Alexandre; Stanke, Mario; Borodovsky, Mark
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7787252/
https://www.ncbi.nlm.nih.gov/pubmed/33575650
http://dx.doi.org/10.1093/nargab/lqaa108
author Brůna, Tomáš
Hoff, Katharina J
Lomsadze, Alexandre
Stanke, Mario
Borodovsky, Mark
collection PubMed
description The task of eukaryotic genome annotation remains challenging. Only a few genomes could serve as standards of annotation achieved through a tremendous investment of human curation efforts. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject for further investigation. The new BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1 where self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was a generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. It is favorably compared under equal conditions with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should facilitate solving the task of harmonization of annotation of protein-coding genes in genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies as well as in algorithmic development to reach the goal of highly accurate annotation of eukaryotic genomes.
format Online
Article
Text
id pubmed-7787252
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7787252 2021-02-10 NAR Genom Bioinform Standard Article Oxford University Press 2021-01-06 /pmc/articles/PMC7787252/ /pubmed/33575650 http://dx.doi.org/10.1093/nargab/lqaa108 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database
topic Standard Article