TSEBRA: transcript selector for BRAKER

Bibliographic Details
Main authors:	Gabriel, Lars; Hoff, Katharina J.; Brůna, Tomáš; Borodovsky, Mark; Stanke, Mario
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8620231/ https://www.ncbi.nlm.nih.gov/pubmed/34823473 http://dx.doi.org/10.1186/s12859-021-04482-0

Description:
BACKGROUND: BRAKER is a suite of automatic pipelines, BRAKER1 and BRAKER2, for the accurate annotation of protein-coding genes in eukaryotic genomes. Each pipeline trains statistical models of protein-coding genes on the provided evidence and then predicts protein-coding genes in genomic sequences using both the extrinsic evidence and the statistical models. For training and prediction, BRAKER1 and BRAKER2 incorporate complementary extrinsic evidence: BRAKER1 uses only RNA-seq data, while BRAKER2 uses only a database of cross-species proteins. So far, the BRAKER suite has not been able to reliably exceed the accuracy of BRAKER1 and BRAKER2 when incorporating both types of evidence simultaneously. Currently, for a novel genome project where both RNA-seq and protein data are available, the best option is to run both pipelines independently and pick the output that is likely better. As a result, one of the two types of extrinsic evidence remains unexploited.

RESULTS: We present TSEBRA, a software tool that selects gene predictions (transcripts) from the sets generated by BRAKER1 and BRAKER2. TSEBRA uses a set of rules to compare scores of overlapping transcripts based on their support by RNA-seq and homologous protein evidence. We show in computational experiments on the genomes of 11 species that TSEBRA achieves higher accuracy than either BRAKER1 or BRAKER2 run alone, and that it compares favorably with the combiner tool EVidenceModeler.

CONCLUSION: TSEBRA is an easy-to-use and fast software tool. It can be used together with the BRAKER pipeline to generate a gene prediction set supported by both RNA-seq and homologous protein evidence.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04482-0.
Journal:	BMC Bioinformatics
Published online:	2021-11-25
Rights:	© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Similar items