Balrog: A universal protein model for prokaryotic gene prediction

Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retraine...

Bibliographic Details
Main authors: Sommer, Markus J., Salzberg, Steven L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7946324/
https://www.ncbi.nlm.nih.gov/pubmed/33635857
http://dx.doi.org/10.1371/journal.pcbi.1008727
_version_ 1783663031267360768
author Sommer, Markus J.
Salzberg, Steven L.
author_facet Sommer, Markus J.
Salzberg, Steven L.
author_sort Sommer, Markus J.
collection PubMed
description Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome. We have developed a universal model of prokaryotic genes by fitting a temporal convolutional network to amino-acid sequences from a large, diverse set of microbial genomes. We incorporated the new model into a gene finding system, Balrog (Bacterial Annotation by Learned Representation Of Genes), which does not require genome-specific training and which matches or outperforms other state-of-the-art gene finding tools. Balrog is freely available under the MIT license at https://github.com/salzberg-lab/Balrog.
format Online
Article
Text
id pubmed-7946324
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7946324 2021-03-19
Balrog: A universal protein model for prokaryotic gene prediction
Sommer, Markus J.
Salzberg, Steven L.
PLoS Comput Biol
Research Article
Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome. We have developed a universal model of prokaryotic genes by fitting a temporal convolutional network to amino-acid sequences from a large, diverse set of microbial genomes. We incorporated the new model into a gene finding system, Balrog (Bacterial Annotation by Learned Representation Of Genes), which does not require genome-specific training and which matches or outperforms other state-of-the-art gene finding tools. Balrog is freely available under the MIT license at https://github.com/salzberg-lab/Balrog.
Public Library of Science 2021-02-26
/pmc/articles/PMC7946324/ /pubmed/33635857 http://dx.doi.org/10.1371/journal.pcbi.1008727
Text en
© 2021 Sommer, Salzberg http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Sommer, Markus J.
Salzberg, Steven L.
Balrog: A universal protein model for prokaryotic gene prediction
title Balrog: A universal protein model for prokaryotic gene prediction
topic Research Article