Balrog: A universal protein model for prokaryotic gene prediction
Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retraine...
Main Authors: | Sommer, Markus J.; Salzberg, Steven L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7946324/ https://www.ncbi.nlm.nih.gov/pubmed/33635857 http://dx.doi.org/10.1371/journal.pcbi.1008727 |
_version_ | 1783663031267360768 |
---|---|
author | Sommer, Markus J. Salzberg, Steven L. |
author_facet | Sommer, Markus J. Salzberg, Steven L. |
author_sort | Sommer, Markus J. |
collection | PubMed |
description | Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome. We have developed a universal model of prokaryotic genes by fitting a temporal convolutional network to amino-acid sequences from a large, diverse set of microbial genomes. We incorporated the new model into a gene finding system, Balrog (Bacterial Annotation by Learned Representation Of Genes), which does not require genome-specific training and which matches or outperforms other state-of-the-art gene finding tools. Balrog is freely available under the MIT license at https://github.com/salzberg-lab/Balrog. |
format | Online Article Text |
id | pubmed-7946324 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-79463242021-03-19 Balrog: A universal protein model for prokaryotic gene prediction Sommer, Markus J. Salzberg, Steven L. PLoS Comput Biol Research Article Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome. We have developed a universal model of prokaryotic genes by fitting a temporal convolutional network to amino-acid sequences from a large, diverse set of microbial genomes. We incorporated the new model into a gene finding system, Balrog (Bacterial Annotation by Learned Representation Of Genes), which does not require genome-specific training and which matches or outperforms other state-of-the-art gene finding tools. Balrog is freely available under the MIT license at https://github.com/salzberg-lab/Balrog. Public Library of Science 2021-02-26 /pmc/articles/PMC7946324/ /pubmed/33635857 http://dx.doi.org/10.1371/journal.pcbi.1008727 Text en © 2021 Sommer, Salzberg http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Sommer, Markus J. Salzberg, Steven L. Balrog: A universal protein model for prokaryotic gene prediction |
title | Balrog: A universal protein model for prokaryotic gene prediction |
title_full | Balrog: A universal protein model for prokaryotic gene prediction |
title_fullStr | Balrog: A universal protein model for prokaryotic gene prediction |
title_full_unstemmed | Balrog: A universal protein model for prokaryotic gene prediction |
title_short | Balrog: A universal protein model for prokaryotic gene prediction |
title_sort | balrog: a universal protein model for prokaryotic gene prediction |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7946324/ https://www.ncbi.nlm.nih.gov/pubmed/33635857 http://dx.doi.org/10.1371/journal.pcbi.1008727 |
work_keys_str_mv | AT sommermarkusj balrogauniversalproteinmodelforprokaryoticgeneprediction AT salzbergstevenl balrogauniversalproteinmodelforprokaryoticgeneprediction |