StORF-Reporter: finding genes between genes

Bibliographic details
Main authors: Dimonaco, Nicholas J; Clare, Amanda; Kenobi, Kim; Aubrey, Wayne; Creevey, Christopher J
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Data Resources and Analyses
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682499/
https://www.ncbi.nlm.nih.gov/pubmed/37897345
http://dx.doi.org/10.1093/nar/gkad814
collection PubMed
description Large regions of prokaryotic genomes are currently without any annotation, in part due to well-established limitations of annotation tools. For example, genes that use alternative start codons are routinely misreported or omitted entirely. We therefore present StORF-Reporter, a tool that takes an annotated genome and returns unannotated regions that may contain missing CDS genes. StORF-Reporter consists of two parts. The first extracts the unannotated regions from an annotated genome; next, Stop-ORFs (StORFs) are identified in these regions. StORFs are open reading frames delimited by stop codons at both ends rather than beginning at a start codon, so they can capture the genes most often missing from genome annotations. We show that this methodology recovers genes missing from canonical genome annotations. We inspect the results for the genomes of model organisms, the pangenome of Escherichia coli, and a set of 5109 prokaryotic genomes from 247 genera in the Ensembl Bacteria database. StORF-Reporter extended the core, soft-core and accessory gene collections, identified novel gene families and extended existing families into additional genera. The high levels of sequence conservation observed between genera suggest that many of these StORFs are likely to be functional genes that should now be considered for inclusion in canonical annotations.
id pubmed-10682499
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10682499 2023-11-30 Nucleic Acids Res, Data Resources and Analyses. Oxford University Press 2023-10-28. Text en. © The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Data Resources and Analyses
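The two-step procedure in the description (extract unannotated regions from an annotated genome, then identify stop-delimited ORFs in them) can be illustrated with a minimal Python sketch. This is not StORF-Reporter's code and does not reproduce its options or filtering. It assumes forward-strand sequence only, 0-based half-open (start, end) annotation intervals, the standard stop codons TAA, TAG and TGA, and a StORF spanning from the first base of one stop codon to the last base of the next in-frame stop codon. The function names and the `min_len` threshold are hypothetical.

```python
# Minimal sketch, NOT StORF-Reporter's implementation. Assumptions: forward
# strand only, 0-based half-open coordinates, standard stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def unannotated_regions(genome_length, annotations):
    """Return (start, end) intervals not covered by any annotated feature."""
    regions = []
    cursor = 0  # first base not yet known to be covered by an annotation
    for start, end in sorted(annotations):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if genome_length > cursor:
        regions.append((cursor, genome_length))
    return regions


def find_storfs(seq, min_len=30):
    """Return (start, end, frame) for each stop-to-stop ORF in seq.

    Assumption: a StORF spans from the first base of one stop codon to the
    last base of the next in-frame stop codon, inclusive of both. The
    min_len threshold (in nucleotides) is a hypothetical parameter.
    """
    seq = seq.upper()
    storfs = []
    for frame in range(3):
        prev_stop = None  # start of the previous in-frame stop codon
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if prev_stop is not None and (i + 3) - prev_stop >= min_len:
                    storfs.append((prev_stop, i + 3, frame))
                prev_stop = i
    return sorted(storfs)


def storfs_in_unannotated(genome, annotations, min_len=30):
    """Search each unannotated region for StORFs; return genome coordinates."""
    hits = []
    for r_start, r_end in unannotated_regions(len(genome), annotations):
        # Frames are counted from the start of each region in this sketch.
        for s, e, _frame in find_storfs(genome[r_start:r_end], min_len):
            hits.append((r_start + s, r_start + e))
    return hits


if __name__ == "__main__":
    # Toy genome: bases 0-9 are "annotated"; the rest holds TAA ... TAG.
    genome = "C" * 10 + "TAAATGAAATAG"
    print(storfs_in_unannotated(genome, [(0, 10)], min_len=6))  # [(10, 22)]
```

Scanning each frame for consecutive in-frame stop codons, rather than for start codons, is what lets a stop-to-stop search report candidates whatever start codon a gene actually uses. Handling the reverse strand would additionally require scanning the reverse complement of each region.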